SambaNova Systems

www.sambanova.ai

491 Employees

About the Company

AI is changing the world, and at SambaNova we believe that you don’t need unlimited resources to take advantage of the most advanced, valuable AI capabilities: capabilities that are helping organizations explore the universe, find cures for cancer, and gain insights that provide a competitive edge.

We deliver the world’s fastest and only complete AI solution for enterprises and governments with world-record inference performance and accuracy. Powered by the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), organizations can build a technology backbone for the next decade of AI innovation with SambaNova Suite. Our fully integrated hardware-software system, DataScale®, enables organizations to train, fine-tune, and deploy the most demanding AI workloads using the largest and most challenging models. Most recently, with the launch of our newest offering, SambaNova Cloud, developers can supercharge AI-powered applications on Llama 3.2 models.

SambaNova was founded in 2017 in Palo Alto, California, by a group of industry luminaries, business leaders, and world-class innovators who understand AI. Today, we’ve built an incredibly smart and motivated team dedicated to making a lasting impact on the industry and equipping our customers to thrive in the new era of AI.

Listed Jobs

Company Name
SambaNova Systems
Job Title
Cloud Platform Engineer
Job Description
**Job Title:** Cloud Platform Engineer

**Role Summary:** Oversees reliability, performance, and scalability of AI inferencing services, ensuring high uptime, low latency, and efficient resource utilization. Bridges software development and operations to resolve operational challenges while supporting global infrastructure expansion.

**Expectations:**
- Shared ownership of production inferencing services across multiple regions (Asia, Europe, Latin America).
- Participation in a balanced on-call rotation (primary/secondary model) for 24/7 service reliability.
- Leadership in incident management, blameless post-mortems, and system resilience improvements.

**Key Responsibilities:**
1. Monitor and optimize service health using tools like Prometheus, Grafana, and Datadog, ensuring actionable alerts and minimizing false positives.
2. Proactively identify and resolve performance bottlenecks, design auto-scaling policies, and drive capacity planning for infrastructure scalability.
3. Manage cloud infrastructure via Infrastructure as Code (Terraform, Ansible) on AWS, GCP, and/or Azure, maintaining security and repeatability.
4. Build and enhance CI/CD pipelines for seamless model and service updates, automating manual operational tasks.
5. Define and report Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and improve reliability.
6. Collaborate with engineering and finance teams to forecast infrastructure needs and optimize cloud costs.

**Required Skills:**
- Expertise in cloud platforms (AWS, GCP, Azure) and Infrastructure as Code (Terraform, Ansible).
- Proficiency in monitoring/alerting (Prometheus, Grafana, Datadog) and performance optimization.
- Strong CI/CD pipeline development and automation capabilities.
- Experience with service reliability engineering, incident response, and SLO/SLI management.
- Problem-solving skills for complex AI infrastructure challenges.

**Required Education & Certifications:**
- Bachelor’s degree in computer science, engineering, or a related field.
Palo Alto, United States
On site
13-12-2025
Company Name
SambaNova Systems
Job Title
Principal DevOps Engineer
Job Description
**Job Title:** Principal DevOps Engineer

**Role Summary:** Lead the design, implementation, and maintenance of CI/CD pipelines and release infrastructure for an enterprise‑grade generative AI platform. Ensure stability, scalability, and performance of build and deployment processes while collaborating with cross‑functional engineering teams.

**Expectations:**
- Own and continuously improve the Bazel ecosystem, remote execution, and artifact management.
- Maintain high‑availability CI/CD pipelines (CircleCI, Jenkins, etc.) and optimize workflows.
- Manage large‑scale Python and RPM package dependencies.
- Drive best practices for infrastructure, containerization, and cloud‑agnostic deployment.
- Provide rapid troubleshooting and support for development teams.

**Key Responsibilities:**
- Administer and troubleshoot Bazel builds, RBE setup, and related tooling.
- Configure, monitor, and enhance CircleCI pipelines and workflow definitions.
- Oversee Google Artifact Registry (GAR) and/or JFrog Artifact Management usage.
- Implement and maintain scripting solutions (Python, Bash) for automation.
- Coordinate with engineering to integrate infrastructure changes and improvements.
- Support containerization (Docker) and orchestration (Kubernetes) initiatives.
- Contribute to documentation and knowledge sharing across teams.

**Required Skills:**
- 5+ years DevOps/Infrastructure experience.
- Expertise with Bazel (C++/Python) ecosystems and remote execution.
- Strong proficiency in CI/CD tools (CircleCI, Jenkins, GitLab CI/CD).
- Python package management and RPM handling at scale.
- Experience with GAR and/or JFrog Artifact Management.
- Linux/Unix command‑line expertise; advanced scripting (Python, Bash).
- Solid problem‑solving, attention to detail, and collaborative communication.

**Preferred Skills:**
- Containerization (Docker) and Kubernetes orchestration.
- Cloud platforms (AWS, Google Cloud).
- Additional CI/CD tools and automation frameworks.
- Understanding of software development best practices and coding standards.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or related technical field (or equivalent practical experience).
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, Certified Kubernetes Administrator) are a plus but not mandatory.
United States
Remote
Senior
13-12-2025