Lambda

lambda.ai

2 Jobs

623 Employees

About the Company

The Superintelligence Cloud | Gigawatt-scale AI Factories for Training & Inference

Listed Jobs

Company Name: Lambda
Job Title: Senior Site Reliability Engineer - Managed Kubernetes
Job Description:

**Job Title:** Senior Site Reliability Engineer – Managed Kubernetes

**Role Summary:** Lead the operation, scaling, and automation of large‑scale bare‑metal Kubernetes clusters, ensuring reliability, performance, and rapid incident response for AI/ML workloads.

**Expectations:**
- 6+ years in SRE or ops roles with deep Linux cluster expertise.
- Advanced proficiency in Go and Python.
- Proven experience with GitOps, Helm, and custom Kubernetes operators.
- Strong communication skills for incident support and customer interaction.
- Ability to work independently or as part of a cross‑functional team.

**Key Responsibilities:**
- Operate and maintain clusters scaling to thousands of nodes.
- Handle cluster degradation, recovery, resizing, and incident response.
- Participate in on‑call rotation for critical incidents.
- Provide customer support for Kubernetes integration, storage, and authentication.
- Collaborate with HPC and Datacenter Ops on cross‑functional issues.
- Build and maintain control plane services, operators, and custom controllers.
- Automate cluster lifecycle (provisioning, upgrades, patching, deletion).
- Define and enforce SLOs/SLIs for Kubernetes services and workloads.

**Required Skills:**
- Linux system administration and cluster management.
- Go and Python programming; experience with GitOps (ArgoCD), Helm, and Kubernetes operators.
- Production Kubernetes operation (on‑prem, EKS, GKE, etc.).
- Familiarity with observability tools (Prometheus, Grafana, FluentBit) and CI/CD pipelines.
- Experience provisioning Kubernetes with kubeadm, Cluster API, or similar tooling.
- Understanding of CRDs, CSI, CNI; exposure to HPC/GPU workloads is a plus.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science or related field, or equivalent professional experience.
- Certifications: None specified.

Seattle, United States

On site

Senior

17-11-2025

Company Name: Lambda
Job Title: Senior Software Engineer - SI Partnership
Job Description:

**Job Title**: Senior Software Engineer - SI Partnership

**Role Summary**: Architect and deliver end-to-end cloud computing solutions to integrate superintelligence workloads into Lambda’s AI cloud platform, enhancing API and UI offerings for enterprise customers.

**Expectations**: Candidate must own complex features from definition to deployment, collaborate across engineering and product teams, and prioritize high-quality outcomes under tight deadlines in ambiguous environments. Proven ability to communicate technical trade-offs and scope timelines with stakeholders.

**Key Responsibilities**:
- Design and implement customer-facing APIs, UIs, and developer tooling for Lambda Cloud.
- Develop full-stack (front-end and back-end) solutions across core product features and infrastructure.
- Partner with product engineering, platform, and infrastructure teams to deliver end-to-end solutions.
- Collaborate with support and operations teams to resolve technical issues and optimize customer experiences.
- Engage directly with superintelligence customer engineering teams to integrate workloads on Lambda’s platform.
- Own production features, design observability processes, and participate in incident response.

**Required Skills**:
- Proficiency in modern web frameworks: TypeScript, React/Vue/Svelte, HTML/CSS, Vite.
- Backend development experience: Python, Django/FastAPI, PostgreSQL, Unix/CLI.
- API design and documentation for developer-facing tools.
- CI/CD pipelines, reliability engineering (logging, alerting), and cloud-native services (AWS, Okta, Cloudflare).

**Required Education & Certifications**: Not specified.

Seattle, United States

On site

Senior

31-12-2025