Lambda

lambda.ai

2 Jobs

623 Employees

About the Company

The Superintelligence Cloud | Gigawatt-scale AI Factories for Training & Inference

Listed Jobs

Company Name
Lambda
Job Title
Engineering Manager - Software Defined Networking
Job Description
**Job Title:** Engineering Manager - Software Defined Networking

**Role Summary:** Lead and grow a high-performing team that builds multi-tenant software-defined networking (SDN) for AI/ML cloud infrastructure, ensuring customer-driven reliability, scalability, and innovation. Collaborate cross-functionally to deliver mission-critical networking solutions while fostering a culture of technical excellence and team growth.

**Expectations:**
- 6+ years in full-time engineering management roles leading networking or SDN teams.
- 10+ years in software engineering, with deep expertise in distributed systems, networking protocols, and HPC/AI datacenter environments.

**Key Responsibilities:**
- Lead internal and customer-facing SDN projects critical to multi-tenant infrastructure.
- Collaborate with networking, control plane, and HPC architecture teams to shape technical roadmaps.
- Ensure customer SLAs for performance, reliability, and feature delivery in production-grade networks.
- Manage operational and development workloads, driving automation, observability, and platform improvements.
- Oversee hiring, mentorship, and team growth for systems reliability and software engineering roles.
- Align product strategies with cutting-edge GPU/ML/AI hosting requirements for scalable cloud infrastructure.

**Required Skills:**
- Proficiency in Linux networking (namespaces, iptables, eBPF, DPDK), routing protocols (BGP, OSPF), and distributed systems.
- Experience with SDN platforms (OpenSwitch, Open vSwitch, VMware NSX) and Kubernetes container networking (CNI plugins, service meshes).
- Operational excellence: SLI/SLO management, incident resolution, root cause analysis, and observability tooling (Prometheus, Grafana).
- Leadership in high-pressure production environments with 99.99%+ availability targets.
- Strong customer-facing skills for technical support, pre-sales, and incident management.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science or related field.
- Certifications: None specified.
San Francisco, United States
On site
Senior
15-10-2025
Company Name
Lambda
Job Title
Senior Site Reliability Engineer - Managed Kubernetes
Job Description
**Job Title:** Senior Site Reliability Engineer - Managed Kubernetes

**Role Summary:** Lead the operation, scaling, and automation of large-scale bare-metal Kubernetes clusters, ensuring reliability, performance, and rapid incident response for AI/ML workloads.

**Expectations:**
- 6+ years in SRE or ops roles with deep Linux cluster expertise.
- Advanced proficiency in Go and Python.
- Proven experience with GitOps, Helm, and custom Kubernetes operators.
- Strong communication skills for incident support and customer interaction.
- Ability to work independently or as part of a cross-functional team.

**Key Responsibilities:**
- Operate and maintain clusters scaling to thousands of nodes.
- Handle cluster degradation, recovery, resizing, and incident response.
- Participate in on-call rotation for critical incidents.
- Provide customer support for Kubernetes integration, storage, and authentication.
- Collaborate with HPC and Datacenter Ops on cross-functional issues.
- Build and maintain control plane services, operators, and custom controllers.
- Automate cluster lifecycle (provisioning, upgrades, patching, deletion).
- Define and enforce SLOs/SLIs for Kubernetes services and workloads.

**Required Skills:**
- Linux system administration and cluster management.
- Go and Python programming; experience with GitOps (ArgoCD), Helm, and Kubernetes operators.
- Production Kubernetes operation (on-prem, EKS, GKE, etc.).
- Familiarity with observability tools (Prometheus, Grafana, FluentBit) and CI/CD pipelines.
- Experience provisioning Kubernetes with kubeadm, Cluster API, or similar tooling.
- Understanding of CRDs, CSI, CNI; exposure to HPC/GPU workloads is a plus.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science or related field, or equivalent professional experience.
- Certifications: None specified.
Seattle, United States
On site
Senior
17-11-2025