- Company Name
- Roku
- Job Title
- Senior Software Engineer, DevOps
- Job Description
-
Job title: Senior Software Engineer, DevOps
Role Summary: Lead the design, implementation, and maintenance of cloud‑native infrastructure across AWS, Kubernetes, and service mesh (Istio) to support a large global engineering team. Drive automation, observability, and platform reliability while collaborating closely with cross‑functional stakeholders, managing on‑call responsibilities, and scaling the deployment ecosystem.
Expectations:
- Provide proactive, daily support to internal customers, optimizing workload performance and reliability.
- Own and evolve platform capabilities, ensuring high availability, scalability, and compliance with SLO/SLA targets.
- Work in a highly distributed, multi‑time‑zone environment, demonstrating self‑motivation and clear, constructive communication.
- Participate in on‑call rotations and incident response, maintaining clear documentation and post‑mortem reporting.
Key Responsibilities:
- Design, deploy, and operate Kubernetes clusters and AWS ECS fleets globally, including networking, storage, and compute resource management.
- Implement and evolve service mesh (Istio/Envoy) to enable secure, observable traffic routing and load balancing across services.
- Build and maintain CI/CD pipelines, Terraform modules, Helm charts, and automation tools to accelerate feature delivery and reduce manual toil.
- Integrate and manage observability stacks (Datadog, Prometheus, Grafana, ELK, Jaeger, Loki, Kiali) for metrics, tracing, and log aggregation.
- Identify, analyze, and resolve infrastructure bottlenecks, feature gaps, and scalability issues through data‑driven diagnostics.
- Mentor and coach engineers on best practices for cloud architecture, security, and reliability; act as a trusted advisor to product and engineering teams.
- Evaluate and recommend new tooling or architectural shifts that enhance platform reliability, developer experience, or cost efficiency.
Required Skills:
- 5+ years in infrastructure engineering, DevOps, or large‑scale software engineering with extensive cross‑team engagement.
- Expert knowledge of AWS (ECS, EKS, Lambda) and/or GCP services; strong understanding of cluster management and autoscaling.
- Proficiency with Kubernetes, container orchestration, and service mesh technologies (Istio, Envoy).
- Deep experience with commercial and open‑source observability platforms (Datadog, Prometheus, Grafana, ELK stack, Jaeger, Loki, Kiali).
- Advanced scripting in Python or Shell, plus infrastructure-as-code and automation with Terraform or Helm.
- Strong command of reliability engineering principles (SLOs, SLAs, error budgets, resiliency).
- Excellent communication and collaboration skills in a distributed, cross‑functional environment.
Required Education & Certifications:
- B.S. or M.S. in Computer Science, Engineering, or related field, or equivalent professional experience.
- Relevant certifications highly valued: Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer - Professional, Certified Kubernetes Security Specialist (CKS), or similar.
Cambridge, United Kingdom
Hybrid
Senior
26-12-2025