Andiamo

Lead Engineer/SRE, KMS - AdTech Leader

On site

Boston, United States

Senior

Full Time

11-12-2025

Skills

Python Ruby Go Bash MySQL DevOps Kubernetes Networking Linux Organization AWS Django Redis Terraform Postgres

Job Specifications

Lead Site Reliability Engineer
This position offers the opportunity to guide the reliability and performance of large-scale, customer-facing systems. You will help create the services, automation, and architectural patterns that allow engineering teams to move quickly with confidence. The work focuses on treating operations as a software problem, building systems that are resilient by design, and partnering with product teams to ensure they can deliver reliable features at speed.
About the Role
You will take ownership of key reliability initiatives, shaping the technical vision for the systems under your care. Your work will support the continuous evolution of backend services and development workflows, helping teams release and operate their software smoothly. This role is ideal for someone who enjoys complex distributed systems, performance engineering, and building tools that empower large engineering groups.
How You Will Make a Difference
Deliver foundational services that support rapid and predictable software delivery across the engineering organization.
Create systems and operational processes that support reliable and scalable applications.
Identify upstream solutions that prevent recurring issues and promote long-term stability.
Develop the technical roadmap for your area, collaborating with stakeholders to solve meaningful engineering challenges.
Improve throughput and system performance by analyzing and eliminating architectural bottlenecks.
Work with tools and technologies such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, Redis, and Postgres.
Help foster a culture of strong engineering practices through thoughtful design discussions and collaborative whiteboarding sessions.
Support and mentor engineers across the company, helping raise the standard of engineering quality and operational excellence.
Write and maintain software that improves the reliability, performance, and efficiency of platform services.
Participate in on-call rotations with a focus on resolving issues at the source and reducing alert fatigue.
Introduce architectural changes that significantly improve the scalability and resilience of critical systems.
Work closely with product-oriented engineers and other SREs to deliver improvements that have real customer impact.
Use data-driven analysis to understand system behavior, predict scaling needs, and guide strategic improvements.
Promote site reliability principles across the engineering organization.
Who You Are
Ten or more years of experience in site reliability engineering, DevOps, or related fields.
Degree in computer science or a related field, or equivalent hands-on experience.
Calm and focused during outages, with the ability to drive investigations to a clear root cause and long-term corrective measures.
Strong understanding of Linux systems and the full networking stack.
Experience collaborating with engineering teams to build and operate production software.
Proficiency writing code using best practices in languages such as Python, Ruby, or Go.
Genuine interest in exploring emerging AI tools and responsibly experimenting with techniques that improve engineering workflows.
This role is well suited for someone who enjoys solving reliability challenges at scale, improving platform performance, and building systems that help engineers ship better software with greater confidence.

About the Company

We're a different kind of recruiting firm. We employ Research & Data Analysts alongside Recruiters to bring our clients the top 2% of passive technology and go-to-market talent. With unique data partnerships and a revolutionary technology platform, we're mining and curating massive amounts of data to bridge the talent gap. Our data-driven approach has helped us become the top recruiting partners of Amazon.com, HBO, Bloomberg, Goldman Sachs, TripAdvisor, Audible, MasterCard, and others. Andiamo has locations in New York, ...