- Company Name
- LexisNexis Risk Solutions
- Job Title
- Site Reliability Engineering Manager
- Job Description
-
**Job title:** Site Reliability Engineering Manager
**Role Summary:**
Lead and scale multiple Site Reliability Engineering (SRE) teams across product groups, driving the adoption of SRE principles, cloud reliability, automation, observability, and performance excellence. Serve as both technical and strategic leader, shaping incident response, post‑mortem culture, and reliability improvement initiatives while collaborating with engineering and business stakeholders to align SRE outcomes with organizational goals.
**Expectations:**
- Deliver high‑performance SRE teams that consistently meet or exceed reliability targets.
- Act as a mentor and champion for SRE practices across the organization.
- Shape and implement strategy, departmental plans, and PoC programs for emerging technologies.
- Maintain ownership of cloud‑native security, identity, and secrets management frameworks.
**Key Responsibilities:**
- Manage and coach multi‑country SRE teams, including hiring, mentoring, and performance reviews.
- Promote and enforce SRE best practices, including incident response, blameless post‑mortems, and reliability metrics.
- Architect, implement, and operate IaC (Terraform, Ansible) and Kubernetes (AKS/EKS/GKE) environments.
- Design secure‑by‑default cloud‑native systems and continuously improve security posture.
- Define and enforce SLA/SLI/SLO standards for production services.
- Build and maintain automated provisioning, deployment, scaling, and monitoring frameworks.
- Lead complex production troubleshooting across application, infrastructure, and network layers.
- Conduct PoC evaluations to introduce new technologies and improve processes.
- Integrate policy, compliance checks, and security controls into CI/CD pipelines.
**Required Skills:**
- Proven experience managing SRE teams (in a people‑management role, not as an individual contributor).
- Proficiency with a major public cloud (AWS or Azure).
- Expertise in Terraform, Ansible, Kubernetes (multi‑cloud), and container deployment strategies (Blue‑Green, Canary).
- Deep knowledge of Linux and Windows server environments.
- Experience building CI/CD automation and observability stacks (Grafana, Prometheus, Loki, Splunk, ELK, OpenTelemetry).
- Strong understanding of SLA/SLI/SLO implementation, monitoring, alerting, and logging best practices.
- Cloud‑native security expertise (access control, secrets management, HashiCorp Vault).
- Scripting and programming skills (Python, Bash, Golang, PowerShell, C#).
- Knowledge of networking, performance tuning, and troubleshooting.
- Familiarity with Git, GitLab, GitHub, Jenkins, and related CI/CD tools.
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent work experience).
- Relevant certifications (e.g., AWS Certified Solutions Architect, Microsoft Certified: Azure Solutions Architect, Certified Kubernetes Administrator, Terraform Associate, Certified DevOps Practitioner).
Alpharetta, United States
On site
03-11-2025