LexisNexis Risk Solutions

Site Reliability Engineering Manager

On site

Alpharetta, United States

Full Time

03-11-2025

Skills

Leadership, Python, C#, Bash, PowerShell, Incident Response, Splunk, GitHub, GitLab, CI/CD, DevOps, Kubernetes, Monitoring, Version Control, Jenkins, Ansible, Networking, Linux, Windows, Programming, Git, Organization, Azure, AWS, CI/CD Pipelines, Terraform, Prometheus, Grafana, Infrastructure as Code, Loki, Windows Server

Job Specifications

Are you an experienced Site Reliability Engineering leader ready to shape strategy, inspire teams, and drive innovation at scale?

Are you looking to lead a high-impact SRE team where your leadership will directly influence innovation, reliability, and engineering excellence across the organization?

About the role: this is an advanced management-level role. You will manage multiple SRE teams within a single product group. You will ensure teams work in alignment with the SRE framework, including leading sustainable incident response, blameless post-mortems, and production reliability improvement projects. You will mentor other team members on SRE practices and cultivate innovation and collaboration across multiple teams. You will also manage the delivery of strategy and departmental plans and may provide input to them.

About the team: this role is part of the Business Systems SRE team within LexisNexis Risk Solutions Group. As an SRE Manager, you will act as a technical and strategic leader, partnering with engineering and business stakeholders to drive cloud reliability, automation, observability, and performance initiatives across critical platforms. This role combines technical depth with managerial acumen, including leading Proof-of-Concept (PoC) initiatives, guiding teams, and aligning SRE outcomes with leadership expectations and business goals.

Responsibilities:

Managing high-performance SRE teams, ideally in multiple countries. We are not looking for an individual contributor.
Promoting and implementing Site Reliability Engineering best practices and principles across product and platform teams
Architecting, implementing, and managing infrastructure using Infrastructure as Code (IaC) and DevOps principles
Designing and maintaining secure-by-default cloud-native systems with a focus on continuous improvement of security posture
Defining and enforcing SLA/SLI/SLO standards for production systems
Developing and maintaining automated frameworks for provisioning, deployment, scaling, and monitoring
Conducting in-depth troubleshooting of complex production issues across application, infrastructure, and network layers
Leading proof-of-concept efforts to evaluate and introduce new technologies
Implementing policy and compliance checks within CI/CD pipelines

Essential Skills & Experience:

Current and extensive experience managing teams of SREs. We are not looking to hire an individual contributor in this role.
Proficiency with at least one major public cloud provider (Azure or AWS)
Extensive experience with Terraform, Ansible, and other IaC/orchestration tools
Expertise in Kubernetes (AKS/EKS/GKE), containerized workloads, and deployment strategies (e.g., blue-green)
Deep knowledge of Linux and Windows server environments
Proven experience in building and enforcing automation frameworks for CI/CD and infrastructure provisioning
Hands-on experience with observability platforms such as Grafana, Kibana, Splunk, ELK Stack (Elasticsearch, Logstash, Kibana), OpenTelemetry, Prometheus, Loki
Strong knowledge of SLAs, SLIs, and SLOs and their application in production environments
Experience with monitoring, alerting, and logging best practices
Solid understanding of cloud-native security, identity management, and secrets management (e.g., HashiCorp Vault)
Skilled in scripting and programming (e.g., Python, Bash, Golang, PowerShell, C#)
Strong knowledge of networking, application performance tuning, and troubleshooting
Familiarity with common CI/CD and version control tools (e.g., Git, GitLab, GitHub, Jenkins)

About the Company

At LexisNexis Risk Solutions®, we believe in using data for good to solve problems and make a positive impact on people, industry and society. We deliver enhanced value to our customers by leveraging the power of insight through data, advanced analytics and innovative technologies to help them solve problems, make better decisions and improve operations. Our technologies, decision tools and services give our customers a clear advantage in evaluating and predicting risk, enhancing operational efficiency and protecting their c...