- Company Name: Jobs via eFinancialCareers
- Job Title: Lead Cloud Site Reliability Engineer - London Stock Exchange Group
- Job Description:
**Job Title**
Lead Cloud Site Reliability Engineer
**Role Summary**
Senior Site Reliability Engineer responsible for the availability, performance, and scalability of a critical shared service platform. Drives automation, observability, and cloud‑migration initiatives while partnering with development teams to enhance reliability and release velocity. Acts as a technical leader in incident management, SLO governance, and engineering best practices across a global financial‑services organization.
**Expectations**
- Minimum 12 years of experience in software/systems engineering or DevOps.
- Proven track record of building and operating large‑scale cloud services.
- Ability to lead on‑call rotations, conduct post‑mortems, and drive continuous improvement.
- Strong communication and collaboration skills to influence cross‑functional teams.
- Commitment to a learning culture and staying current with emerging technologies.
**Key Responsibilities**
- Define, monitor, and improve Service Level Objectives (SLOs) and associated metrics.
- Develop and maintain automation for scaling, self‑healing, and deployment of services.
- Partner with product and engineering teams to embed reliability, observability, and security into the development lifecycle.
- Participate in on‑call, incident response, root‑cause analysis, and post‑incident reviews.
- Conduct architectural reviews and operational acceptance testing for cloud migration projects.
- Design and implement Datadog dashboards, metrics, and alerts, and integrate them with BigPanda for incident management.
- Manage Infrastructure as Code (Terraform) and enforce identity and access management (Microsoft Entra ID or equivalent).
- Advocate and mentor on best engineering practices, performance tuning, and cost‑effective cloud operations.
**Required Skills**
- Proficiency in object‑oriented languages (Java, C#, Python, or Go).
- Deep experience with Unix/Linux and Windows environments.
- Hands‑on expertise with at least one major cloud platform (Azure, AWS, or GCP).
- Strong foundation in algorithms, data structures, and system design.
- DevOps mindset: CI/CD pipelines, IaC (Terraform), configuration management.
- Observability: logging, metrics, tracing, alerting (Datadog, BigPanda).
- Knowledge of identity and access management, application security, and networking concepts.
- Excellent problem‑solving, troubleshooting, and root‑cause analysis abilities.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Software Engineering, or a related technical field (or equivalent practical experience).
- Relevant cloud certifications (e.g., Azure Solutions Architect, AWS Solutions Architect, GCP Professional Cloud Architect) are preferred but not mandatory.