- Company Name
- BNY
- Job Title
- Senior Vice President, Site Reliability Engineer
- Job Description
**Job title**
Senior Vice President, Site Reliability Engineer
**Role summary**
Lead the Site Reliability Engineering function, defining reliability standards, automating infrastructure, and ensuring scalable, highly available services for a global cloud environment. Work closely with product, infrastructure, and DevOps teams to reduce incidents, embed a reliability-first culture, and drive continuous improvement in operations and tooling.
**Expectations**
* Deliver measurable reliability outcomes through SLOs, SLIs, and error budgets.
* Own incident management, post‑mortem analysis, and proactive system hardening.
* Mentor and grow SRE teams, fostering collaboration across engineering disciplines.
**Key responsibilities**
* Define and enforce SLOs/SLIs, execute reliability reviews, and track error budgets.
* Automate infrastructure provisioning and deployment pipelines using Terraform, Helm, and Kubernetes.
* Build and maintain observability platforms with Prometheus, Grafana, AppDynamics, Splunk, and Datadog.
* Own the incident response lifecycle: on‑call participation, root‑cause analysis, automated recovery, and post‑mortems.
* Design and implement highly available, fault‑tolerant cloud architectures on Azure, AWS, or GCP.
* Develop platform tooling for container orchestration, third‑party integrations, and cloud‑native operations.
* Represent SRE best practices in cross‑functional initiatives, promoting reliability as a core product quality metric.
**Required skills**
* Cloud platforms: Azure, AWS, or GCP – expertise in design, deployment, and operational monitoring.
* Containerization and orchestration: Docker, Kubernetes, Helm.
* Infrastructure as Code: Terraform, Kubernetes manifests, and CD pipelines.
* Observability tools: Prometheus, Grafana, AppDynamics, Splunk, Datadog.
* Programming/scripting: Python, Go, or Java, with a focus on automation and tooling.
* SRE fundamentals: SLOs, SLIs, and SLAs; error budgeting; incident lifecycle; post‑mortems; reliability‑driven architecture.
* Strong communication, leadership, and collaboration skills in Agile environments.
**Required education & certifications**
* Bachelor’s degree (or higher) in Computer Science, Engineering, or a related field.
* Preferred certifications:
  * AWS Certified Solutions Architect – Professional or Associate
  * Microsoft Certified: Azure Solutions Architect Expert
  * Google Cloud Professional Cloud Architect
  * Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
  * Any relevant SRE or DevOps certifications (e.g., Google SRE, AWS SysOps Administrator).
Manchester, United Kingdom
On site
Senior
24-11-2025