Company Name: Pegasystems
Job Title: Senior Site Reliability Engineer
Job Description:
Role Summary:
Seasoned engineer responsible for ensuring reliability, scalability, and performance of a SaaS platform. Works on a lean, globally distributed team to define reliability targets, automate operations, and drive continuous improvement in a cloud‑native environment.
Expectations:
- Deliver and maintain SLOs/SLIs and manage error budgets.
- Collaborate across product, platform, and engineering teams worldwide.
- Lead incident response, root‑cause analysis, and post‑mortem processes.
- Promote automation, observability, and DevOps best practices.
- Operate effectively in a small, high‑autonomy team with minimal supervision.
Key Responsibilities:
- Define, implement, and monitor reliability metrics (SLOs, SLIs, error budgets).
- Design and support fault‑tolerant, scalable architectures.
- Build and maintain CI/CD pipelines and automated deployment workflows.
- Develop and enhance monitoring, alerting, and dashboarding solutions (e.g., Prometheus, Grafana, Datadog).
- Participate in on‑call rotations and perform incident triage and root‑cause analyses.
- Drive infrastructure‑as‑code practices for Kubernetes and cloud services (AWS, Azure, or GCP).
- Mentor team members and foster cross‑time‑zone collaboration.
Required Skills:
- 5+ years in software engineering or infrastructure roles, including ≥1 year in an SRE capacity.
- Strong experience with cloud platforms (AWS, Azure, or GCP) and container orchestration (Kubernetes).
- Proficiency in monitoring/observability tools (Prometheus, Grafana, Datadog, etc.).
- Deep understanding of distributed systems, reliability engineering, and CI/CD pipelines.
- Excellent communication and stakeholder‑management abilities.
- Ability to work independently and drive results in a remote, culturally diverse environment.
Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or related field (preferred, not required).
- Relevant cloud certifications (e.g., AWS Certified Solutions Architect, GCP Professional Cloud Architect) or the CKAD (Certified Kubernetes Application Developer) are considered a plus.