- Company Name
- Iopa Solutions
- Job Title
- Senior Site Reliability Manager – SaaS Fintech
- Job Description
-
Job Title: Senior Site Reliability Manager – SaaS Fintech
Role Summary:
Lead the reliability, performance, and scalability strategy for a rapidly growing cloud‑native fintech platform. Own SRE initiatives, drive SLIs/SLOs, and mentor engineering teams while collaborating with product, security, and engineering leadership to build resilient, automated services.
Expectations:
- 10+ years of SRE, DevOps, or production engineering experience in SaaS or fintech contexts.
- Demonstrated ability to design and implement reliability, observability, and automation solutions in high‑availability environments.
- Proven influence on architecture, roadmap planning, and operational policies.
- Capability to lead incident response, root‑cause analysis, and post‑mortem processes.
- Comfortable operating across the full stack: cloud (AWS or equivalent), container orchestration, CI/CD pipelines, IaC, and distributed systems.
Key Responsibilities:
- Define and execute reliability strategy across SRE, observability, incident management, performance tuning, and automation.
- Collaborate with Engineering, Security, and Product to establish SLIs/SLOs, error budgets, and infrastructure resiliency goals.
- Mentor SREs and engineers, fostering a culture of operational excellence and reducing toil.
- Deliver and maintain tooling, pipelines, and runbooks that support CI/CD, IaC, and scalable service management.
- Lead post‑incident analysis, root‑cause investigations, and continuous improvement initiatives.
- Champion cross‑functional communication and feedback loops to accelerate release velocity and system reliability.
Required Skills:
- Expertise in AWS (or comparable cloud provider), Linux, networking, containers (Docker, Kubernetes), and distributed systems.
- Strong knowledge of observability tools (Prometheus, Grafana, ELK, etc.) and automation frameworks (Ansible, Terraform, Pulumi).
- Familiarity with SRE principles: SLOs, error budgets, runbooks, incident workflows.
- Leadership and mentoring abilities, with experience managing or scaling SRE teams.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator, Cloud DevOps Engineer) preferred.