- Company Name
- Russell Tobin
- Job Title
- Site Reliability Engineer (Database Services)
- Job Description
**Role Summary:**
Responsible for ensuring the availability, performance, and stability of enterprise database platforms. Applies SRE principles to automate operations, monitor health, and prevent incidents across Oracle and Exadata environments while collaborating with cross‑functional teams.
**Expectations:**
- Deliver reliable, self‑healing database services.
- Automate repetitive tasks and improve observability.
- Participate in incident response, root‑cause analysis, and continuous improvement.
- Maintain compliance with defined SLIs/SLOs and reliability metrics.
**Key Responsibilities:**
- Operate and support Oracle 11g/12c/19c and Exadata clusters.
- Manage RMAN backups, Data Guard replication, RAC, ASM, and performance tuning.
- Develop and maintain automation scripts (Python, Bash, PowerShell).
- Implement monitoring, alerting, and dashboarding for database health.
- Collaborate with development, infrastructure, and security teams to embed SRE practices.
- Contribute to incident management, post‑mortems, and preventive measures.
- Provide knowledge transfer and documentation for database reliability processes.
**Required Skills:**
- Proven experience as an SRE, Database Reliability Engineer, DBA, or in a similar role.
- Deep expertise with Oracle database administration and Exadata.
- Strong knowledge of RMAN, Data Guard, RAC, ASM, and performance optimization.
- Proficiency in scripting/automation (Python, Bash, PowerShell).
- Experience with incident management and reliability metrics (SLI/SLO).
- Familiarity with SQL Server, PostgreSQL, AWS RDS, or DynamoDB is a plus.
**Education & Certifications:**
- Bachelor’s degree in Information Technology, Computer Science, or a related field (preferred).
- Relevant Oracle certifications (e.g., OCP, OCA) are advantageous but not mandatory.