- Company Name
- Happening
- Job Title
- Senior Site Reliability Engineer (Kafka SME)
- Job Description
-
Job Title: Senior Site Reliability Engineer (Kafka SME)
Role Summary:
Lead the design, deployment, and maintenance of scalable Kafka-based microservices infrastructure that powers a high‑volume betting platform. Drive cross‑team collaboration to standardize infrastructure, monitoring, and CI/CD pipelines, while ensuring reliability, performance, and cost efficiency across distributed systems.
Expectations:
- Deliver high‑quality, production‑ready changes with minimal downtime.
- Maintain an uptime SLA (e.g., 99.9% availability).
- Continuously improve system observability, scaling, and automation.
- Mentor junior engineers and influence best practices in SRE discipline.
Key Responsibilities:
- Design, configure, and tune Kafka clusters and related components (schemas, topics, consumer groups).
- Build and enhance monitoring, alerting, and tracing stacks (e.g., Prometheus, Grafana, Jaeger).
- Optimize logging pipelines and lifecycle management.
- Upgrade and refine CI/CD pipelines; integrate new tools and automate deployments.
- Develop and maintain shared libraries or SDKs for product teams.
- Prototype and evaluate emerging technologies/architectures for scalable messaging.
- Collaborate with system ops and product engineering to align infrastructure with application needs.
- Perform root‑cause analysis and capacity planning for distributed services.
Required Skills:
- Strong software engineering background (Java preferred).
- Deep knowledge of Kafka and microservices architecture.
- Hands‑on experience with distributed system monitoring and observability.
- Familiarity with relational and NoSQL databases (PostgreSQL, Redis, MongoDB, CockroachDB).
- Proficiency with version control (Git) and CI/CD tooling (Jenkins, CircleCI, GitLab, GitHub).
- Experience in cloud environments (AWS, Azure, GCP).
- Knowledge of DevOps practices, automation, and scripting (Bash, Python, or similar).
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or related field.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, a Confluent certification for Apache Kafka) are a plus.