**Company Name:**
Netflix
**Job Title:**
Distributed Systems Engineer (L5) – Data Platform
**Role Summary:**
Design, build, and maintain scalable, high‑availability distributed data infrastructure for internal platforms. Work across real‑time data movement, batch processing, key‑value stores, caching layers, and data governance tools to enable secure, efficient data access for business teams.
**Expectations:**
- Deliver robust, performance‑optimized services that run at Netflix scale.
- Collaborate with cross‑functional teams to define requirements, review designs, and iterate.
- Drive automation and observability for operations, reliability, and security.
- Contribute to open‑source projects (e.g., Apache Cassandra) and uphold internal best practices.
**Key Responsibilities:**
1. **Real‑time Data Platforms** – develop and enhance Kafka, Flink, and Mantis control planes; manage compute resources and schema registry.
2. **Abstraction Layer Services** – build self‑service APIs that simplify use of Kafka, Flink, Spark, and Cassandra for diverse user personas.
3. **Data Discovery & Governance** – maintain catalog, distributed search, lineage, and policy engine frameworks for secure, compliant data usage.
4. **Online Data Stores** – design, implement, and operate high‑performance key‑value stores and caching solutions, ensuring reliability and cost efficiency.
5. **Operations & Automation** – create CI/CD pipelines, monitoring dashboards, alerting, and self‑healing mechanisms.
6. **Performance & Reliability** – conduct capacity planning, load testing, tuning, and incident analysis to meet SLA targets.
7. **Knowledge Sharing** – author technical documentation, mentor peers, and contribute to internal workshops.
**Required Skills:**
- Strong expertise in distributed systems, data streaming, and storage technologies.
- Hands‑on experience with Kafka, Flink, Spark, Cassandra, or comparable platforms.
- Proficiency in Java/Scala (or Go); familiarity with JVM internals.
- Experience with container orchestration (Kubernetes), CI/CD pipelines, and cloud infrastructure (AWS or GCP).
- Solid fundamentals in networking, concurrency, fault‑tolerance, and performance engineering.
- Ability to design REST, GraphQL, gRPC, or similar interfaces for internal services.
- Excellent problem‑solving, communication, and teamwork skills.
**Required Education & Certifications:**
- Bachelor’s degree (or higher) in Computer Science, Engineering, or a related field, **or** equivalent professional experience.
- Optional certifications: AWS Certified Solutions Architect, Google Professional Cloud Architect, or equivalent distributed systems credentials.