- Company Name
- Notion
- Job Title
- Software Engineer, Enterprise Data Platform
- Job Description
Job Title: Software Engineer, Enterprise Data Platform
Role Summary:
Design, build, and operate a secure, high‑performance lakehouse that serves Notion’s AI, analytics, and search services for enterprise customers. Drive end‑to‑end data pipelines, enforce encryption‑by‑design, enable fine‑grained access, and improve observability and cost efficiency across a multi‑region, multi‑cell environment.
Expectations:
- Deliver production‑ready lakehouse components (tables, catalogs, schema management).
- Own end‑to‑end batch and streaming pipelines, ensuring reliability and compliance.
- Integrate enterprise key management and perform file‑ and record‑level encryption.
- Build auditing and data-residency primitives that surface who accessed data and where it resides.
- Enhance operational excellence through improved alerting, debugging, and on‑call workflows.
- Optimize performance and cost on Spark, Kafka, and storage for large workspaces.
- Enable ML and search workflows by provisioning the underlying data infrastructure.
- Contribute to platform roadmap, design documentation, and vendor evaluations.
Key Responsibilities:
- Design and evolve the data lakehouse using Iceberg/Hudi/Delta and cataloging systems.
- Develop, test, and maintain batch and streaming pipelines with Spark, Kafka, EMR, etc.
- Implement Enterprise Key Management (EKM) workflows, ensuring secure key handling.
- Provide fine‑grained access control, auditing, and data residency features.
- Improve reliability through on‑call support, incident response, alerting, and observability.
- Tune cluster performance and reduce costs across Kafka, Spark, and storage.
- Build infrastructure for ML training/inference, ranking, and embedding pipelines.
- Author design docs, conduct evaluations, and influence platform direction and vendor selection.
Required Skills:
- 5+ years building and operating large‑scale data platforms for SaaS or similar products.
- Proficient in Python, Java, or Scala; strong SQL for analytics and modeling.
- Hands‑on experience with Spark (debugging, performance tuning).
- Experience with Kafka or equivalent; knowledge of CDC/ingestion patterns (Debezium, Fivetran).
- Familiarity with lakehouse formats (Iceberg, Hudi, Delta) and data catalogs/schema evolution.
- Understanding of data security: access control, encryption at rest/transit, auditing.
- Experience with AWS, GCP, or Azure and managed services (EMR, Dataproc, Kubernetes).
- Comfort owning production services: on‑call, incident management, reliability improvements.
- Nice to have: enterprise customer experience, EKM/compliance features, multi‑region architecture, ML workflow provisioning, vector database integration, observability tooling (Honeycomb, OpenTelemetry).
Required Education & Certifications:
- Bachelor’s (or higher) degree in Computer Science, Engineering, or related field.
- Relevant certifications (e.g., AWS Certified Data Analytics – Specialty, GCP Professional Data Engineer) are advantageous.
San Francisco, United States
Hybrid
Mid-level
27-11-2025