Company Name: Devfi
Job Title: AI/ML Data Pipeline Engineer
Job Description:
Role Summary: Design, develop, and maintain scalable, compliant data pipelines that feed AI/ML workloads, ensuring high‑quality ingestion, transformation, and feature engineering across heterogeneous data sources.
Expectations:
- Deliver end‑to‑end pipeline solutions within regulated environments, meeting data security, lineage, and compliance standards (e.g., FedRAMP, FISMA).
- Collaborate with data scientists, product owners, and DevOps to implement CI/CD best practices for data workflows.
- Lead troubleshooting and optimization of pipeline performance, scalability, and reliability.
Key Responsibilities:
- Architect and implement data ingestion, parsing, cleansing, and feature engineering for structured, unstructured, and graph data.
- Build and maintain Spark/Databricks jobs orchestrated with workflow engines such as Airflow or Dagster.
- Deploy pipelines to cloud platforms (AWS, Azure, GCP) with appropriate data storage, cataloging, and metadata management.
- Integrate regulatory datasets (e.g., OASIS, PREDICT, FDA imports, customs, supply chain) and build entity resolution, classification, and risk scoring logic.
- Enforce data security, lineage, and compliance policies, including audit logging and access controls.
- Monitor pipeline health, automate alerts, and implement continuous integration/continuous deployment pipelines.
- Document architecture, data flows, and operational procedures.
Required Skills:
- 5+ years of experience in large‑scale data pipeline engineering for AI/ML.
- Advanced proficiency in Python, SQL, and Spark/Databricks.
- Experience with distributed processing, feature engineering, and data modeling.
- Cloud data platform expertise (AWS, Azure, or GCP) and CI/CD for data pipelines.
- Knowledge of regulated data environments (healthcare, government), data security, compliance, and privacy regulations.
- Familiarity with FDA, customs, and supply chain data pipelines, as well as entity resolution and risk scoring.
Education & Certifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- Certifications in AWS/Azure/GCP (e.g., AWS Certified Data Analytics, Google Professional Data Engineer) beneficial but not mandatory.
- Relevant data security/compliance certifications (e.g., CISSP) and experience with FedRAMP or FISMA compliance preferred.