- Company Name: Evolv Technology
- Job Title: Senior Data Infrastructure Engineer
- Job Description:
Role Summary: Architect, build, and maintain end‑to‑end data pipelines for AI/ML research and production, spanning edge devices, cloud ingestion, and centralized data platforms, while ensuring scalability, reliability, security, and data governance.
Expectations:
- First 30 days: Gain deep understanding of existing edge‑to‑cloud pipelines, assess reliability & scalability, build relationships with AI/ML and field teams, prototype data processing pipelines.
- First 3 months: Design and implement improved ingestion, validation, and processing pipelines on AWS (S3, EC2, Lambda, Glue, Step Functions, SageMaker); introduce automated data quality checks and model evaluation workflows (a sketch of one such check follows this list); partner with field ops to enhance data coverage.
- First year: Own mission‑critical data lifecycles, architect scalable edge‑to‑cloud systems for millions of devices, define and enforce data governance (retention, access control, lineage), enable rapid ML experimentation with high‑quality, versioned datasets.
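For illustration, here is a minimal sketch of the kind of automated data quality check named above, written as an AWS Lambda handler that validates newline-delimited JSON records landing in S3. The event shape, bucket layout, and required fields are hypothetical assumptions for the sketch, not part of the actual stack.

```python
import json

import boto3

s3 = boto3.client("s3")

# Hypothetical record schema; a real pipeline would load this from config.
REQUIRED_FIELDS = {"device_id", "timestamp", "payload"}


def validate_record(record: dict) -> bool:
    """Reject records missing required fields or carrying empty payloads."""
    return REQUIRED_FIELDS <= record.keys() and bool(record["payload"])


def handler(event, context):
    """Lambda entry point: validate one newly landed S3 object."""
    bucket = event["bucket"]  # hypothetical event shape
    key = event["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    records = [json.loads(line) for line in body.splitlines() if line.strip()]
    invalid = [r for r in records if not validate_record(r)]
    # Returning a pass/fail summary lets a Step Functions state machine
    # route the object onward to processing or into quarantine.
    return {"total": len(records), "invalid": len(invalid), "passed": not invalid}
```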
Key Responsibilities:
- Design, build, and maintain research and production data pipelines across edge devices and cloud services.
- Own full data lifecycle: collection, ingestion, processing, obfuscation, versioning, access, retention, retirement.
- Create resilient ingestion paths that tolerate variable connectivity and device heterogeneity (an idempotent-upload sketch follows this list).
- Implement privacy‑preserving transformations, data cleaning, deduplication, and automated validation.
- Establish data lineage, retention policies, and access controls for compliance.
- Provide scalable data services for model training, evaluation, and continuous refresh.
- Integrate with labeling/annotation workflows and support large‑scale ML workloads.
- Optimize pipelines for cost, performance, and reliability using AWS services (S3, EC2, SageMaker, Lambda, Glue, Step Functions).
- Collaborate with AI/ML engineers, data scientists, and field ops to translate requirements and feedback into automated pipeline improvements.
- Scale the data factory globally across millions of devices and maintain flexibility for research needs.
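To make the resilient-ingestion responsibility above concrete, here is a minimal sketch of an idempotent edge-to-cloud upload: retries after dropped connections are safe because the S3 key is derived from a content hash. The bucket name and key layout are hypothetical placeholders.

```python
import hashlib
from pathlib import Path

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "example-edge-ingest"  # hypothetical bucket name


def content_key(path: Path) -> str:
    """Derive a deterministic S3 key from the file's SHA-256 digest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return f"raw/{digest}/{path.name}"


def upload_once(path: Path) -> str:
    """Upload a file only if its content has not already been ingested."""
    key = content_key(path)
    try:
        s3.head_object(Bucket=BUCKET, Key=key)  # already present: skip
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise  # a real failure, not a missing object: surface it
        s3.upload_file(str(path), BUCKET, key)  # first sight: ingest
    return key
```

Because the key is a pure function of the file's content, a device that loses connectivity mid-transfer can simply call upload_once again without creating duplicates downstream.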
Required Skills:
- Proficiency in Python and C++; experience with distributed data processing frameworks such as Spark or Beam (a short PySpark sketch follows this list).
- Hands‑on with AWS services: S3, EC2, Lambda, Glue, Step Functions, SageMaker.
- Knowledge of data ingestion, validation, cleaning, obfuscation, versioning, and governance practices.
- Ability to design scalable, resilient, and secure data pipelines across edge and cloud.
- Strong problem‑solving, documentation, and cross‑functional collaboration.
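As a sketch of the distributed-processing skills listed above, the following PySpark job combines three duties named in this posting: cleaning, deduplication, and a simple privacy-preserving transformation (hashing the device identifier). All paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clean-dedup-obfuscate").getOrCreate()

# Hypothetical input location and schema.
events = spark.read.json("s3://example-bucket/raw/")

cleaned = (
    events
    .filter(F.col("payload").isNotNull())                       # drop empty records
    .dropDuplicates(["device_id", "timestamp"])                 # dedupe on a natural key
    .withColumn("device_id", F.sha2(F.col("device_id"), 256))   # pseudonymize identifier
    .withColumn("ingest_date", F.to_date("timestamp"))          # derive partition column
)

cleaned.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3://example-bucket/clean/"
)
```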
Required Education, Experience & Certifications:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, or related field.
- At least 2–3 years of experience building production data pipelines that support AI/ML models.
- Relevant certifications (e.g., AWS Certified Data Analytics, AWS Certified Solutions Architect) preferred but not mandatory.