Job Specifications
Type of Requisition:
Regular
Clearance Level Must Currently Possess:
None
Clearance Level Must Be Able To Obtain:
None
Public Trust/Other Required:
None
Job Family:
Data Science and Data Engineering
Job Qualifications:
Skills:
Databricks DBRX, Databricks Platform, Data Lake, Python (Programming Language)
Certifications:
None
Experience:
5+ years of related experience
US Citizenship Required:
No
Job Description:
AI DATA ENGINEER SENIOR
Own your opportunity to turn data into measurable outcomes for our customers' most complex challenges. As an AI Data Engineer Senior at GDIT, you'll power innovation to drive mission impact and grow your expertise to power your career forward.
MEANINGFUL WORK AND PERSONAL IMPACT
As an AI Data Engineer Senior, the work you do at GDIT will directly support the mission of GSA. We are seeking a highly skilled and motivated Senior AI Data Engineer with a proven track record of building scalable data platforms and pipelines, and demonstrated experience incorporating Generative AI into data engineering workflows. The ideal candidate will have deep expertise in Databricks data engineering capabilities, including Delta Lake, data pipelines, and Unity Catalog, combined with innovative use of GenAI for enhancing data quality, metadata generation, and workflow automation. You will work collaboratively with data scientists, AI engineers, and analytics teams to design and implement robust data infrastructure that powers AI/ML initiatives. Additionally, you will play a key role in establishing data engineering best practices and mentoring team members in modern data platform technologies.
What You'll Need To Succeed
Bring your expertise and drive for innovation to GDIT. The AI Data Engineer Senior must have:
Education: Bachelor of Science
Experience: 5+ years of related experience
Technical skills: Databricks Data Engineering, Delta Lake, GenAI-Enhanced Workflows, Python, PySpark, AWS
Responsibilities:
Design, build, and maintain scalable data pipelines and ETL/ELT workflows using Databricks and PySpark for AI/ML and analytics workloads
Leverage Databricks core data capabilities including Delta Lake, Delta Live Tables, and Databricks Workflows to create reliable, high-performance data platforms
Implement GenAI-enhanced data workflows for automated metadata generation, data cataloging, data quality validation, and intelligent data profiling
Utilize LLMs to generate documentation, create data dictionaries, and automate schema inference and data lineage tracking
Design and implement medallion architecture (Bronze, Silver, Gold layers) following data lakehouse best practices
Collaborate with data architects to establish data modeling standards, governance policies, and data quality frameworks
Integrate AWS data services (S3, Glue, Kinesis, MSK, Redshift) with Databricks to build end-to-end data solutions
Integrate with Unity Catalog or other enterprise data catalogs and access management tools for data governance, access control, and data asset management across the platform
Optimize data pipeline performance through partitioning strategies, caching, and query optimization techniques
Establish DataOps and MLOps practices including version control, CI/CD for data pipelines, and automated testing
Create reusable data transformation frameworks and libraries to accelerate data pipeline development
Collaborate with AI/ML teams to prepare, curate, and serve high-quality datasets for model training and inference
Implement real-time and batch data processing architectures to support diverse analytics and AI use cases
Stay current with emerging data engineering technologies, GenAI capabilities, and Databricks platform enhancements
Document data architectures, pipeline designs, and operational procedures for knowledge sharing and compliance
Required Skills
5+ years of proven experience as a Data Engineer with a focus on building large-scale data platforms and pipelines
3+ years of hands-on experience with Databricks platform, specifically data engineering features (Delta Lake, DLT, Workflows, Unity Catalog)
2+ years of experience incorporating Generative AI into data engineering workflows (metadata generation, data quality, documentation)
5+ years of strong proficiency in Python and PySpark for distributed data processing
3+ years of experience with AWS data services (S3, Glue, Lambda, Kinesis, Redshift, Athena)
Deep understanding of data lakehouse architecture, Delta Lake ACID transactions, and time travel capabilities
Proven experience with SQL optimization, data modeling, and dimensional modeling techniques
Strong knowledge of data orchestration tools and workflow management (Airflow, Databricks Workflows)
Experience implementing data quality frameworks and validation rules at scale
Understanding of data governance, data lineage, and metadata management principles
Excellent problem-solving skills with ability to debug complex data pipeline issues
About the Company
GDIT is a global technology and professional services company that delivers solutions, technology and mission services to every major agency across the U.S. government, defense and intelligence community.
Our 30,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate across 50+ countries worldwide, offering leading capabilities in digital modernization, AI/ML, Cloud, Cyber and application development.
GDIT is part of General Dynamics, a global aerospace and defense company.