Job Specifications
Type of Requisition:
Regular
Clearance Level Must Currently Possess:
None
Clearance Level Must Be Able To Obtain:
None
Public Trust/Other Required:
None
Job Family:
Data Science and Data Engineering
Job Qualifications:
Skills:
Databricks DBRX, Databricks Platform, Data Lake, Python (Programming Language)
Certifications:
None
Experience:
5+ years of related experience
US Citizenship Required:
No
Job Description:
AI DATA ENGINEER SENIOR
Own your opportunity to turn data into measurable outcomes for our customers' most complex challenges. As an AI Data Engineer Senior at GDIT, you'll power innovation to drive mission impact and grow your expertise to power your career forward.
MEANINGFUL WORK AND PERSONAL IMPACT
As an AI Data Engineer Senior, the work you do at GDIT will directly support the mission of GSA. We are seeking a highly skilled and motivated Senior AI Data Engineer with a proven track record of building scalable data platforms and pipelines, and demonstrated experience incorporating Generative AI into data engineering workflows. The ideal candidate will have deep expertise in Databricks data engineering capabilities, including Delta Lake, data pipelines, and Unity Catalog, combined with innovative use of GenAI for enhancing data quality, metadata generation, and workflow automation. You will work collaboratively with data scientists, AI engineers, and analytics teams to design and implement robust data infrastructure that powers AI/ML initiatives. Additionally, you will play a key role in establishing data engineering best practices and mentoring team members in modern data platform technologies.
What You'll Need To Succeed
Bring your expertise and drive for innovation to GDIT. The AI Data Engineer Senior must have:
Education: Bachelor of Science
Experience: 5+ years of related experience
Technical skills: Databricks Data Engineering, Delta Lake, GenAI-Enhanced Workflows, Python, PySpark, AWS
Responsibilities:
Design, build, and maintain scalable data pipelines and ETL/ELT workflows using Databricks and PySpark for AI/ML and analytics workloads
Leverage Databricks core data capabilities including Delta Lake, Delta Live Tables, and Databricks Workflows to create reliable, high-performance data platforms
Implement GenAI-enhanced data workflows for automated metadata generation, data cataloging, data quality validation, and intelligent data profiling
Utilize LLMs to generate documentation, create data dictionaries, and automate schema inference and data lineage tracking
Design and implement medallion architecture (Bronze, Silver, Gold layers) following data lakehouse best practices
Collaborate with data architects to establish data modeling standards, governance policies, and data quality frameworks
Integrate AWS data services (S3, Glue, Kinesis, MSK, Redshift) with Databricks to build end-to-end data solutions
Integrate with Unity Catalog or other enterprise data catalogs and access management tools for data governance, access control, and data asset management across the platform
Optimize data pipeline performance through partitioning strategies, caching, and query optimization techniques
Establish DataOps and MLOps practices including version control, CI/CD for data pipelines, and automated testing
Create reusable data transformation frameworks and libraries to accelerate data pipeline development
Collaborate with AI/ML teams to prepare, curate, and serve high-quality datasets for model training and inference
Implement real-time and batch data processing architectures to support diverse analytics and AI use cases
Stay current with emerging data engineering technologies, GenAI capabilities, and Databricks platform enhancements
Document data architectures, pipeline designs, and operational procedures for knowledge sharing and compliance
Required Skills
5+ years of proven experience as a Data Engineer with a focus on building large-scale data platforms and pipelines
3+ years of hands-on experience with Databricks platform, specifically data engineering features (Delta Lake, DLT, Workflows, Unity Catalog)
2+ years of experience incorporating Generative AI into data engineering workflows (metadata generation, data quality, documentation)
5+ years of strong proficiency in Python and PySpark for distributed data processing
3+ years of experience with AWS data services (S3, Glue, Lambda, Kinesis, Redshift, Athena)
Deep understanding of data lakehouse architecture, Delta Lake ACID transactions, and time travel capabilities
Proven experience with SQL optimization, data modeling, and dimensional modeling techniques
Strong knowledge of data orchestration tools and workflow management (Airflow, Databricks Workflows)
Experience implementing data quality frameworks and validation rules at scale
Understanding of data governance, data lineage, and metadata management principles
Excellent problem-solving skills with ability to debug complex data pipeline issues
About the Company
GDIT is a global technology and professional services company that delivers solutions, technology and mission services to every major agency across the U.S. government, defense and intelligence community.
Our 30,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate across 50+ countries worldwide, offering leading capabilities in digital modernization, AI/ML, Cloud, Cyber and application development.
GDIT is part of General Dynamics, a global aerospace and defense company.