**Company Name:** MasterControl
**Job Title:** Data Engineering Intern
**Job Description:**
**Role Summary:**
Support the development of a next‑generation data platform that utilizes AI/ML to optimize quality and compliance workflows in regulated industries. Build, test, and optimize large‑scale batch and streaming data pipelines, design scalable data models, and contribute to end‑to‑end software delivery.
**Expectations:**
- Design, develop, and deploy distributed data pipelines using Hadoop, Spark, Flink, Kafka, or equivalent.
- Implement real‑time stream‑processing applications with Apache Flink, Spark Streaming, Kafka Streams, or similar.
- Model data in star, snowflake, normalized, and denormalized structures and optimize for random and sequential access.
- Work with NoSQL databases (Neo4j, MongoDB, Cassandra, HBase, DynamoDB, Bigtable, etc.).
- Participate fully in the Software Development Life Cycle: design, code, test, and release.
- Work in a self‑directed manner, learn new technologies quickly, communicate effectively, and deliver pragmatic solutions.
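As a small illustration of the stream‑processing expectation above, the tumbling‑window aggregation pattern that Flink, Spark Streaming, or Kafka Streams would run over an unbounded stream can be sketched framework‑free (the event data and window size here are hypothetical):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size_s):
    """Group (timestamp_s, key) events into fixed-size tumbling windows
    and count occurrences of each key per window -- the same aggregation
    shape a windowed count in Flink or Kafka Streams produces."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event belongs to exactly one window, aligned to window_size_s.
        window_start = (ts // window_size_s) * window_size_s
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

# Hypothetical sensor events: (epoch seconds, sensor id).
events = [(0, "a"), (3, "b"), (4, "a"), (12, "a"), (14, "b")]
print(tumbling_window_counts(events, 10))
# → {0: {'a': 2, 'b': 1}, 10: {'a': 1, 'b': 1}}
```

A real pipeline would additionally handle out‑of‑order events and watermarks, which the frameworks named above provide out of the box.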
**Key Responsibilities:**
- Build and maintain data pipelines for batch and streaming workloads on distributed platforms.
- Develop and optimize stream‑processing applications to meet performance and latency targets.
- Design and implement scalable data models and related storage solutions.
- Manage data ingestion, transformation, and integration across the platform.
- Collaborate with cross‑functional teams (data science, product, QA) to define requirements and deliver solutions.
- Participate in code reviews, unit testing, and continuous integration workflows.
- Document data architecture, pipeline configurations, and operational procedures.
**Required Skills:**
- Proficiency in Java, Scala, or Python for distributed data engineering.
- Hands‑on experience with Hadoop, Spark, Flink, Kafka, or equivalent.
- Experience building stream‑processing applications using Flink, Spark Streaming, Kafka Streams, or similar.
- Data modeling expertise: star, snowflake, normalized/denormalized schemas, bucketing, sharding, aggregation.
- Familiarity with NoSQL databases (Neo4j, MongoDB, Cassandra, HBase, DynamoDB, Bigtable).
- Full SDLC experience: design, development, testing, release, and maintenance.
- Strong analytical, problem‑solving, and communication skills.
- Ability to work independently in ambiguous environments and prioritize pragmatic solutions.
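To make the star‑schema modeling skill listed above concrete, here is a minimal sketch of a fact table joined to its dimension tables to produce the denormalized, read‑optimized rows analysts query (all table and column names are hypothetical):

```python
# Hypothetical star schema: one fact table keyed to two dimension tables.
dim_product = {1: {"name": "Widget", "category": "Hardware"},
               2: {"name": "Gadget", "category": "Electronics"}}
dim_date = {20240101: {"year": 2024, "month": 1}}

fact_sales = [
    {"date_key": 20240101, "product_key": 1, "units": 3, "revenue": 30.0},
    {"date_key": 20240101, "product_key": 2, "units": 1, "revenue": 25.0},
]

def denormalize(facts, products, dates):
    """Join each fact row with its dimension attributes, producing the
    flattened rows a denormalized (wide) table would store."""
    return [{**f, **products[f["product_key"]], **dates[f["date_key"]]}
            for f in facts]

rows = denormalize(fact_sales, dim_product, dim_date)
print(rows[0]["category"], rows[0]["year"])
# → Hardware 2024
```

The normalized form (separate fact and dimension tables) saves storage and avoids update anomalies; the denormalized output trades storage for faster sequential scans, which is the access‑pattern tradeoff the modeling bullet refers to.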
**Required Education & Certifications:**
- Minimum 2 years of data engineering experience (internship acceptable).
- Enrollment at Northeastern University (co‑op program eligible) or a similar higher‑education institution.
- Bachelor's degree or coursework in Computer Science, Data Engineering, or a related field (preferred).
Salt Lake City, United States
On‑site
Fresher
17-11-2025