- Company Name
- Pyramid Consulting, Inc
- Job Title
- Data SDET Engineer
- Job Description
**Job Title:** Data SDET Engineer
**Role Summary:**
Design, develop, and maintain automated data validation frameworks for large‑scale ETL/ELT pipelines. Ensure data quality, integrity, and performance across ingestion, transformation, and consumption layers while collaborating with data engineering teams and integrating tests into CI/CD pipelines.
**Expectations:**
- 5+ years of experience in data testing, automation, or backend/data‑heavy systems.
- Strong Python programming and advanced SQL proficiency.
- Ability to own end‑to‑end data quality and work effectively with cross‑functional teams.
- Proactive learning mindset toward deeper data engineering concepts.
- Excellent communication skills for client‑facing interactions.
**Key Responsibilities:**
- Design and implement automated validation frameworks for ETL/ELT pipelines.
- Perform source‑to‑target reconciliation, regression, and smoke testing on large datasets.
- Develop Python test automation using PyTest (or similar) and integrate into CI/CD tools (GitHub Actions, Jenkins, GitLab CI).
- Validate pipelines built with Apache Airflow, Spark/PySpark, SQL transformations, and cloud‑native ETL solutions (AWS, Azure, GCP).
- Log, analyze, and debug data issues in distributed systems; ensure performance and scalability.
- Collaborate with Data Engineers on pipeline design, optimization, data modeling, and schema design.
- Contribute reusable utilities for data profiling, validation, and monitoring.
- Support API and data service testing (REST) and gain exposure to streaming platforms (Kafka, Kinesis).
**Required Skills:**
- Python (OOP, data handling, automation frameworks)
- Advanced SQL (complex joins, window functions, tuning)
- ETL/data pipeline testing experience with large datasets (millions of records or more)
- Apache Spark / PySpark or Hadoop ecosystem
- Cloud data warehouses: Snowflake, Redshift, BigQuery, Synapse (any)
- Data formats: Parquet, Avro, JSON, CSV
- Test frameworks: PyTest, unittest (or equivalent)
- CI/CD tools: GitHub Actions, Jenkins, GitLab CI (any)
- Git version control
- Basic Linux/Unix shell scripting
- Cloud platforms: AWS, Azure, or GCP (any) – Databricks a plus
- Experience with data modeling (Kimball/dimensional) and data governance/quality tools
- Familiarity with real‑time/streaming data and pipeline monitoring
**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- SDET or Quality Engineering certifications (preferred but not mandatory).
Pittsburgh, United States
Hybrid
Mid level
27-03-2026