- Company Name
- CRG
- Job Title
- Lead Support Engineer
- Job Description
-
Job title: Lead Support Engineer
Role Summary:
Lead the stability and performance of a large enterprise data platform by managing high‑severity incidents, optimizing data workflows, and collaborating with engineering teams to deliver reliable data services.
Expectations:
Deliver timely incident resolution, maintain SLA compliance, drive continuous improvement of data pipelines, and mentor junior support staff while ensuring cross‑functional alignment on backlog priorities.
Key Responsibilities:
- Resolve L3+ incidents and translate recurring issues into backlog items.
- Link incidents, bugs, and user stories in backlog management tools.
- Validate bug fixes and reduce deployment risk by testing in lower environments.
- Prioritize business‑critical tasks and communicate backlog health.
- Investigate Snowflake table/view refresh failures, analyze logs, and optimize queries.
- Re‑run Airflow DAGs and dbt models, and use manual triggers where needed, to recover failed pipelines.
- Confirm ingestion success via Fivetran connectors and validate loads with SQL.
- Review and manually trigger failed ETL/ELT jobs; resolve scheduling or sequencing issues.
- Maintain and enhance runbooks, SOPs, and knowledge base documentation.
- Participate in release readiness, deployments, and post‑release validation.
- Mentor and coach junior engineers on troubleshooting and incident‑management best practices.
- Advocate for customer impact in sprint planning and workload prioritization.
Required Skills:
- 5+ years in application support, production support, or data platform operations.
- Deep experience troubleshooting data‑platform workflows.
- Strong RCA and analytical skills.
- Proficient with Snowflake (querying, optimization, task scheduling).
- Advanced SQL writing, debugging, and optimization.
- Airflow (DAG management, retries, scheduling).
- dbt (model execution, debugging, project structure).
- Fivetran (connector monitoring, log review, manual refresh).
- Python (reading and modifying ETL/Lambda scripts, event‑driven flows).
- AWS Lambda (log review, execution debugging).
- ETL/ELT lifecycle understanding.
- Monitoring tools: Splunk, Dynatrace, Zabbix, AlertBot.
- Azure DevOps (backlog and release management).
- ITIL knowledge (incident, problem, change).
- Excellent communication and collaboration with cross‑functional teams.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Information Technology, or related field, or equivalent experience.
- Recommended certifications: ITIL Intermediate/Expert, Splunk Power User, Dynatrace Associate, Certified Problem Manager.