Job Specifications
Rate: $80-$90/hr
Role Overview
The Data Acquisition & MDM Lead is responsible for driving the strategy, architecture, and delivery of enterprise‑grade data ingestion, master data management, and governance capabilities across cloud and on‑premises ecosystems. This role combines deep expertise in Azure Databricks, Apache Kafka, Collibra, Cloudera, and granular computing techniques to build scalable, high‑quality, and governed data assets that support analytics, operational reporting, risk, and business transformation initiatives.
Key Responsibilities
1. Data Acquisition & Integration
Lead the design and implementation of enterprise data ingestion frameworks using Azure Data Factory, Databricks, Kafka, and Cloudera ingestion pipelines.
Architect and optimize batch, micro‑batch, and real‑time streaming data flows aligned with business, compliance, and performance needs.
Define patterns for data onboarding, schema management, partitioning, and data quality enforcement.
2. Master Data Management (MDM)
Drive the roadmap, architecture, and delivery of enterprise MDM capabilities across customer, product, and reference data domains.
Oversee MDM data modeling, survivorship rules, golden record creation, matching/merging logic, and data mastering workflows.
Partner with business stewards to define data standards, consumption models, and lineage requirements.
3. Azure Databricks & Big Data Processing
Architect and manage distributed compute workloads using Azure Databricks (Spark, Python, SQL) for large-scale data processing, cleansing, and enrichment.
Optimize workflows through Delta Lake, Z‑ordering, granular computing techniques, and performance tuning of Spark jobs.
Maintain reusable data engineering frameworks and libraries.
4. Data Governance & Collibra Oversight
Implement and enforce Collibra-based governance, including metadata management, glossary creation, stewardship workflows, and lineage mapping.
Establish and socialize governance standards for data classification, data quality, and lifecycle management.
Partner with risk and compliance teams to ensure regulatory adherence (e.g., AML, fraud, privacy, retention).
5. Cloudera Platform Management
Lead modernization and migration initiatives from Cloudera/Hadoop ecosystems to cloud-native platforms.
Maintain and optimize workloads across HDFS, Hive, Impala, Spark, and Oozie.
Ensure secure, performant, and cost‑efficient platform operations.
6. Architecture, Strategy & Leadership
Define end‑to‑end architecture for data acquisition, mastering, and governance within the enterprise data platform.
Provide guidance to engineers, analysts, and business partners on best practices, patterns, and solution options.
Lead design reviews, roadmap planning, POCs, platform upgrades, and innovation initiatives leveraging granular computing and distributed data architectures.
7. Stakeholder Engagement
Work closely with product owners, business SMEs, data stewards, enterprise architects, cloud engineering, and security teams.
Facilitate requirements workshops, solution walkthroughs, and technical planning sessions.
Serve as a trusted SME for data acquisition, MDM, governance, and platform capabilities.
Qualifications
Must-Have
8+ years of experience in enterprise data engineering, MDM, or data platform roles.
Strong expertise in:
Azure Databricks (Spark, Python/Scala/SQL)
Apache Kafka (streams, brokers, schema registry)
Collibra governance
Cloudera ecosystem (HDFS, Hive, Impala, Spark)
Proven experience architecting large‑scale data pipelines, ingestion frameworks, and MDM workflows.
Understanding of granular computing concepts (multi-level data processing, coarse/fine-grain architecture, and optimization).
Solid understanding of data modeling, schema design, metadata, lineage, and data quality frameworks.
Interested? Please share your updated CV.
*AI may be used to screen, assess or select applicants for the position*
*This posting is for an existing vacancy with the organization.*