Skills

Communication, Emotional Intelligence, Python, SQL, Data Governance, Data Engineering, GitHub, CI/CD, Monitoring, Version Control, Decision-making, Analytics, Snowflake, Data Science, Spark, Databricks, ETL Processes, GitHub Actions

Job Specifications

At Microsoft AI, we are inventing an AI Companion for everyone – an AI designed with real personality and emotional intelligence that’s always in your corner. Defined by effortless communication, extraordinary capabilities, and a new level of connection and support, we want Copilot to define the next wave of technology. This is a rare opportunity to be a part of a team crafting something that challenges everything we know about software and consumer products.

Our health team is on a mission to help millions of users better understand and proactively manage their health and wellbeing. We’re responsible for ensuring that Microsoft AI’s models and services are useful, trusted and safe across diverse customer health journeys.

We’re looking for a deeply technical and mission-driven Data Science Lead to build the data foundations powering our health AI companion. You’ll architect, scale, and optimize the pipelines, datasets, and metrics frameworks that help us understand user behavior, evaluate model performance, and measure health impact. This role sits at the intersection of engineering, analytics, and applied AI—translating raw signals into insights that shape product decisions and ensure our systems are safe, effective, and grounded in evidence.

You’ll partner closely with product, model, and clinical teams to define data models, build robust ETL workflows, and enable a high-quality analytics environment that supports experimentation, evaluation, and decision-making at scale.

Key Responsibilities

Design, build, and maintain high-quality data pipelines and models that power analytics, dashboards, and product experimentation across health AI experiences
Develop and optimize scalable ELT/ETL processes to extract data from multiple structured and unstructured sources (including telemetry, model outputs, and healthcare data integrations)
Partner with product and clinical counterparts to define source-of-truth datasets and standardized metrics for user engagement, safety, and health outcome evaluation
Implement monitoring, validation, and alerting systems to ensure data reliability, lineage, and reproducibility across the analytics stack
Collaborate with ML engineers and model evaluation teams to operationalize evaluation pipelines—supporting automated scoring, HealthBench metrics, and experiment tracking
Define and maintain data schemas, transformation logic, and documentation to promote transparency and reusability across teams
Drive continuous improvement in data quality, discoverability, and observability
Contribute to shaping data infrastructure strategy and tooling to support next-generation health AI systems

Required Qualifications

Bachelor’s or Master’s degree in Computer Science, Data Engineering, Data Science, or a related field, or equivalent experience
Experience working on consumer products at scale
Experience building and maintaining production-grade data pipelines, warehouses, and analytics platforms
Strong proficiency with SQL and modern data-stack technologies (e.g., dbt, Airflow, Databricks, BigQuery, Snowflake, Spark, or similar)
Experience designing efficient data models and ETL processes supporting analytical workloads and experimentation
Proven ability to translate ambiguous data needs into scalable engineering solutions
Familiarity with data governance, schema design, and principles of data privacy and compliance (HIPAA, de-identification, PHI handling)
Experience working with Python for data processing, analytics, or pipeline orchestration

Preferred Qualifications

Experience working in healthcare, digital health, or regulated data environments
Exposure to large language model (LLM) or generative AI systems, particularly in analytics or evaluation contexts
Strong understanding of experiment design, metrics definition, and instrumentation in AI-driven products
Familiarity with tools for workflow orchestration, version control, and CI/CD (e.g., Airflow, Dagster, GitHub Actions)
Comfort collaborating cross-functionally with product, analytics, and clinical teams in a fast-paced environment
Curiosity about how AI systems can responsibly improve access to care and health outcomes

About the Company

Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day o...