Job Specifications
At Microsoft AI, we are inventing an AI Companion for everyone – an AI designed with real personality and emotional intelligence that’s always in your corner. We want Copilot to define the next wave of technology through effortless communication, extraordinary capabilities, and a new level of connection and support. This is a rare opportunity to be a part of a team crafting something that challenges everything we know about software and consumer products.
Our health team is on a mission to help millions of users better understand and proactively manage their health and wellbeing. We’re responsible for ensuring that Microsoft AI’s models and services are useful, trusted and safe across diverse customer health journeys.
We’re looking for a deeply technical and mission-driven Data Science Lead to build the data foundations powering our health AI companion. You’ll architect, scale, and optimize the pipelines, datasets, and metrics frameworks that help us understand user behavior, evaluate model performance, and measure health impact. This role sits at the intersection of engineering, analytics, and applied AI—translating raw signals into insights that shape product decisions and ensure our systems are safe, effective, and grounded in evidence.
You’ll partner closely with product, model, and clinical teams to define data models, build robust ETL workflows, and enable a high-quality analytics environment that supports experimentation, evaluation, and decision-making at scale.
Key Responsibilities
Design, build, and maintain high-quality data pipelines and models that power analytics, dashboards, and product experimentation across health AI experiences
Develop and optimize scalable ELT/ETL processes to extract data from multiple structured and unstructured sources (including telemetry, model outputs, and healthcare data integrations)
Partner with product and clinical counterparts to define source-of-truth datasets and standardized metrics for user engagement, safety, and health outcome evaluation
Implement monitoring, validation, and alerting systems to ensure data reliability, lineage, and reproducibility across the analytics stack
Collaborate with ML engineers and model evaluation teams to operationalize evaluation pipelines—supporting automated scoring, HealthBench metrics, and experiment tracking
Define and maintain data schemas, transformation logic, and documentation to promote transparency and reusability across teams
Drive continuous improvement in data quality, discoverability, and observability
Contribute to shaping data infrastructure strategy and tooling to support next-generation health AI systems
Required Qualifications
Bachelor’s or Master’s degree in Computer Science, Data Engineering, Data Science, or a related field, or equivalent experience
Experience with large-scale consumer products
Experience building and maintaining production-grade data pipelines, warehouses, and analytics platforms
Strong proficiency with SQL and modern data-stack technologies (e.g., dbt, Airflow, Databricks, BigQuery, Snowflake, Spark, or similar)
Experience designing efficient data models and ETL processes supporting analytical workloads and experimentation
Proven ability to translate ambiguous data needs into scalable engineering solutions
Familiarity with data governance, schema design, and principles of data privacy and compliance (HIPAA, de-identification, PHI handling)
Experience working with Python for data processing, analytics, or pipeline orchestration
Preferred Qualifications
Experience working in healthcare, digital health, or regulated data environments
Exposure to large language model (LLM) or generative AI systems, particularly in analytics or evaluation contexts
Strong understanding of experiment design, metrics definition, and instrumentation in AI-driven products
Familiarity with tools for workflow orchestration, version control, and CI/CD (e.g., Airflow, Dagster, GitHub Actions)
Comfort collaborating cross-functionally with product, analytics, and clinical teams in a fast-paced environment
Curiosity about how AI systems can responsibly improve access to care and health outcomes