Storm3

www.storm3.com

1 Job

68 Employees

About the Company

Storm3 are specialists in US HealthTech recruitment, connecting organisations with the talent to drive their mission. Launched in 2020 to service the HealthTech industry, Storm3 connect senior talent with businesses at the forefront of healthcare technology innovation. Storm3 focus on placing talent into start-ups and scale-ups across the United States. The pandemic has seen the HealthTech industry skyrocket with the uptake of new digital technologies, big data analytics and sophisticated AI. From genomics, telemedicine, FemTech and surgical robotics through to smart devices and apps focused on our physical and mental health, investment into HealthTech continues to soar globally. Storm3’s mission is to be integral to the digital revolution and provide highly specialised teams across Data & Analytics, Engineering, Product Management and Sales & Marketing. We are a leading provider of HealthTech-focused information to clients on market compensation and best practice in diversity, equity & inclusion, hiring and retention. Storm3 is a trading division of Levin.

Listed Jobs

Company Name
Storm3
Job Title
Research Scientist - Data
Job Description
Job title: Research Scientist – Data

Role Summary: Lead data‑centric research on foundation models, designing large‑scale training corpora, developing automated data pipelines, and creating evaluation frameworks to enhance LLM robustness, scalability, and reasoning. Collaborate with researchers, data scientists, and engineers to publish findings in top AI venues and contribute to open‑source tooling.

Expectations: Deliver independent research on data quality, scaling, and reasoning; publish regularly in leading conferences; contribute to open‑source datasets and benchmarks; maintain high standards of data curation and pipeline reproducibility; engage with the external research community and conferences.

Key Responsibilities
- Design and lead research on data‑centric approaches for LLMs (pretraining corpus, data valuation, speculative decoding).
- Build and optimize agentic data pipelines (retrieval, self‑curation, multi‑agent feedback).
- Develop scalable data preprocessing and curation pipelines for heterogeneous sources.
- Prototype and deploy evaluation frameworks assessing data quality, coverage, and downstream LLM reasoning impact.
- Collaborate with alignment and reasoning researchers to integrate data‑driven methods.
- Publish studies at NeurIPS, ICLR, ACL, EMNLP, etc.; represent the institute at conferences.
- Contribute tools, datasets, and benchmarks to the open‑source foundation model community.

Required Skills
- Master’s in CS, Data Science, or related field (PhD preferred).
- Proven experience with large‑scale text data collection, multi‑lingual curation, and preprocessing for ML/LLM training.
- Hands‑on expertise in scalable ML infrastructure for training, evaluation, and debugging.
- Strong background in data engineering, tokenization, and training tokenizers.
- Experience with RL/SFT, post‑training, retrieval‑augmented generation, or agentic data pipelines.
- Ability to lead independent research projects and produce high‑impact publications.
- Familiarity with knowledge graphs, semantic search, indexing, and speculative decoding concepts.

Required Education & Certifications
- Master's (B.Sc. acceptable with extensive experience) in Computer Science, Data Science, Machine Learning, or a related technical discipline; Ph.D. strongly preferred.
- No mandatory certifications, but demonstrable contributions to open‑source ML data tools or benchmarks are highly valued.
San Francisco Bay, United States
Hybrid
24-11-2025