**Company Name:** Flagship Pioneering
**Job Title:** Machine Learning Scientist – Small Molecule Binding
**Role Summary:**
Develop and deploy physics‑informed, geometry‑aware ML models for protein‑ligand docking, pose prediction, and binding affinity estimation. Partner with computational chemists, structural biologists, and assay teams to enable high‑precision virtual screening, generative ligand design, and closed‑loop drug discovery workflows.
**Expectations:**
- Deliver end‑to‑end ML pipelines from data curation to production deployment.
- Create robust, uncertainty‑aware models that operate on sparse or noisy assay data.
- Ensure rigorous, leakage‑free benchmarking that demonstrates generalization across protein targets.
- Collaborate cross‑functionally to translate experimental targets into actionable ML solutions.
**Key Responsibilities:**
1. Design SE(3)‑equivariant graph, point‑cloud, and 3D‑grid models for docking and scoring that handle metalloproteins, covalent chemistries, and allosteric sites.
2. Build pose‑ranking and binding free energy (ΔG) prediction models using physics‑based objectives, ensembles, and energy constraints.
3. Develop diffusion‑, flow‑, and RL‑based generative methods for pocket‑conditioned ligand design and constrained optimization.
4. Incorporate protein flexibility via receptor ensembles, side‑chain sampling, and induced‑fit strategies; integrate with MD, MM/GBSA, and free‑energy calculations as needed.
5. Model realistic chemistry (protonation, tautomerism, stereochemistry, water networks, metal coordination) in training and inference pipelines.
6. Implement calibrated uncertainty estimates and acquisition strategies for ultra‑large virtual screens.
7. Curate high‑quality datasets (e.g., PDBbind, CrossDocked, BindingDB, ChEMBL) with strict leakage control; define benchmarks (e.g., CASF) and evaluation metrics (e.g., pose RMSD, success@k).
8. Deploy scalable GPU‑accelerated screening services and APIs; monitor model drift and maintain production reliability.
9. Communicate results and roadmaps to R&D leadership and to product and lab‑automation teams.
**Required Skills:**
- Advanced Python programming skills; expertise in at least one of PyTorch, JAX, or TensorFlow.
- Deep expertise in geometric deep learning for 3D molecular data (e.g., e3nn, PyTorch Geometric, DGL).
- Strong knowledge of protein‑ligand thermodynamics, docking, and scoring fundamentals.
- Proficiency with cheminformatics and structural biology tools: RDKit, OpenMM/MDTraj/ParmEd, OpenFF, BioPython.
- Experience with docking engines (AutoDock Vina/Smina/Gnina, Rosetta/PyRosetta) and free‑energy methods (MM/GBSA, FEP/TI).
- Ability to design rigorous evaluation protocols and construct leakage‑safe data splits.
- Experience managing the full ML lifecycle on cloud or HPC environments (data pipelines, training, serving, monitoring).
- Excellent communication skills, a self‑starter attitude, and strong attention to detail.
**Required Education & Certifications:**
- Ph.D. in Computer Science, Computational Chemistry, Chemistry, Biophysics, or a related discipline *or* equivalent industry experience with a strong publication or project record.
- Preferred: a demonstrated track record of peer‑reviewed publications in top ML or computational chemistry venues.