Chemify Limited

Senior Data Scientist - AI/ML (CADD)

On site

Glasgow, United Kingdom

Senior

Full Time

10-11-2025

Skills

Communication, Python, Version Control, Problem-solving, Research, Training, Machine Learning, PyTorch, TensorFlow, Deep Learning, Databases, Git, Redis, Robotics

Job Specifications

About Chemify

Chemify is revolutionising chemistry. We are creating a future where the synthesis of previously unimaginable molecules, drugs, and materials is instantly accessible. By combining AI, robotics, and the world’s largest continually expanding database of chemical programs, we are accelerating chemical discovery to improve quality of life and extend the reach of humanity.

Job Description:

We are seeking a talented and motivated Senior AI/ML Data Scientist to pioneer the development and application of cutting-edge machine learning models for computer-aided drug design (CADD) and small molecule discovery.

You will be joining a dynamic, cross-disciplinary team of computational scientists, medicinal chemists, and engineers. Your primary focus will be on architecting, training, and deploying sophisticated models to predict molecular properties, generate novel molecules, and ultimately accelerate our drug discovery pipelines.

To be successful in this role, you will need deep expertise in modern machine learning, particularly generative AI (Transformers, Diffusion Models), Graph Neural Networks, and predictive modeling. You will leverage your skills to tackle complex scientific challenges, working with vast and diverse chemical and biological datasets.

If you are passionate about applying state-of-the-art AI to solve fundamental challenges in chemistry and are driven to see your work make a real-world impact on discovering new medicines, we’d love to have you join our team.

Key Responsibilities:

Design, develop, and optimize state-of-the-art generative models (eg, Transformers, GNNs, Diffusion Models) for robotic tasks and synthetic routes

Architect and implement scalable MLOps pipelines for preprocessing large-scale chemical and biological datasets, model training, and rigorous evaluation

Translate cutting-edge research in AI/ML into practical solutions that address critical challenges in our drug discovery projects, such as property prediction (ADMET/QSAR), reaction prediction, and binding affinity prediction

Collaborate closely with computational chemists, medicinal chemists, and software engineers to define project goals, interpret model outputs, and integrate AI-driven insights into our discovery platform

Design and execute robust experiments to evaluate model performance, focusing on chemical validity, novelty, synthesizability, and predictive accuracy against experimental data

Clearly communicate complex technical concepts, model results, and strategic recommendations to both technical and non-technical stakeholders

Stay at the forefront of AI for drug discovery, foundation models for science, and multimodal learning, continuously identifying and championing opportunities to enhance our capabilities

What you’ll bring:

MSc or PhD in Computer Science, Machine Learning, Computational Chemistry/Biology, or a closely related field, with 5+ years of industry or academic experience

Demonstrated proficiency in Python and deep learning frameworks such as PyTorch or TensorFlow

Deep theoretical and practical knowledge of modern machine learning architectures, including Transformers, Graph Neural Networks (GNNs), and generative models (VAEs, GANs, Diffusion Models) as applied to scientific problems

Proven ability to lead complex AI/ML projects from concept to deployment in a scientific or drug discovery context

Extensive experience working with large-scale molecular datasets (eg, SMILES, 3D conformations), biological data (eg, protein sequences, assay data), and other scientific data formats

Experience with efficient model training and fine-tuning techniques, such as LoRA, quantization, distillation, and model pruning

Strong background or hands-on experience applying ML to problems involving protein structures, small molecule interactions, or related biological data

Familiarity with scalable computing environments, GPU acceleration, and distributed training

Excellent communication and interpersonal skills for effective collaboration in a multidisciplinary team

A collaborative mindset, strong communication skills, and the ability to work effectively within a cross-disciplinary team

Excellent problem-solving skills and a proactive, can-do attitude

An eagerness to learn new scientific concepts, computational methods, and software engineering practices from experienced mentors

Good understanding of version control with Git

Beneficial Skills:

Hands-on experience with cheminformatics toolkits such as RDKit

Experience with Retrieval-Augmented Generation (RAG) systems, including vector databases (eg, Redis, FAISS, Milvus, Pinecone) for querying large chemical or biological databases

Experience with Protein/DNA language models (eg, ProtBERT, ESM, Evo) or protein structure prediction models (eg, AlphaFold-like approaches)

Experience with evaluation frameworks for reaction and synthetic route design, including human-in-the-loop assessment and metrics for novelty, diversity, and feasibility of sy

About the Company

Accelerating molecular discovery through the power of Chemputation.