- Company Name: Turnitin
- Job Title: Sr. AI Data Engineer (UK Remote)
- Job Description:
Role Summary
Design, build, and operate scalable, real‑time data pipelines that feed Applied AI model training and deployment cycles. Own the AI data infrastructure, from ingestion and normalisation to storage and retrieval, ensuring high‑quality datasets for LLM fine‑tuning, RAG systems, and other AI workloads. Collaborate with AI R&D, product, and data platform teams to align data strategy with business goals and continuously improve tooling, automation, and performance.
Expectations
* Minimum 4 years’ experience in data engineering focused on AI/ML pipeline development.
* Proficient in Python, SQL, Infrastructure as Code (Terraform / CloudFormation), and orchestration tools (Airflow, Prefect, or dbt).
* Hands‑on with cloud data platforms (AWS, Azure, GCP) and vector databases (Pinecone, Weaviate, Qdrant, Chroma).
* Experience with MLOps stacks (SageMaker, Vertex AI, HuggingFace), experiment tracking (MLflow, Weights & Biases), and model deployment pipelines.
* Working knowledge of Large‑Language‑Model workflows: embedding generation, retrieval‑augmented generation, and LLM orchestration frameworks (LangChain, LangFuse, LiteLLM, LlamaIndex).
* Strong analytical, problem‑solving, and communication skills, with the ability to translate technical concepts to cross‑functional stakeholders.
Desired (not required)
* 6+ years data‑engineering experience in AI/ML contexts, including technical leadership or mentorship.
* Background in education or EdTech sectors.
* Familiarity with AI coding assistants (GitHub Copilot, Claude) and data visualisation tools (Streamlit).
* Additional knowledge in NLP, CV, multimodal AI, or advanced analytics.
Key Responsibilities
* Architect and maintain scalable real‑time data pipelines for AI model training and inference.
* Collect, normalise, and persist data from diverse sources, including external LLM providers.
* Deploy and manage robust data infrastructure, ensuring reliability, security, and cost efficiency.
* Partner with AI R&D, Applied AI, and Data Platform teams to ensure data quality, accessibility, and compliance with standards.
* Support exploratory data initiatives that uncover insights to improve AI algorithms and business outcomes.
* Communicate architecture decisions, status updates, and innovation opportunities across teams.
* Evaluate emerging tools and methodologies; recommend improvements to AI data workflows and stack.
Required Skills
* Python, SQL, IaC (Terraform, CloudFormation).
* Orchestration: Airflow, Prefect, or dbt.
* Cloud platforms: AWS, Azure, GCP.
* Vector database systems: Pinecone, Weaviate, Qdrant, Chroma.
* MLOps: SageMaker, Vertex AI, HuggingFace, MLflow, Weights & Biases.
* LLM technologies: embedding generation, RAG, LangChain, LiteLLM, LangFuse, LlamaIndex.
* Strong problem‑solving, analytical, and written and verbal communication skills.
Education & Certifications
* Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field (preferred).
* Optional certifications: AWS Certified Data Analytics – Specialty, Google Cloud Professional Data Engineer, Azure Data Engineer Associate, or relevant MLOps certifications.
Newcastle upon Tyne, United Kingdom
Remote
Mid level
30-01-2026