- Company Name
- Oxford Data Plan
- Job Title
- AI Systems Engineer
- Job Description
**Job Title**
AI Systems Engineer
**Role Summary**
Design, build, and operate end‑to‑end AI/LLM systems and microservices on AWS. Develop autonomous agents, RAG pipelines, and internal tools that support data science, product, engineering, and revenue teams, ensuring reliability, safety, and observability in production.
**Expectations**
- Minimum of 5 years of software/ML engineering experience, including at least 2 years working on production GenAI systems.
- Proven track record of shipping production systems, not prototypes.
- Deep understanding of AI system reliability, observability, cost, and scale.
**Key Responsibilities**
- Architect, develop, and maintain end‑to‑end AI/LLM solutions (chatbots, analytics assistants, automation tools).
- Create internal productivity tools for cross‑functional teams.
- Build autonomous AI agents and workflow orchestrators using LangChain, CrewAI, ADK, or similar frameworks.
- Design LLM‑backed microservices (FastAPI/Flask) for summarisation, forecasting, data extraction, and reasoning.
- Implement full RAG pipelines: ingestion → chunking → embeddings → indexing → retrieval → LLM reasoning.
- Optimize retrieval quality with metadata, hybrid search, chunking strategies, rerankers, and relevance tuning.
- Add document classification, named-entity recognition (NER), entity extraction, and knowledge-graph-driven retrieval.
- Establish reliability, safety, and governance guardrails; monitor, log, and benchmark AI and RAG systems.
- Deploy and operate agents and microservices on AWS using Bedrock, Lambda, ECS/EKS, API Gateway, S3, Secrets Manager, and CloudWatch.
- Build and maintain CI/CD pipelines (GitHub Actions), and manage model/version lifecycles, retraining, and automated evaluation.
**Required Skills**
- Python (modular design, async, modern practices).
- Production APIs and microservices (FastAPI/Flask).
- LLM engineering: agentic workflows, RAG pipelines, evaluation (hallucination testing, regression benchmarks).
- Retrieval systems (vector databases, indexing).
- Observability tools (OpenTelemetry, LangSmith, Weights & Biases, Arize/Phoenix).
- AWS services (Bedrock, EC2, Lambda, ECS/EKS, API Gateway, S3, CloudWatch).
- Docker, Kubernetes, and CI/CD practices (preferred).
- Knowledge of search/retrieval platforms (Kendra, OpenSearch, Weaviate, Qdrant, Pinecone).
- UI building experience (React, Streamlit) is a plus.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- AWS certifications (e.g., AWS Certified Solutions Architect or AWS Certified Developer) or ML/AI certifications are advantageous.