MVF

AI Engineer

Hybrid

London, United Kingdom

Full Time

27-01-2026

Skills

Communication, Python, Go, GitHub, CI/CD, Monitoring, Jenkins, Test, Research, Architecture, Azure, AWS, Marketing, Analytics, GCP, FastAPI, Data Science, Terraform, Infrastructure as Code, GitHub Actions

Job Specifications

The Opportunity

Our next frontier is a strategic shift: we're evolving beyond traditional analytics to build AI agents that actively participate in our operations. Rather than merely using data to inform decisions, we're creating intelligent systems that autonomously deliver better outcomes for customers and clients alike. You will build the systems that transform our data from a passive record into an active participant, learning from history to autonomously optimise the business.

You'll be our first dedicated AI engineer, working directly with the Head of Data. You'll collaborate weekly on architecture, especially in the first 6 months, to define the technical roadmap. You'll own the build, but you're not figuring this out alone.

We have a small data science team shipping traditional ML such as lead scoring. Your remit is production GenAI systems. However, we intentionally share deployment patterns, monitoring standards, and evaluation approaches so that we build one coherent AI capability.

What You Will Do

Architect & Engineer Agentic Systems
Build agents that act, not just answer: You will design agents that perform deterministic actions based on probabilistic reasoning. This means building systems that can reliably analyse data, execute function calls, and manage state across multi-step workflows without getting stuck in loops.
Production-Grade RAG: You will go beyond basic vector search. You will implement hybrid search (keyword + semantic), re-ranking strategies, and metadata filtering to ensure our agents have the exact context they need to make decisions.
Structured Data Extraction: You will build pipelines that turn unstructured conversations into structured data that our downstream systems can use (see the sketch after this list).
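
As a flavour of the extraction work, a minimal sketch is below. It assumes Pydantic for schema validation; call_llm is a hypothetical stand-in for whatever model client ends up in the stack, not a real API.

```python
# A minimal sketch, not a prescribed design: turning an unstructured
# conversation into typed, validated data.
import json
from pydantic import BaseModel, Field, ValidationError

class LeadRecord(BaseModel):
    company: str
    budget_gbp: int | None = Field(default=None, ge=0)
    decision_timeframe_weeks: int | None = None
    pain_points: list[str] = []

def call_llm(prompt: str) -> str:
    """Hypothetical model call; assumed to return a JSON string."""
    raise NotImplementedError

def extract_lead(transcript: str) -> LeadRecord:
    prompt = (
        "Extract the lead details from the transcript below as JSON "
        f"matching this schema: {json.dumps(LeadRecord.model_json_schema())}\n\n"
        f"Transcript:\n{transcript}"
    )
    raw = call_llm(prompt)
    # Validation failures surface immediately instead of letting
    # malformed data reach downstream systems.
    try:
        return LeadRecord.model_validate_json(raw)
    except ValidationError:
        raise  # route to a retry/repair step in a real pipeline
```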

Establish AI Engineering Foundations
Observability First: You will implement the "nervous system" of our AI. You will choose and set up tools (e.g., LangSmith, LangFuse, ADK, or custom) to trace execution chains, giving us visibility into why an agent made a specific decision.
Evals as a Service: You will build the testing harness. You will create automated evaluation pipelines that test prompts against "Golden Datasets" so we can deploy with confidence, ensuring a prompt change doesn't degrade performance (see the harness sketch after this list).
Cost & Latency Engineering: You will monitor token usage and inference latency, optimising the trade-off between model intelligence and speed/cost for different parts of the chain.
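
To make the evaluation idea concrete, here is a minimal sketch of a golden-dataset harness. The JSONL format, the run_agent entry point, and the containment check are illustrative assumptions, not a prescribed design.

```python
# A minimal golden-dataset harness, assuming a JSONL file of
# {"input": ..., "expected": ...} cases.
import json
from pathlib import Path

def run_agent(task: str, prompt_version: str) -> str:
    """Hypothetical stand-in for the system under test."""
    raise NotImplementedError

def evaluate(prompt_version: str, golden_path: str = "golden.jsonl") -> float:
    """Return the pass rate of the agent on the golden dataset."""
    cases = [
        json.loads(line)
        for line in Path(golden_path).read_text().splitlines()
        if line
    ]
    passed = sum(
        case["expected"] in run_agent(case["input"], prompt_version)
        for case in cases
    )  # naive containment check; real harnesses use graded or model-judged scoring
    return passed / len(cases)

# Example CI gate: fail the build if the new prompt version regresses.
# assert evaluate("v2") >= evaluate("v1"), "prompt change degraded performance"
```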

Collaborate and Standardise
Partner on Architecture: You will work with the Head of Data to define the technical roadmap. You aren't just taking tickets; you are helping decide what we build based on technical feasibility and business value.
Unify with Data Science: You will define shared standards with our Data Science team on deployment patterns, monitoring, and security, ensuring we build one coherent AI platform, not silos.

What This Role Requires

Must Have

Python and service development: You write clean, typed, production-ready code. You are comfortable with Pydantic (for data validation), Asyncio (for handling concurrent model calls), and FastAPI. You treat prompts as code: versioned, tested, and decoupled from business logic (see the service sketch after this list).
Cloud-native experience: You have hands-on experience deploying and operating containerised services on AWS (or GCP/Azure) using CI/CD platforms (Jenkins, GitHub Actions, CircleCI, BuildKite), cloud monitoring tools (Datadog, Sumologic, NewRelic), and container orchestrators (EKS, ECS). You're comfortable with Terraform for infrastructure as code.
Hands-on LLM experience: You've built something real with language models, whether production systems, serious side projects, or internal tools. You understand that prompting is engineering, not magic.
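
For illustration only, the style this requirement points at might look like the following sketch, combining Pydantic models, an async FastAPI endpoint, and concurrent model calls; fetch_completion is a hypothetical helper, not a real client API.

```python
# A minimal sketch of a typed FastAPI endpoint fanning out concurrent
# model calls with asyncio.
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ScoreRequest(BaseModel):
    leads: list[str]

class ScoreResponse(BaseModel):
    summaries: list[str]

async def fetch_completion(lead: str) -> str:
    """Hypothetical stand-in for an async LLM call."""
    raise NotImplementedError

@app.post("/score", response_model=ScoreResponse)
async def score(req: ScoreRequest) -> ScoreResponse:
    # Concurrent calls keep latency roughly flat as the batch grows,
    # instead of scaling linearly with sequential calls.
    results = await asyncio.gather(
        *(fetch_completion(lead) for lead in req.leads)
    )
    return ScoreResponse(summaries=list(results))
```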

Nice to Have

Production GenAI at scale: Experience with structured outputs, managing context window constraints, and handling model latency/timeouts in user-facing applications. You know how to evaluate a change in prompt logic before deploying it.
Observability and evaluation pipelines: You've implemented tracing for LLM workflows or built automated evaluation against golden datasets (a minimal tracing sketch follows).
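
As a rough sketch of what hand-rolled tracing can look like (real deployments would lean on LangSmith, LangFuse, or similar, which capture far more automatically), consider a decorator like this:

```python
# A minimal tracing sketch: log the name and wall-clock duration of
# every wrapped LLM call.
import functools
import logging
import time

logger = logging.getLogger("llm.trace")

def traced(fn):
    """Wrap an LLM call so each invocation is logged with its duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        logger.info(
            "call=%s duration_ms=%.1f",
            fn.__name__,
            (time.perf_counter() - start) * 1000,
        )
        return result
    return wrapper
```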

Important Traits

Proactive Ownership & Communication: GenAI projects are prone to hype. You have the confidence to manage stakeholder expectations effectively, explaining trade-offs between cost, latency, and quality. When blocked, you don't just ask for help; you present options.
Translating "Fuzzy" to "Formal": Marketing problems are often vague ("Find better leads"). You can take a fuzzy business objective and break it down into a deterministic engineering problem: a set of tools, a prompting strategy, and a metric to measure success.
Pragmatism over Hype: You read the AI research papers, but you deploy what works. You'd rather use a simple few-shot prompt that is reliable and cheap than a complex autonomous agent framework that is flaky and expensive. You understand that "boring" code is easier to debug.

The Tech Reality

The Foundation (Fixed & Reliable) …

About the Company

MVF powers growth for our clients by connecting them to potential customers. The digital marketing landscape is complex and constantly evolving. Businesses need experts who are tracking that evolution and finding new ways to innovate and win. This is where MVF comes in. We match readers, buyers, & business leaders with the brands & companies that make the products and services they need. We do this by building relationships with potential customers at each stage of the marketing funnel by offering insights, information, and …