- Company Name
- Apex Group Ltd
- Job Title
- Head of AI Infrastructure & Machine Learning Operations
- Job Description
**Job Title:** Head of AI Infrastructure & Machine Learning Operations
**Role Summary:**
Lead the design, implementation, and scaling of AI/ML infrastructure and MLOps within a regulated financial services environment. Drive the development of secure, compliant, and scalable platforms that enable rapid prototyping, deployment, and reliable operation of AI models and agents across the organization.
**Expectations:**
- Deliver a production-ready AI runtime supporting rapid experimentation and large language model (LLM) agent deployment.
- Build and maintain a robust MLOps pipeline (CI/CD, versioning, monitoring, incident response).
- Ensure all infrastructure meets security, privacy, and regulatory standards (e.g., GDPR, EU AI Act).
- Collaborate with cross-functional teams to align infrastructure with evolving business needs.
- Foster a culture of responsible AI, platform resilience, and continuous improvement.
**Key Responsibilities:**
1. Establish and scale AI runtime environments for prototyping and LLM agent deployment.
2. Design and implement a comprehensive MLOps stack, including model versioning, CI/CD pipelines, automated monitoring, and drift/fairness detection.
3. Build secure, compliant AI development and deployment architectures, integrating with existing governance frameworks.
4. Partner with data scientists, security, compliance, and business units to define and deliver infrastructure requirements.
5. Oversee incident response, monitoring, and recovery protocols for AI/ML services.
6. Lead continuous improvement initiatives, adopting emerging tools and practices to enhance platform efficiency and reliability.
**Required Skills:**
- Proven leadership in designing cloud‑native or hybrid AI/ML platforms at scale.
- Deep expertise in MLOps strategy: end‑to‑end pipelines, CI/CD, model versioning, retraining, and deployment across batch, real‑time, REST API, and edge targets.
- Hands‑on experience with tools such as MLflow, SageMaker, Databricks, or equivalent.
- Strong monitoring, observability, and incident response skills (drift detection, fairness tracking, latency alerts).
- Knowledge of security, privacy, and regulatory compliance (GDPR, EU AI Act) and experience building compliant AI infrastructure.
- Strategic thinking with a focus on operational excellence and responsible AI.
- Excellent communication, collaboration, and stakeholder management abilities.
**Required Education & Certifications:**
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- Relevant certifications in cloud platforms (AWS, Azure, GCP) or MLOps are preferred.
---