**Company:** Quantum World Technologies Inc.
**Job Title:** Senior Machine Learning Ops Engineer
**Role Summary:**
Architect, build, and maintain a scalable, reusable ML platform on Google Vertex AI. Design CI/CD pipelines, governance, and observability tooling that enable cross‑functional domain teams to deploy and serve models rapidly and reliably.
**Expectations:**
* Deliver reliable, automated end‑to‑end ML workflows that meet performance, latency, and SLA requirements.
* Translate data‑science prototypes into production‑ready services with minimal hand‑off.
* Serve as a technical mentor and liaison, facilitating effective collaboration among data scientists, engineers, and leadership.
**Key Responsibilities:**
* Design and implement reusable modules and templates for model training, deployment, monitoring, and rollback across Vertex AI and other cloud platforms.
* Build and maintain scalable CI/CD pipelines for rapid iteration, testing, and promotion of ML models.
* Develop automation and tooling for platform observability: model drift detection, performance tracking, latency monitoring, and alerting.
* Partner with domain teams to onboard models, ensuring alignment with operational standards, SLAs, and governance policies.
* Maintain and evolve the feature store, model registry, and endpoint management systems to support high‑throughput, low‑latency inference.
* Define and enforce governance policies, including versioning, rollback strategies, access controls, and compliance.
* Provide technical guidance and support to domain teams, reducing operational bottlenecks and enabling self‑service capabilities.
* Act as a liaison between data engineering, data science, ML engineering, MLOps, and executive leadership to align goals and prioritize initiatives.
**Required Skills:**
* 4+ years of MLOps experience with cloud‑native ML platforms.
* Strong proficiency in Python; experience with TensorFlow, PyTorch, scikit‑learn, and custom Docker containers.
* Deep knowledge of Google Cloud Platform – Vertex AI, Cloud Storage, BigQuery, Pub/Sub, Cloud Functions.
* Hands‑on experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI) and infrastructure‑as‑code tools (Terraform, Pulumi).
* Ability to design and implement observability solutions: metrics, logs, alerts (Prometheus, Grafana, Cloud Monitoring).
* Familiarity with feature stores, model registries, and endpoint orchestration (Kubeflow, KServe (formerly KFServing), Vertex Endpoints).
* Strong communication skills – able to explain technical concepts to non‑technical stakeholders.
* Ability to lead initiatives, prioritize tasks, and influence cross‑team collaboration.
**Required Education & Certifications:**
* Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
* Proven experience in MLOps; certifications (e.g., GCP Professional ML Engineer, TensorFlow Developer) are a plus but not mandatory.