- Company Name: Morgan Stanley
- Job Title: AI Platform Engineer
- Job Description:
Role Summary: Build and scale an enterprise‑wide AI Development Platform that delivers secure, compliant, and reusable AI solutions across the organization, enabling rapid adoption of generative AI and advanced data services.
Expectations:
- Energetic, cross‑disciplinary contributor with strong ownership.
- Proven ability to design and maintain cloud‑native tooling, APIs, and deployment pipelines.
- Excellent communicator who collaborates across regions and functions, and advocates for AI to drive productivity and innovation.
Key Responsibilities:
- Design and implement self‑service tooling for AI solution deployment using Kubernetes/OpenShift, Python, APIs, and authentication layers.
- Develop Terraform modules and cloud architecture for secure, scalable AI service provisioning on AWS, Azure, or GCP.
- Create reusable, containerised workloads for generative AI (pre‑trained and fine‑tuned models), integrating vector store and embedding pipelines.
- Author best‑practice guides, architecture decision records, and documentation for GenAI ecosystems (GPT, Llama, Hugging Face, LangChain).
- Ensure platform reliability and observability: blue/green release strategies, logging, monitoring, metrics, and automated system‑management tasks.
- Contribute to Agile ceremonies, code reviews, and on‑call rotations.
Required Skills:
- Proficient in application development (Python with Flask or FastAPI), including asynchronous programming, multiprocessing, multithreading, and performance profiling.
- Strong data‑engineering foundation: SQL, NoSQL, big data, Kafka, Redis; data governance and privacy.
- Kubernetes workload development and management, preferably on OpenShift.
- Design, develop, and maintain scalable RESTful services.
- Experience deploying cloud applications using IaC (Terraform) on at least one major cloud platform (AWS, Azure, or GCP).
- DevOps acumen: CI/CD (Jenkins, GitOps), unit and integration testing, BDD.
- Knowledge of OAuth 2.0 and observability tooling such as OpenTelemetry and the Grafana stack (Grafana, Loki, Prometheus, Cortex).
- Understanding of microservices architecture, modern configuration management, and deployment patterns such as blue/green.
- Familiarity with state‑sharing mechanisms (Kafka, distributed caching) and multiple data stores (SQL databases, Redis).
- Hands‑on experience building generative AI or LLM‑based applications, with deep knowledge of AI agents, orchestration, and workflow automation.
- Excellent written and verbal communication, with demonstrated ability to collaborate globally.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Software Engineering, or a related technical discipline.
- Cloud certification (e.g., AWS Certified Solutions Architect, Azure Associate, GCP Professional Cloud Architect) is a plus.