- Company Name
- AXA en France
- Job Title
- Data Scientist / ML Engineer (F/M) Internship
- Job Description
**Job Title**
Data Scientist / Machine Learning Engineer (Internship)
**Role Summary**
Join the AI / GenAI for Opex team to design, develop, and deploy large‑scale document‑understanding solutions that automate and accelerate business processes across insurance, claims, finance, legal, and operations. The intern will work end‑to‑end on AI applications, from data collection to production deployment, adapting the focus toward data science or ML engineering as needed.
**Expectations**
- Currently enrolled in a Master’s program (final year) or engineering degree in Data Science, Statistics, Applied Mathematics, Computer Science, or AI.
- Available for a 6‑month end‑of‑studies internship starting between February and April 2026.
- Demonstrates autonomy, strong curiosity, and a rigorous work ethic.
- Effective team player with clear written and oral communication skills.
**Key Responsibilities**
- Acquire and preprocess scanned documents (PDF, images); perform OCR and structure extraction.
- Build high‑quality annotated corpora using LayoutLMv3, TrOCR, Tesseract OCR 5, spaCy, and VLMs.
- Fine‑tune LLMs/SLMs (e.g., BERT, GPT, LLaMA, T5) with LoRA/QLoRA; evaluate using BLEU, ROUGE, F1, and OCR accuracy.
- Design ETL and feature‑engineering pipelines; expose models via REST APIs (FastAPI).
- Train models on Azure ML / OpenShift AI; conduct hyperparameter tuning with Optuna or Ray Tune.
- Containerize models and deploy on Kubernetes using Helm/Kustomize; implement CI/CD pipelines (Azure DevOps).
- Monitor production performance (OpenTelemetry, Dynatrace), detect concept drift, and automate retraining.
- Participate in code reviews and pair programming; produce technical documentation (Markdown, MkDocs, Confluence).
**Required Skills**
*Technical*
- Python programming (pandas, numpy, PyTorch).
- Machine learning and NLP fundamentals: classification, regression, clustering, transformers.
- OCR & Document AI tools: spaCy, Tesseract, LayoutLM, TrOCR.
- MLOps & cloud: Git, Docker, Kubernetes, Helm, Kustomize, Azure ML (preferred).
- Experience with ML pipelines (Kedro) and API development (FastAPI).
*Soft*
- Self‑motivation and intellectual curiosity.
- Strong teamwork and collaboration mindset.
- Ability to simplify complex technical concepts for diverse audiences.
**Required Education & Certifications**
- Master’s level (or equivalent) in Data Science, Statistics, Applied Mathematics, Computer Science, or Artificial Intelligence.
- No specific certifications required; proven academic projects or internships in ML/NLP are advantageous.