- Company Name
- CHU d'Angers
- Job Title
- Natural Language Processing (NLP) Engineer, full-time (100% FTE) - Clinical Data Center
- Job Description
**Job Title**
NLP Engineer (Natural Language Processing)
**Role Summary**
Develop and deploy NLP methods that turn unstructured clinical text (e.g., clinical reports) and structured data into actionable knowledge. Lead end-to-end projects within a health data warehouse environment, from data preprocessing and model training through production deployment and communication with stakeholders.
**Expectations**
- Translate clinical and research requirements into scalable NLP solutions.
- Deliver robust, reproducible models and APIs for internal use.
- Collaborate with data engineers, clinicians, and researchers to ensure model relevance and usability.
- Maintain high standards of documentation, code quality, and ethical data handling.
**Key Responsibilities**
- Build pipelines for text extraction, concept qualification, and semantic enrichment using knowledge graphs, ontologies, and embeddings.
- Perform named entity recognition (NER), relation extraction, text classification, and document clustering using machine-learning, deep-learning, and large-language-model techniques.
- Prepare, clean, and annotate corpora for supervised training.
- Evaluate and benchmark models (e.g., those built with Hugging Face Transformers or spaCy) against defined metrics.
- Create visualizations (Plotly, Matplotlib) and clear reports for clinical and research teams.
- Prototype and expose models via REST APIs, Streamlit dashboards, or MLflow.
- Document models, data pipelines, and experiments for reproducibility.
**Required Skills**
- **Programming & Libraries**: Python, Pandas, Scikit‑learn, PyTorch, Transformers, spaCy, NLTK.
- **Data Engineering**: SQL; processing of large text corpora.
- **NLP Tools**: Hugging Face, spaCy, fastText, Word2Vec, BERT, and other embedding models.
- **MLOps**: MLflow, basic REST API development, and Streamlit (optional).
- **Data Visualization**: Plotly, Matplotlib, or equivalent.
- **Domain Knowledge (desired)**: Familiarity with medical ontologies/terminologies (SNOMED, UMLS).
- **Soft Skills**: Scientific rigor, problem‑solving, teamwork, and effective communication.
**Required Education & Experience**
- Master's degree (Bac+5) or PhD in Data Science, Computer Science, Natural Language Processing, or a related field.
- Proven experience (or a substantial internship) in applied NLP on business or clinical data.
- Demonstrated ability to move projects from concept to production.
---