Company Name: Triunity Software, Inc.
Job Title: Data Science Specialist
Job Description:
Role Summary:
Lead the transformation of a legacy keyword‑based Elasticsearch system into a modern semantic search platform. Design and implement hybrid BM25‑plus‑vector retrieval, LLM‑driven ranking, and RAG pipelines; build scalable APIs; and deploy end‑to‑end solutions on AWS. Continuously evaluate search quality using standard IR metrics and iterate based on data‑driven insights.
Expectations:
- 5–10 years of professional experience in AI/ML, NLP, or IR system development.
- Proven expertise in Elasticsearch/OpenSearch, including index design, query tuning, and vector field configuration.
- Deep understanding of semantic embeddings (SBERT, Llama, GPT‑based, Cohere, etc.) and approximate nearest neighbor (ANN) search.
- Hands‑on experience deploying on AWS (OpenSearch Service, SageMaker, Lambda, ECS/EKS, API Gateway, S3, IAM).
- Strong Python skills; familiarity with Java/Scala and REST API frameworks (FastAPI, Flask).
- Experience with MLOps, Docker, CI/CD pipelines, and search evaluation (nDCG, MRR, precision@k).
Key Responsibilities:
- Analyze and address the limitations of the existing regex/keyword search in Elasticsearch.
- Tune BM25, develop synonym/analyzer rules, and apply boosting/scoring strategies.
- Integrate dense embeddings into Elasticsearch, enable hybrid BM25‑vector retrieval, and implement re‑ranking with cross‑encoders or LLM evaluators.
- Build RAG pipelines using Elasticsearch vector search or AWS‑native tools.
- Design and implement scalable search APIs with low latency, high throughput, and real‑time indexing of structured/unstructured text.
- Deploy and maintain infrastructure on AWS, ensuring scalability and fault tolerance, and build monitoring dashboards.
- Develop search quality metrics, conduct A/B experiments, and fine‑tune ranking functions.
- Collaborate with product teams to refine search behavior based on user analytics.
Required Skills:
- Elasticsearch/OpenSearch (analyzers, mappings, BM25, vectors, aggregations).
- Semantic search: embeddings, vector databases, ANN techniques, LLM retrieval pipelines, RAG architectures.
- Programming: Python (required); Java/Scala (a plus).
- API development: FastAPI, Flask; containerization with Docker.
- AWS services: OpenSearch, SageMaker, Lambda, EKS/ECS, API Gateway, SQS/SNS, S3, IAM.
- CI/CD and version control: Git, Jenkins, or similar.
- Search evaluation frameworks, IR metrics, and A/B testing methodology.
- Optional: cross‑encoder/bi‑encoder pipelines, query understanding, spell/autocorrect, LLMOps, multi‑modal search, knowledge graph integration.
Required Education & Certifications:
- Bachelor’s degree or higher in Computer Science, Electrical Engineering, Applied Mathematics, or a related field.
- Relevant certifications (e.g., AWS Certified Solutions Architect, AWS Certified Machine Learning, or equivalent) are desirable but not mandatory.