Job Specifications
Job Role : LLM - Full Stack Python + JS
Years of Experience : 6 to 7 years
Skill : Python, JavaScript / Node.js, TypeScript
Role Overview:
You’ll write and debug production-quality code, design rigorous evaluations, and build reproducible workflows that generate clean, high-signal data for model training. Attention to detail matters deeply here—small mistakes can cascade into misleading results, so precision and thoroughness are essential. You’ll also collaborate closely with engineers, researchers, and quality owners to align on standards, review work, and continuously raise the quality bar. If you enjoy solving unusual technical problems, investigating subtle model failures, and working in developer-like environments where correctness, reproducibility, and collaboration matter, this role will keep you very entertained.
What does your day-to-day look like:
Write, review, and debug code across multiple languages.
Design tasks and evaluation scenarios for coding, reasoning, and debugging.
Investigate LLM outputs and identify hallucinations, regressions, and failure modes.
Build reproducible dev environments using Docker + automation tools.
Develop scripts, pipelines, and tools for data generation, scoring, and validation.
Produce structured annotations, judgments, and high-quality datasets.
Run systematic evaluations that help improve model reliability and reasoning.
Required Skills:
Experience using LLM coding tools (Cursor, Copilot, CodeWhisperer).
Strong hands-on coding experience (professional or research-based) in one or more of: Python, JavaScript / Node.js, TypeScript. Additional languages like Go, Java, C++, C#, Rust, SQL, R, Dart, etc. are a plus.
Solid experience with Linux + Bash, scripting, and automation.
Strong with Docker, reproducible environments, and dev containers.
Advanced Git skills (branching, diffs, patches, conflict resolution).
Solid understanding of testing and QA (unit, integration, negative, edge-case focused).
Ability to reliably overlap with 8am–12pm PT.
Nice-to-Haves:
Experience with dataset creation, annotation, evaluation, or ML pipelines.
Familiarity with benchmarks like SWE-bench or Terminal-Bench.
Background in QA automation, DevOps, ML systems, or data engineering.
Who Thrives Here:
Engineers who enjoy breaking things and understanding why.
People who like designing tasks, running experiments, and debugging.
Detail-oriented folks who can spot subtle issues in code or model behavior.
Engineers who like building clean, reusable workflows rather than one-off hacks.
About the Company
Sourcebae is an AI-driven recruitment engine designed to hire top global talent. With its end-to-end hiring model (sourcing, vetting, hiring, and managing), Sourcebae is the ultimate AI-powered, all-in-one hiring platform for businesses of all sizes.
Our product includes:
AI interviewer
Global talent pool
Management and compliance
Global capability centre
Why Choose Sourcebae?
1) Efficient AI interview process: Sourcebae revolutionizes the interview process with AI interviewer, ensuring a smooth and efficient experience f...