Company: Baseten
Job Title: Software Engineer – Model APIs
Role Summary: Architect, build, and maintain high‑performance model serving APIs that expose large‑language‑model endpoints. Focus on inference performance, observability, and developer experience while ensuring low latency, reliability, and scalability across distributed GPU resources.
Expectations:
- 3+ years designing and operating large‑scale, low‑latency distributed systems or APIs.
- Proven ownership of backend services with rate‑limiting, authentication, quotas, and metering.
- Strong infra instincts: profiling, tracing, capacity planning, and SLO management.
- Comfortable debugging complex runtimes, GPU execution traces, and custom CUDA operators.
- Excellent written communication; able to produce design docs and collaborate cross‑functionally.
Key Responsibilities:
- Design, develop, and operate the Model API surface for structured outputs, function calling, and multi‑modal serving.
- Profile and optimize TensorRT‑LLM kernels, CUDA performance, and multi‑GPU communication patterns.
- Implement performance improvements (speculative decoding, quantization, batching, KV‑cache reuse) across runtimes.
- Build comprehensive benchmarking frameworks for real‑world workloads (model types, batch sizes, sequence lengths, hardware).
- Instrument services with metrics, traces, and logs; create repeatable benchmarks for speed, reliability, and quality.
- Implement platform fundamentals: API versioning, validation, usage metering, quotas, and authentication.
- Collaborate with product, infra, and dev‑experience teams to deliver robust, developer‑friendly serving experiences.
Required Skills:
- Distributed systems, large‑scale API design, and low‑latency backend engineering.
- Experience with rate‑limiting, auth, quotas, and metering.
- Deep knowledge of profiling, tracing, and performance tuning, including GPU and CUDA.
- Familiarity with TensorRT‑LLM, vLLM, or similar inference engines.
- Ability to debug runtime internals, GPU trace logs, and custom CUDA operators.
- Strong documentation skills and cross‑team collaboration.
- Optional: experience with Kubernetes, service meshes, API gateways, or open‑source APIs strengthens candidacy.
Required Education & Certifications:
- Bachelor’s degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent experience).