
Lemurian Labs

www.lemurianlabs.com

5 Jobs

38 Employees

About the Company

Lemurian Labs is a software-first, hardware-agnostic AI platform that simplifies development by enabling organizations to write once and deploy anywhere across CPUs, GPUs, accelerators, and edge. We help companies lower costs, boost efficiency, and scale sustainably in the post-Moore's Law era. Discover how we are making AI accessible to all: lemurianlabs.com.

Listed Jobs

Company Name
Lemurian Labs
Job Title
Senior ML Performance Engineer
Job Description
**Job title:** Senior ML Performance Engineer

**Role Summary:** Lead the design, development, and operation of a full‑stack performance testing platform for large language model (LLM) inference workloads on GPU clusters. Serve as the technical authority on measuring, validating, and optimizing LLM performance before and after compiler transformations.

**Expectations:**
- Own the end‑to‑end performance testing lifecycle, from benchmark design to reporting.
- Provide actionable insights that directly influence compiler optimizations and product quality.
- Drive cross‑functional collaboration between ML, compiler, and DevOps teams.
- Champion performance best practices and maintain a culture of data‑driven improvement.

**Key Responsibilities:**
- Design and implement a comprehensive LLM benchmarking methodology covering latency, throughput, memory usage, power consumption, and accuracy.
- Build and maintain automated testing pipelines integrated with CI/CD for continuous performance validation across compiler releases and model updates.
- Establish baseline performance for unoptimized models (e.g., Llama 3.2 70B, DeepSeek) and validate post‑optimization gains.
- Profile and analyze GPU workloads using ROCm, CUDA profilers, and system‑level monitoring to identify bottlenecks.
- Develop dashboards and reports to track performance trends, regressions, and wins.
- Document and disseminate best practices for performance testing, optimization, and GPU resource utilization.
- Collaborate with compiler, ML, and DevOps engineers to embed performance testing into the development workflow.

**Required Skills:**
- 7+ years in performance engineering, benchmarking, or systems engineering.
- Deep knowledge of transformer‑based LLM inference workloads.
- Hands‑on GPU programming and optimization (CUDA, ROCm, or equivalent).
- Strong programming proficiency in Python and C/C++.
- Proven experience building performance testing or benchmarking platforms from scratch.
- Familiarity with PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT‑LLM, and related frameworks.
- Proficiency with GPU profiling/debugging tools and performance analysis.
- Experience with CI/CD and test automation frameworks.
- Strong analytical and experimental design skills.

**Nice to Have:**
- Experience with AMD GPUs (MI200/MI300) and the ROCm ecosystem.
- Knowledge of compiler optimizations and their performance impact.
- Experience with distributed inference, multi‑GPU workloads, and model compression techniques.
- Familiarity with infrastructure‑as‑code (Kubernetes, Docker, Terraform).
- Open‑source contributions in ML or systems domains.

**Required Education & Certifications:**
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Relevant certifications in GPU programming or performance engineering preferred but not mandatory.
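The benchmarking scope above (latency, throughput, regression tracking against a baseline) comes down to wrapping timed measurements around an inference call and aggregating the results. A minimal sketch of that idea, assuming a hypothetical `run_inference` stand-in rather than any specific serving API (vLLM, TensorRT‑LLM, or Lemurian Labs' own stack):

```python
"""Minimal latency/throughput benchmark loop (illustrative sketch only)."""
import statistics
import time


def run_inference(prompt: str) -> int:
    """Hypothetical stand-in for a real inference call; returns tokens generated."""
    time.sleep(0.01)   # pretend the model takes ~10 ms per request
    return 128         # pretend it produced 128 tokens


def benchmark(prompts, warmup=3):
    # Warm-up iterations so one-time startup costs don't skew the numbers.
    for p in prompts[:warmup]:
        run_inference(p)

    latencies, tokens = [], 0
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        tokens += run_inference(p)
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start

    return {
        "p50_latency_s": statistics.median(latencies),
        "p99_latency_s": sorted(latencies)[int(0.99 * (len(latencies) - 1))],
        "throughput_tok_per_s": tokens / wall,
    }


if __name__ == "__main__":
    print(benchmark(["hello"] * 50))
```

In practice the same harness would be run before and after a compiler transformation and the two result dictionaries compared in CI to flag regressions.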
Toronto, Canada
Hybrid
Senior
12-11-2025
Company Name
Lemurian Labs
Job Title
Product Manager
Job Description
Job title: Product Manager

Role Summary: Senior technical product manager responsible for defining, designing, and launching AI infrastructure products that enable seamless, scalable model training and deployment across heterogeneous hardware. Works closely with engineering, marketing, sales, and enterprise customers to translate cutting‑edge compiler and runtime technologies into user‑centric features and go‑to‑market strategies.

Expectations:
- Drive product vision and strategy for AI infrastructure solutions.
- Own the product lifecycle from concept to launch, ensuring alignment with market needs and business objectives.
- Operate in a high‑growth, fast‑paced startup environment, delivering high‑impact products from zero to market.

Key Responsibilities:
- Identify and analyze AI workloads across industries, assessing infrastructure requirements for large‑scale model builders and enterprise AI teams.
- Define optimization strategies for AI workloads using a hardware‑agnostic software stack, ensuring seamless integration with existing tools.
- Collaborate with engineering teams to build products that accelerate AI development and deployment, translating customer insights into feature specifications.
- Engage directly with enterprise AI teams, model builders, and developers to gather requirements and validate product hypotheses.
- Work cross‑functionally with marketing, sales, and support to drive product adoption, define go‑to‑market plans, and track success metrics.
- Define and monitor key performance indicators to measure product impact and inform continuous improvement.
- Stay current on AI infrastructure advances, frameworks, and cloud‑native technologies to inform the product roadmap.

Required Skills:
- 6+ years of product management experience in AI, cloud platforms, or developer tools.
- Deep knowledge of AI model development, inference, training platforms, and large‑scale deployment requirements.
- Strong technical foundation in compilers, AI frameworks (TensorFlow, JAX, PyTorch, Triton, XLA), and cloud‑native technologies (Kubernetes, Docker, Terraform).
- Ability to translate customer needs into actionable product features and drive engineering execution.
- Excellent communication; able to articulate complex technical concepts to both technical and non‑technical stakeholders.
- Proven track record of shipping successful products in fast‑paced environments.

Required Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (Master’s preferred).
- Certifications in cloud platforms (AWS, GCP, Azure) or container orchestration (CKA, CKS) are a plus.
Toronto, Canada
Hybrid
Mid level
15-12-2025
Company Name
Lemurian Labs
Job Title
Runtime Engineer
Job Description
Job title: Runtime Engineer

Role Summary: Design, develop, and optimize a high‑performance, multi‑target runtime for AI workloads, facilitating “build once, deploy anywhere” across cloud, edge, and custom hardware.

Expectations: Deliver efficient, scalable kernels; benchmark and analyze performance on target platforms; collaborate closely with product and ML engineering to evolve the architecture.

Key Responsibilities:
- Architect, implement, and maintain the multi‑target runtime.
- Apply parallelization, partitioning, and kernel optimization techniques to generate performant code paths.
- Prototype new ideas and conduct data‑driven evaluation of compiler outputs.
- Benchmark the runtime on diverse hardware, identify bottlenecks, and build tooling for performance analysis.
- Work with product teams to gather requirements and drive architectural enhancements.

Required Skills:
- 4+ years of experience in C/C++ (C++14 or later), including asynchronous and concurrent programming.
- Deep understanding of hardware architecture: vector/scalar registers, memory hierarchy.
- Knowledge of OS kernel or hypervisor development.
- Proficiency in parallelization, partitioning, and performance tuning.
- Strong analytical and prototyping abilities.

Required Education & Certifications:
- Master’s or Ph.D. in Computer Science, or equivalent practical experience.
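The parallelization and partitioning responsibilities above follow a common pattern: split a workload into partitions, dispatch them concurrently, and time each partition to see where the bottleneck sits. A toy sketch of that pattern, purely illustrative (the actual runtime is C/C++ and dispatches real kernels; `kernel` here is a hypothetical stand‑in):

```python
"""Toy partition-then-parallelize pattern with per-partition timing."""
import time
from concurrent.futures import ThreadPoolExecutor


def kernel(chunk):
    """Hypothetical stand-in for a compiled kernel operating on one partition."""
    time.sleep(0.001 * len(chunk))   # pretend work scales with partition size
    return sum(chunk)


def partition(data, n_parts):
    """Split `data` into at most `n_parts` roughly equal chunks."""
    step = (len(data) + n_parts - 1) // n_parts
    return [data[i:i + step] for i in range(0, len(data), step)]


def timed_kernel(chunk):
    """Run one partition and record how long it took."""
    t0 = time.perf_counter()
    result = kernel(chunk)
    return result, time.perf_counter() - t0


def run(data, n_parts=4):
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(timed_kernel, partition(data, n_parts)))
    total = sum(r for r, _ in results)
    slowest = max(t for _, t in results)   # slowest partition bounds overall latency
    return total, slowest


if __name__ == "__main__":
    total, slowest = run(list(range(1_000)))
    print(f"sum={total}, slowest partition took {slowest * 1e3:.1f} ms")
```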
Toronto, Canada
Hybrid
Junior
19-01-2026
Company Name
Lemurian Labs
Job Title
Compiler Code Gen Engineer
Job Description
**Job Title**
Compiler Code Generation Engineer

**Role Summary**
Design, implement, and optimize a high‑performance, portable compiler for heterogeneous AI workloads. Focus on cross‑platform code generation, performance tuning, and architectural enhancements to support modern machine learning models on diverse hardware.

**Expectations**
- Deliver efficient, scalable code generation for cloud and edge deployments.
- Collaborate with product teams to align compiler features with ML engineer needs.
- Continuously evaluate and improve compiler performance using profiling data.

**Key Responsibilities**
- Design and maintain the heterogeneous AI compiler architecture.
- Implement new capabilities and extend the existing architecture to support emerging ML models and hardware.
- Apply advanced parallelization, partitioning, and kernel optimization techniques.
- Generate, analyze, and act on performance metrics to refine code generation.
- Produce clear documentation and communicate design decisions within the team.

**Required Skills**
- 4+ years of experience in compiler development.
- Strong understanding of compiler algorithms, data structures, and low‑level code generation.
- Proficiency in C/C++ and object file manipulation.
- Excellent written and oral communication; ability to produce concise technical documentation.
- Detail‑oriented, collaborative, and proactive in improving system performance.

**Preferred Skills**
- Master’s or PhD in Computer Science, Electrical Engineering, or a related field.
- Experience with LLVM and traditional compiler passes (instruction selection, register allocation, dominance analysis).
- Knowledge of calling conventions, linking, relocations, and ABI interactions.
- Familiarity with loop optimizations (vectorization, unrolling, fusion, parallelization).
- Exposure to machine‑learning workloads and hardware optimization.

**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
Toronto, Canada
Hybrid
Junior
19-01-2026