Cloudheight Solutions


www.cloudheight.pt

1 Job

1 Employee

About the Company

At CloudHeight Solutions, we are your all-in-one partner for building and elevating your online presence. Based in Lisbon, Portugal, we specialize in delivering end-to-end digital solutions designed to help businesses succeed in today’s competitive landscape. With a strong focus on empowering small and medium enterprises (SMEs) to establish a robust digital presence, we provide custom website design, SEO & marketing, professional email services, and more—offering the tools and expertise to launch, enhance, and optimize your brand’s digital journey.

We pride ourselves on delivering comprehensive, client-focused services tailored to your unique digital needs from start to finish. Looking ahead, we’re excited to expand our offerings with multilingual 24/7 customer support across multiple European languages, ensuring a seamless experience for a diverse clientele. Additionally, our next release will introduce innovative AI-powered tools to help you visualize and optimize your social connections for better opportunities.

Have questions or ready to get started?

Contact us at info@cloudheight.pt or explore our services today. Let CloudHeight Solutions empower your SME with a standout online presence.

Listed Jobs

Company Name
Cloudheight Solutions
Job Title
ML Engineer
Job Description
**Job Title**

ML Engineer (Performance Optimization)

**Role Summary**

Lead end-to-end performance engineering for AI/ML foundation models, driving latency, cost, and throughput improvements across training, inference, and deployment pipelines. Design GPU-accelerated components, implement custom CUDA kernels, and build internal tooling for continuous performance validation.

**Expectations**

- 5+ years of hands-on experience in ML systems, performance engineering, or advanced software engineering roles (or 3+ years with an MS/PhD in CS, EE, or a related field).
- Proven expertise in profiling, debugging, and optimizing deep-learning workloads for latency, throughput, and cost at scale.
- Deep knowledge of distributed training/inference strategies (data/model parallelism, sharding, pipeline parallelism).

**Key Responsibilities**

- Own the performance, scalability, and reliability of foundation models during training, inference, and deployment.
- Profile and optimize the entire ML stack: data pipelines, training loops, inference serving, and deployment workflows.
- Design, implement, and integrate GPU-accelerated components; develop custom CUDA kernels as needed.
- Reduce latency and cost per inference token while maximizing throughput and GPU utilization.
- Translate product requirements into measurable performance goals (p50/p95/p99 latency, throughput, GPU utilization, memory footprint, cost per token) and technical roadmaps.
- Build and maintain internal benchmarking, evaluation harnesses, and automation for continuous performance validation.
- Contribute to model-architecture and system-design decisions that affect performance, robustness, and operational efficiency.
- Advocate best practices for performance-aware development, monitoring, and continuous improvement across the engineering team.

**Required Skills**

- Deep learning frameworks: PyTorch (core) and familiarity with TensorFlow.
- Model export/runtime formats: TorchScript, ONNX, SavedModel.
- CUDA programming: kernel development, GPU memory management, asynchronous execution.
- Performance optimization techniques: mixed precision (FP16/BF16/AMP), quantization (PTQ/QAT, int8, q4/q8), pruning, distillation, activation checkpointing, operator fusion, batching/caching strategies.
- Experience with large transformer models, attention-kernel optimization, and memory/compute trade-offs.
- Distributed training/inference: data/model parallelism, pipeline parallelism, tensor parallelism, ZeRO sharding, Horovod.
- Cloud deployment: AWS, Azure, or GCP; containerized deployments with Docker and Kubernetes.
- Experiment tracking, monitoring, and evaluation pipelines.
- Optional: custom CUDA kernel development, integration with low-level GPU libraries (cuBLAS/cuDNN, NCCL), inference serving (Triton, TensorRT, FasterTransformer, DeepSpeed).

**Required Education & Certifications**

- Bachelor's degree in Computer Science, Electrical Engineering, or a related technical field (MS/PhD preferred for candidates with fewer years of experience).
London, United Kingdom
On site
Mid level
29-01-2026