Job Specifications
Sustainable Talent is partnering with NVIDIA, a global leader that has been transforming computer graphics, PC gaming, and accelerated computing for over 25 years.
We are looking for a Systems Engineering Technician to support our client's on-premises private cloud infrastructure team. This is a full-time W-2 contract role based in Hillsboro, OR. We offer competitive pay of $85-100/hr, based on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!
Do you thrive on cutting-edge technology and crave being challenged in a fast-paced R&D and hyperscale infrastructure environment? If so, this exciting opportunity with NVIDIA won’t disappoint. In this role, you will manage and maintain our state-of-the-art compute farm – composed of builders, packagers, testers and verification rigs – serving a global developer base working on next-generation GPU, AI/ML, and accelerated computing hardware and software. The environment is vast, the scale significant, and the expectations high. We need YOU to help us deliver world-class data centers and labs from our Hillsboro region, enabling deterministic results for our engineering teams and demanding users worldwide.
What You’ll Do
Partner closely with system architects, hardware engineers, firmware/software teams, QA/test, and platform engineers to craft, develop, deploy, debug and release next-generation NVIDIA products.
Manage and maintain a high-availability compute cluster comprising builders/packagers/testers and core support infrastructure (racks, GPU nodes, network interconnects, storage arrays).
Monitor availability and ensure targets are met, lead system recovery, root-cause failures in compute, network, storage and thermals, and drive rapid remediation.
Deploy, qualify, benchmark and scale new systems and hardware bring-ups in our on-prem environment (including high-density GPU clusters, rack-scale systems, and liquid-cooling environments).
Coordinate inventory, asset lifecycle, configuration management, decommissioning and refresh tasks across labs, racks and data-hall floors.
Maintain a world-class, safe, clean, organized lab and datacenter environment (cable management, ESD compliance, tool control, mechanical tasks).
Troubleshoot issues across hardware, firmware, OS (Windows, Linux, macOS) and platform infrastructure with cross-functional platform/ops teams.
Plan, deploy and maintain on-premises infrastructure (power distribution, cooling/thermal management, UPS/battery systems, rack/PDU/power) in collaboration with data-center and network engineering teams.
Drive improvements in the availability, throughput and accuracy of test systems while meeting internal SLAs and key operational metrics (e.g., PUE, mean time to repair, test-cycle throughput).
Represent the infrastructure team in internal review meetings and collaborate globally with NVIDIA teams to align on build-out strategy, capacity planning and datacenter operations.
What We Need to See
Associate’s or Bachelor’s degree in Engineering or a Technical Major, or equivalent hands-on experience in infrastructure, hardware, or compute lab environments.
Proven experience operating in datacenter environments or large-scale engineering/test labs, especially with compute-dense/hyperscale hardware.
Familiarity with version control systems (e.g., Git, Perforce) for firmware/software and infrastructure configuration.
Proficiency with infrastructure tooling, including DCIM (e.g., Nautobot) and scripting and automation (shell, Python, Ansible, etc.).
Solid working knowledge of fundamental network and services protocols (TCP/IP, DNS, NFS, SSL/TLS, IPv6) and experience working with high-bandwidth, low-latency interconnects.
Experience supporting multiple OS platforms (Windows, macOS, Linux), BIOS/firmware updates, driver deployments and system imaging.
Hands-on physical experience with PCBs, GPUs, server/node deployments, rack integration, cooling/power structures, cable/fibre management.
Excellent written and verbal communication skills; ability to translate technical concepts clearly to both technical and non-technical stakeholders.
Strong analytic and problem-solving skills; ability to take ownership, collaborate effectively in fast-moving teams and drive results.
What Makes You Stand Out
Experience deploying or managing HPC or GPU-accelerated clusters, with tools such as Slurm, BCM, Kubernetes, or other orchestration frameworks.
Exposure to cloud and on-premises convergence stacks (OpenStack, VMware, Nutanix, or other private cloud infrastructure).
Certifications such as CCNA/CCNP, or equivalent networking/infrastructure credentials.
Deep background in Windows & Linux administration, dense datacenter design (compute/storage/networking), and hyperscale scale-out systems.
Familiarity with hypervisor/VM applications, container orchestration, virtualized infrastructure, bare-metal provisioning.
Understanding of advanced data-centre infrastructure design: liquid cooling, immersion cool