Job Specifications
Avanciers is seeking a highly skilled Site Reliability Engineer for an exciting opportunity with one of our Fortune 500 clients.
Job Title: Site Reliability Engineer (SRE)
Location: Santa Clara, CA
Position Type: Full-time
Position Summary
We are seeking experienced Site Reliability Engineers (SREs) with strong expertise in native server environments and hands-on experience at semiconductor or electronic software companies.
The ideal candidate will have a solid background in bare-metal data center management, automation, and observability, ensuring high reliability and performance across production systems.
Key Responsibilities
1. Service Reliability & Incident Management
Guard and maintain Service Level Agreements (SLAs) for critical engineering services.
Implement and manage monitoring, alerting, and incident response mechanisms.
Conduct root cause analysis and post-mortems for SLA breaches and critical incidents.
2. Observability & Monitoring
Set up and maintain monitoring tools such as Prometheus, Grafana, and ELK Stack to track system health and KPIs.
Develop and maintain KPI pipelines using Jenkins, Python, and ELK.
Create custom alerts to enhance system observability and proactive incident prevention.
3. Automation & Optimization
Develop automation scripts and workflows using Python, Go, Bash, and Jenkins.
Support capacity planning, infrastructure optimization, and performance tuning.
Improve system efficiency and reliability through automation and operational best practices.
4. Day-to-Day Operations
Monitor system alerts, investigate issues, and ensure timely resolutions.
Participate in war room sessions during major incidents or outages.
5. Collaboration & Documentation
Collaborate closely with software, hardware, and infrastructure teams.
Maintain detailed documentation for procedures, configurations, and troubleshooting steps.
Required Technical Skills
Bare-metal data center machine management tools: IPMI, Redfish, KVM
Automation & Scripting: Jenkins, Python, Go, Bash
Infrastructure & Monitoring: Kubernetes, MySQL, Prometheus, Grafana, ELK
Preferred Hardware Exposure: GPUs, Tegra systems
Preferred Profile
5–10 years of hands-on experience as an SRE or Infrastructure Engineer.
Strong background in native server management and data center infrastructure.
Experience at semiconductor or electronic software companies is highly preferred.
Proven ability to maintain reliable, scalable, and automated environments.
About the Company
At Avanciers, we drive business transformation by delivering exceptional talent solutions and cutting-edge technology services. Since 2015, we've been a trusted partner to enterprises across North America and beyond, offering impactful services in Staffing, Salesforce Consulting, Google Cloud Solutions, UI/UX Design, and Web Development.
As a woman-owned, diversity-driven organization and a certified Salesforce and Google Cloud Partner, our mission is to empower companies to scale, innovate, and deliver results faster. We d...