- Company Name
- Stefanini North America and APAC
- Job Title
- Infrastructure Engineer (OpenShift Virtualization)
- Job Description
-
Job Title: Infrastructure Engineer (OpenShift Virtualization)
Role Summary: Design, implement, and maintain the OpenShift Virtualization platform, driving capacity planning, automation, SRE practices, and comprehensive observability to support developer productivity and platform stability.
Expectations: 6+ years of experience in infrastructure engineering; strong background in OpenShift/Kubernetes, automation, cloud architecture, and root‑cause analysis. Demonstrated ability to work across global environments and collaborate with application teams.
Key Responsibilities:
* Conduct capacity planning and forecasting for compute, memory, storage, and network resources; develop capacity models and reports for strategic scaling.
* Analyze resource utilization trends and recommend scaling, consolidation, or optimization strategies.
* Develop and maintain automation solutions (scripts, playbooks, CI/CD pipelines) for OpenShift Virtualization (OSV) tasks, including configuration changes, VM management, auditing, and ticketing integration.
* Apply Site Reliability Engineering principles to improve platform stability, performance, and operational efficiency, including management of RBAC and namespace quotas.
* Implement end‑to‑end observability (monitoring, logging, tracing) using Dynatrace and Prometheus/Grafana, and explore event‑driven architecture for real‑time insights.
* Perform deep‑dive root‑cause analysis to rapidly identify and resolve platform incidents across global compute environments.
* Monitor VM health, resource usage, and performance; detect anomalous activity that may indicate security or configuration issues.
* Provide solution design and knowledge‑management consulting to application teams.
Required Skills:
* OpenShift Virtualization, Kubernetes, Docker, and cluster fundamentals
* Scripting and automation: Python, PowerShell, Bash, Ansible, GitHub Actions/Tekton
* Cloud and virtualization platforms: Google Cloud Platform; experience with VMware ESXi
* Monitoring & observability: Dynatrace, Prometheus, Grafana, logging & tracing tools
* Site Reliability Engineering practices (alerting, incident response, SLIs/SLOs)
* Root Cause Analysis, performance tuning, capacity planning, and resource optimization
* Security fundamentals: RBAC, access control, monitoring for suspicious activity
Required Education & Certifications:
* Associate or Bachelor’s degree in Computer Science, IT, or related field (Bachelor’s preferred)
* Relevant certifications (CKA/CKS, Red Hat Certified Engineer, GCP Associate Cloud Engineer, Ansible Automation Platform) are a plus