Skills

Communication, Go, TypeScript, PowerShell, Incident Response, Encryption, GitHub, CI/CD, DevOps, Docker, Kubernetes, Monitoring, Jenkins, Ansible, AWS Lambda, Azure DevOps, Azure Functions, Problem-solving, Decision-making, Architecture, Azure, AWS, Software Development, CI/CD Pipelines, Terraform, Prometheus, Grafana, Infrastructure as Code, GitHub Actions

Job Specifications

Role: Site Reliability Engineer (W2 only)

Location: San Diego, CA - Onsite

Duration: 12 Months

Job Description:

The Site Reliability Engineer (SRE) will work closely with cross-functional teams, including software development, platform, and operations, to support the availability and performance of our cloud-based systems. You will take ownership of the cloud infrastructure, drive automation, and implement monitoring and alerting systems to detect and address issues proactively.

Key Responsibilities:

Cloud Infrastructure Management:

Design, deploy, and maintain scalable, secure, and highly available cloud infrastructure on AWS and Azure.

Build and manage infrastructure using infrastructure-as-code tools (Terraform, AWS CDK, and CloudFormation) and scripting languages (TypeScript, PowerShell, or Go).

Ensure cloud environments adhere to regulatory standards for healthcare data security, drawing on familiarity with requirements such as SOC II and ePHI compliance.

Observability and Monitoring:

Implement, configure, and optimize Datadog for application and infrastructure monitoring, ensuring full-stack visibility into system performance.

Set up alerting mechanisms for critical metrics (e.g., system health, latency, error rates) and establish runbooks for incident response.

Develop and maintain dashboards to provide real-time insights into system performance.

Performance Optimization & Troubleshooting:

Identify and resolve performance bottlenecks and ensure the reliability and scalability of production systems.

Perform root cause analysis for incidents and participate in on-call rotations to manage critical system incidents.

Drive improvements to system architecture, security, and disaster recovery strategies.

Collaboration & DevOps Enablement:

Work closely with development teams to incorporate CI/CD pipelines and foster a culture of “infrastructure as code” and automation.

Collaborate with security and compliance teams to ensure systems meet all regulatory and security requirements.

Promote best practices for software delivery, system monitoring, and infrastructure scalability.

Security & Compliance:

Work with the compliance and cybersecurity teams to maintain healthcare data security, ensuring that systems are SOC II and ePHI compliant.

Implement security best practices within cloud environments, including encryption, IAM, and regular audits.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent practical experience.

3+ years of experience as a Site Reliability Engineer, managing infrastructure on AWS and/or Azure.

Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).

Expertise in Terraform, CloudFormation, AWS CDK or similar infrastructure-as-code technologies.

Proficiency in container orchestration and management (e.g., Docker, Kubernetes).

Knowledge of automation tools (e.g., Ansible, Puppet, Chef).

Familiarity with CI/CD pipeline tools such as Jenkins, GitHub Actions, or Azure DevOps.

Experience with healthcare data security and compliance (e.g., SOC II and ePHI requirements) is a plus.

Excellent problem-solving and troubleshooting skills.

Strong collaboration and communication skills.

Nice to Have:

Experience working in a regulated industry, particularly healthcare or medical devices.

Certifications such as AWS Certified Solutions Architect, Azure Administrator, or Certified Kubernetes Administrator (CKA).

Experience with AI/ML models for predictive maintenance and performance monitoring.

Familiarity with serverless architectures (e.g., AWS Lambda, Azure Functions).

Additional Information

Strong analytical and decision-making abilities.

Able to build strong partnerships with business partners and project teams.

Takes responsibility for delivering superior value and client service.

Works well with people who have diverse abilities, experiences, and perspectives.

Influences others without direct authority.

Approaches opportunities and issues in an optimistic, action-oriented, solution-focused way.

Good writing skills to document plans and processes.

About the Company

Welcome to SPECTRAFORCE, your gateway to NEWJOBPHORIA™! Established in 2004, SPECTRAFORCE is now one of the largest and fastest-growing U.S. staffing firms, renowned for its exceptional client service. SPECTRAFORCE's innovative A.I.-powered talent acquisition platform and proven methodologies set us apart in the industry. We offer a comprehensive range of services, including Contingent, Permanent, and Statement of Work (SOW) staffing solutions. Our expertise extends across multiple sectors such as Technology, Financial Se...
