Veridian Tech Solutions, Inc.

Senior Site Reliability Engineer

On site

Richardson, United States

Senior

Full Time

18-09-2025

Skills

Communication, Python, Java, PHP, Bash, Perl, PowerShell, Splunk, ServiceNow, Docker, Kubernetes, Monitoring, Ansible, Attention to detail, Architecture, Analytical Skills, Azure, Node.js, AWS, Shell, Project Management, Cloud platforms, Organizational Skills, Terraform, Prometheus, Grafana, Microservices

Job Specifications

Role: Senior Site Reliability Engineer

Locations: Richardson, TX / Raleigh, NC / Phoenix, AZ / Hartford, CT / Indianapolis, IN

Type of Hiring: FTE

Job Description

Bachelor's degree or foreign equivalent from an accredited institution required. Three years of progressive experience in the specialty will also be considered in lieu of each year of education.
At least 11 years of Information Technology experience
At least 6 years of site reliability engineering (SRE) experience in large programs, with a focus on architecting and implementing observability and automation across the entire operations lifecycle.
Observability & Monitoring: Implement logging, monitoring, and alerting using any one of Dynatrace, Datadog, Splunk, Nagios, Prometheus, Grafana, ELK stack, or New Relic.
Analyze monitoring data and golden signals to identify trends and patterns and proactively address potential problems.
Debug and optimize code, and automate routine operational tasks.
Improve automation and increase the system's self-healing capability.
Incident Management: Participate in production incidents, perform root cause analysis (RCA), and drive post-mortem improvements.
Develop and maintain dashboards and reports to visualize system health and performance.
Use technologies such as Ansible, Python, Terraform, PowerShell/Shell, and JSON to create automation that reduces operational toil.
Develop automation solutions for repeated incidents and service tasks (provisioning, scaling, backup, performance management, security, capacity management, etc.) in infrastructure operations, or develop automation and optimization solutions for repeated tickets and signals in application operations.

Preferred Qualifications

Working knowledge of:
Troubleshooting database failures and providing speedy resolution.
SLI, SLO, error budgets.
Event correlation and AIOps, with a deep understanding of ITSM tools
Microservices architecture with APIs, including REST APIs
CI/CD tooling and best practices
Cloud platforms such as AWS, Azure, and Google Cloud
Container orchestration tools and practices, including Kubernetes and Docker Swarm
Infrastructure automation tools such as Terraform, CloudFormation, Ansible, and Puppet (any one)
Scripting languages (any of the following): Python, JSON, Java, Node.js, PHP, PowerShell(M), or Bash/Shell/Perl
ITSM tools such as ServiceNow
Excellent communication and client interaction skills, including exceptional written and verbal skills and technical documentation ability
Extraordinary planning, project management, coordination, and analytical skills
Hands-on experience working in a Global Delivery Model with onsite/offshore resources
Exceptional organizational skills
Ability to manage and prioritize tasks efficiently
Readiness to demonstrate a proactive attitude
Solid attention to detail and excellent written and verbal communication skills are required
Ability to work in a team in a diverse, multi-stakeholder environment

About the Company

Veridian Tech Solutions, Inc. is a leading IT staffing and solutions company that serves various industries. We provide services such as Staff Augmentation, Software Development, Project Management, Cloud Computing, and Digital Transformation. We provide highly experienced professionals who will prove to be an asset to your organization and help you reshape operations and processes to increase productivity and gain a competitive advantage. At Veridian, we understand the importance of employee happiness to your organi...