Job Specifications
What You'll Do
Avalara is an AI-first company.
Every engineer, manager, and employee at Avalara will use AI and automation to enhance productivity, quality, innovation, and customer value. AI is integrated into our workflows and products, and success at Avalara depends on embracing AI as an important capability, not just an optional tool.
As a Senior Site Reliability Engineer (SRE) on the Reliability Engineering Product SRE team, you will develop and scale reliable systems using both traditional SRE practices and AI-native tools. You will incorporate Agentic AI into operational workflows, develop self-healing environments, and support our shift toward autonomous reliability operations. You will report to the Senior Manager, Reliability Engineering.
What Your Responsibilities Will Be
Build AI-powered reliability systems that meet Avalara's Minimum Viable Requirement (MVR) and Software Model (SMM) standards across global SaaS environments.
Implement Agentic AI workflows using frameworks such as LangChain, n8n, MCP servers, or custom AI agents to automate the analysis, assessment, and resolution of production incidents.
Design AI-driven observability systems, integrating predictive analytics and ML-based anomaly detection into tools such as Prometheus, Grafana, Loki, Tempo, and OpenTelemetry.
Use AI Flow tools (e.g., n8n, Airplane.dev, Temporal.io) to orchestrate and automate multi-system reliability operations, such as alert remediation, data enrichment, and incident collaboration.
Use Go, Python, or Terraform to automate infrastructure provisioning, remediation, and observability pipelines.
Operate and extend MCP servers (Model Context Protocol) to connect AI agents with production telemetry and tooling safely, allowing autonomous reliability insights.
Define and manage SLOs, SLIs, and SLAs, improving signal quality through ML-based alert noise reduction and event correlation.
Troubleshoot and improve production systems using AI-assisted diagnostics, large language model (LLM) copilots, and pattern recognition across logs, traces, and metrics.
Collaborate with development teams to build reliability into their services and embed AI-based reliability feedback loops into CI/CD pipelines.
Promote AIOps practices, mentor engineers on integrating AI into reliability workflows, and contribute to Avalara's global AI Reliability Playbook.
What You’ll Need To Be Successful
Experience
5+ years in large-scale SaaS or distributed systems environments.
Bachelor's degree in Computer Science, Engineering, or equivalent technical experience.
Comfortable participating in a rotating on-call schedule for production systems.
Qualifications
AI-Driven Operations: Hands-on experience or deep curiosity in Agentic AI, MCP servers, AI Flow tools, and AIOps (e.g., predictive maintenance, anomaly detection, automated root cause analysis).
Software Engineering: Coding experience in Go and Python, and proficiency in automation frameworks and API integrations.
Observability & Monitoring: Proficiency with Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and ML-based metric analysis.
Infrastructure as Code (IaC): Expertise with Terraform or Pulumi, and modern CI/CD pipelines (preferably GitLab).
Cloud Platforms: Experience across AWS, GCP, and Oracle Cloud or Azure, with a focus on multi-cloud reliability.
Containers and Orchestration: Understanding of Kubernetes, Docker, and low-level container internals (namespaces, cgroups).
Linux Systems: Experience in administration, hardening, tuning, and troubleshooting of Linux environments.
Networking: Solid grasp of OSI model, TCP/IP, DNS, and load-balancing in cloud-native environments.
Automation and Workflows: Familiarity with n8n, Airplane.dev, LangChain, or custom AI flow builders for reliability automation.
Documentation & Communication: Experience communicating updates and resolutions to customers and other partners; clarity and precision matter.
Mindset: Desire to eliminate toil, measure everything, and build self-sustaining systems.
Preferred Qualifications
Experience integrating AI copilots or agentic workflows into production systems.
Open-source contributor or active participant in AI or SRE-related communities.
Familiarity with n8n, LangGraph, AutoGPT, or CrewAI frameworks for autonomous agent orchestration.
Experience building predictive SLO dashboards and LLM-based observability assistants.
Avalara is an AI-first Company
AI is embedded in our workflows, decision-making, and products. Success here requires embracing AI as an essential capability.
You’ll bring experience using AI and AI-related technologies and be ready to thrive here.
You’ll apply AI every day to business challenges - improving efficiency, contributing solutions, and driving results for your team, our company, and our customers.
You’ll grow with AI by staying curious about new trends and best practices, and by sharing what you learn so others can benefit too.
How We’ll Take Care Of You
Total Rewards
In addition to a gre