Avalara

www.avalara.com

1 Job

5,793 Employees

About the Company

Avalara offers the most advanced tax and compliance platform powered by AI. For more than two decades, Avalara has developed one of the most expansive libraries of tax content and integrations in the industry, supporting over 43,000 businesses and government entities across more than 75 countries. The company's purpose-built AI agents automate end-to-end compliance processes with greater precision, from tax calculations and return filings to exemption certificate management and beyond. For more information, visit Avalara.com.

Listed Jobs

Company Name
Avalara
Job Title
Senior Site Reliability Engineer
Job Description
Role Summary:
The Senior Site Reliability Engineer is responsible for building and scaling reliable, AI-driven systems in a global SaaS environment. The role designs and implements automation, observability, and self-healing pipelines that integrate model-centric and agentic AI, while managing SLOs, SLIs, and incident response.

Expectations:
- 5+ years of experience in large-scale SaaS or distributed systems.
- Bachelor’s degree in Computer Science, Engineering, or equivalent.
- Willingness to participate in an on-call rotation.
- Strong commitment to reducing toil, measuring everything, and building autonomous reliability.

Key Responsibilities:
- Build AI-powered reliability systems meeting MVR and SMM standards.
- Implement agentic AI workflows (LangChain, n8n, MCP servers, custom agents) for incident analysis, assessment, and resolution.
- Design AI-driven observability stacks (Prometheus, Grafana, Loki, Tempo, OpenTelemetry) with predictive analytics and ML anomaly detection.
- Orchestrate reliability operations using AI Flow tools (n8n, Airplane.dev, Temporal.io) for alert remediation, data enrichment, and incident collaboration.
- Automate infrastructure provisioning, remediation, and observability pipelines with Go, Python, or Terraform.
- Operate and extend MCP servers to connect AI agents with production telemetry.
- Define and manage SLOs, SLIs, and SLAs; improve signal quality via ML-based alert noise reduction and event correlation.
- Troubleshoot production systems using AI-assisted diagnostics, LLM copilots, and pattern recognition on logs, traces, and metrics.
- Work with development teams to integrate AI reliability feedback into CI/CD pipelines.
- Mentor engineers on AIOps practices and contribute to the global AI Reliability Playbook.

Required Skills:
- Agentic AI & AIOps: MCP servers, AI Flow tools, predictive maintenance, anomaly detection, automated root cause analysis.
- Software Engineering: Go, Python, automation frameworks, API integrations.
- Observability & Monitoring: Prometheus, Grafana, Loki, Tempo, OpenTelemetry, ML-based metric analysis.
- Infrastructure as Code: Terraform or Pulumi; modern CI/CD (GitLab preferred).
- Cloud Platforms: AWS, GCP, Oracle Cloud, or Azure; multi-cloud reliability focus.
- Container Orchestration: Kubernetes, Docker; low-level container internals.
- Linux Administration: hardening, tuning, troubleshooting.
- Networking: OSI model, TCP/IP, DNS, load balancing in cloud-native environments.
- Automation & Workflows: n8n, Airplane.dev, LangChain, custom AI flow builders.
- Documentation & Communication: clear, precise reporting to customers and partners.

Required Education & Certifications:
- Bachelor’s degree in Computer Science, Engineering, or equivalent technical experience.
- Certifications in cloud platforms (AWS, GCP, Azure, Oracle) and IaC tools (Terraform, Pulumi) are advantageous.
United States
Remote
Senior
21-01-2026