Skills

Communication, Leadership, Slack, Incident Response, Workday, Monitoring, Configuration Management, Architecture, Cloud Architecture, Machine Learning, Databases, Organization, AWS, Team Development, Terraform

Job Specifications

Take your career to new heights with Loopio!

Loopio is looking for a senior engineering leader to own our Site Reliability Engineering (SRE), Infrastructure, and MLOps teams. In this role, you will be the primary architect of the reliability, scalability, and cost efficiency of the systems that power Loopio’s platform.

You’ll lead teams that design, build, and operate our production infrastructure, ensuring our services are resilient, observable, and ready to scale as we integrate advanced AI and agentic workflows. You’ll partner closely with Product Engineering, Security, and Data teams to enable fast, safe delivery while maintaining operational excellence.

Note: This is an existing vacancy on the team.

What You’ll Be Doing

Leadership & Team Development

Lead and grow multiple teams across SRE, Cloud Infrastructure, and MLOps.
Coach and develop engineering managers and senior individual contributors, fostering a culture of ownership and high craft.
Build a "Platform-as-a-Product" mindset, ensuring that infrastructure and ML tooling serve as enablers for the rest of the engineering organization.
Partner with Recruiting to attract and retain specialized talent in the cloud, reliability, and machine learning infrastructure space.

Reliability & Operational Excellence

Own the operational health of production systems, including availability, latency, and durability.
Define and evolve SLIs, SLOs, and error budgets, moving the organization toward data-driven reliability decisions.
Lead incident response, driving blameless postmortems and systemic improvements to reduce "toil" and improve on-call sustainability.
Support ML-specific reliability, ensuring that model inference pipelines and vector databases meet the same high standards as our core SaaS platform.

Infrastructure & MLOps Strategy

Evolve Loopio’s cloud architecture, overseeing capacity planning, disaster recovery, and business continuity.
Drive the MLOps roadmap, establishing standards for model deployment, monitoring, and scaling (including LLM orchestration and RAG pipelines).
Lead Cloud FinOps, ensuring our infrastructure and AI compute costs are visible, intentional, and optimized.
Establish standards for infrastructure automation (IaC), configuration management, and secrets handling.

Security & Cross-Functional Leadership

Partner with Security to ensure "secure-by-default" infrastructure and robust backup/recovery strategies.
Communicate risks and trade-offs clearly to senior leadership, acting as a calm, trusted voice during high-severity events.
Collaborate with Product Engineering to support the delivery of high-impact AI features without sacrificing platform stability.

What You’ll Bring To The Team

8+ years of experience in infrastructure, SRE, or cloud engineering roles, with 3+ years leading specialized engineering teams.
Deep Cloud Proficiency: Extensive experience with AWS (preferred) and modern infrastructure-as-code (Terraform).
Operational Grit: A proven track record of leading teams through production incidents and complex architectural migrations.
MLOps Awareness: Understanding of the unique infrastructure needs for machine learning, such as GPU orchestration, model serving, or data pipeline stability.
Systems Scaling & Observability: Proven expertise in managing large-scale containerized environments and leveraging observability stacks to ensure platform health.
Strategic Communication: Ability to align technical roadmaps with business objectives and advocate for infrastructure investment.
Experience with FinOps or managing significant cloud budgets is a plus.
Background in supporting AI agentic workflows or autonomous orchestration systems is a plus.

Where You’ll Work

Loopio is a remote-first workplace because we recognize the advantages of working flexibly. We are HQ’d in Canada, with established hub regions around the world where we hire from.
Our employees (or Loopers, as we call ourselves!) live and work in Canada (British Columbia and Ontario), London, and India (specifically in Gujarat, Maharashtra, and Bengaluru).
The majority of our team is based in ON and BC, which means these employees live and work remotely within a 300 km radius of Toronto (within Ontario) and Vancouver (within BC).
We offer flexible co-working locations available to Loopers in ON and BC. Those based in ON have the option of working out of our convenient co-working space located in the heart of Downtown Toronto and a 12-minute walk from Union Station. BC Loopers have the option to work centrally in Vancouver. It is whatever works best for you!
You’ll collaborate with your teams virtually across the UK, India, and North America (we’re just a Zoom call and Slack message away!) with core sync hours and focus time for heads-down work during the workday.
We encourage asynchronous collaboration to effectively work as a global #OneTeam!

Why You’ll Love Working at Loopio

Your manager supports your development by providing ongoing feedback and regula

About the Company

Loopio is a Toronto-based RFP response software provider that helps companies streamline their process for RFPs, DDQs, and Security Questionnaires. With Loopio, teams can respond faster, improve response quality, and win more business. Loopio is one of Canada’s fastest-growing tech startups. It ranked twice as one of the fastest-growing companies on the Deloitte Technology Fast 50™ list and was selected twice as one of LinkedIn’s Top Startups in Canada. Visit www.loopio.com to learn more.