Job Specifications
Help us use technology to make a big green dent in the universe!
Kraken powers some of the most innovative global developments in energy.
We’re a technology company focused on creating a smart, sustainable energy system. From optimising renewable generation and creating a more intelligent grid to enabling utilities to provide excellent customer experiences, our operating system for energy is transforming the industry around the world in a way that benefits everyone.
It’s a really exciting time in energy. Help us make a real impact on shaping a better, more sustainable future.
Our Global Platform Engineering Reliability group is responsible for architecting, developing, and maintaining the resilient and scalable infrastructure that powers and supports our platforms.
As a Lead Site Reliability Engineer within the newly created ‘Product Reliability’ team, you'll be responsible for ensuring the availability, performance, and scalability of the products on our platform. Your proficiency in leading technical teams that support products serving millions of customers will ensure stability and high performance for our brands and clients.
You will keep up with best practices in building products for scale. Your communication skills and attention to detail will be indispensable as you pinpoint areas for enhancement, ensure optimal product performance, and continuously improve our platforms' reliability and efficiency.
What you'll do:
Team leadership
Own the Product Reliability team within Platform, working closely with the Director and Heads of Platform Engineering to define strategic objectives and team direction
Manage team priorities and ensure initiatives are completed within deadlines
Collaborate regularly and effectively with the Staff Platform Engineer in your functional team to deliver the technical implementation of the team’s strategic priorities
Lead delivery of major initiatives on clear timelines
Partner effectively in the wider Platform Engineering team to deliver outcomes
Build a strong culture of open communication where teammates can ask questions without fear, promoting a positive and inclusive team environment
People management
Line-manage the engineers in the Product Reliability team
Set clear performance expectations and goals for team members
Regularly review individual and team performance, offering actionable insights and constructive feedback to support and grow team members
Technical delivery
Deliver technical improvements such as small features and bug fixes
Support team delivery through code reviews, technology research and architectural guidance
Provide support for service offerings owned by your team
Help solve interesting and difficult problems. There’s a great opportunity for disruption in the global energy market.
What you'll have:
Excellent communication skills, working effectively with developers, product managers and other business stakeholders to understand and deliver impactful projects and reliability improvements
A record of consistently delivering critical-path projects on time and at scale
Meticulous organisation and planning skills
Experience mentoring and coaching a team to perform at a high level of quality
Experience managing and supporting large-scale, internet-facing distributed systems serving millions of customers
Good experience with AWS and at least one programming language. We use a wide range of AWS services, not just the standard few
Knowledge of security best practices, security and CI/CD tooling, and related methodologies
We're hiring for this role in New York City but will also consider remote candidates based in the Eastern (EST) time zone. We cannot consider applicants outside this region.
What will help:
Previous experience leading technical delivery for small, highly autonomous teams
Previous experience as a technical individual contributor, preferably as a Site Reliability Engineer
A track record of effective collaboration with other teams and departments to drive holistic outcomes
A proactive, innovative mindset with the ability to drive continuous improvement
Previous experience working in a remote-first asynchronous global team
Familiarity with some of our tech stack:
- PostgreSQL, or a similar RDBMS, particularly in Amazon RDS at scale
- Docker and Kubernetes; we use Amazon EKS in production
- Python
- Datadog, or a similar logging/monitoring tool
- Message queues, event-driven asynchronous processing, or similar technologies; we use RabbitMQ
- Terraform, or a similar infrastructure-as-code tool
- Linux (any distribution)
Why you'll love it here:
Great medical, dental, and vision insurance options including FSAs
Paid time off: we know working hard also means being able to recharge as needed. We trust our employees to get the work done and take the time they need
401(k) plan with employer match
Parental leave. Biological, adoptive and foster parents are all eligible.
Pre-tax commuter benefits
About the Company
Improve customer satisfaction, increase product innovation, generate new revenue streams, and make significant operational savings, all with the only proven end-to-end, AI-powered operating system for energy utilities. Now expanded to support water and telco.