Job Specifications
OUR STORY:
Join Scaleway and shape the sovereign cloud of tomorrow!
Since 1999, we have been designing secure, sustainable infrastructures aimed at supporting the most ambitious companies.
Historically known for our dedicated servers (Dedibox), we made a strategic shift to cloud computing in 2015. Staying true to our principles of simplicity, flexibility, and technical excellence, we have become one of the leading players in Europe in the sector.
With the rise of artificial intelligence, we have strengthened our commitment, supported by the Iliad Group, which is investing €3 billion to develop a serious, sovereign AI alternative to American and Asian giants.
Every day, thanks to our fast-growing portfolio of cloud and AI products (bare metal, containerization, serverless, AI, etc.), Scaleway proudly serves thousands of customers across the private and public sectors, from corporations like France Télévisions or Hachette Livre, to fast-growing startups like Photoroom and Biolevate, to institutions like the City of Copenhagen.
Our offices are located in Paris, Lille, Toulouse, Rennes, Rouen, Bordeaux and Lyon.
WHY WE NEED YOU?
Our growth is driving us to strengthen our SRE team to support and scale our production environments.
Your mission will be to build and maintain reliable, observable, and secure infrastructure in order to ensure optimal service availability for our customers around the world.
#HPC #AI #GPU #CLUSTERS
YOUR FUTURE TEAM
We work in a collaborative and international environment where the diversity of Scalers, combined with a spirit of sharing, helps bring new projects to life every day, advancing our ambitions together.
You will join a newly formed team dedicated to building and operating Scaleway’s future AI infrastructure. As part of this group, you will design, maintain, and scale core systems and observability tools, partner with product teams, and ensure the reliability and performance of AI services across Scaleway.
YOUR DAILY ROUTINE
Build a large AI infrastructure with monitoring, diagnosis, and remediation of production incidents
Troubleshoot high-impact production issues in collaboration with other engineering teams
Participate in an on-call rotation to handle incidents and ensure service continuity
Implement and maintain observability solutions to monitor AI infrastructure and application health
Contribute to AI infrastructure lifecycle management across different environments and countries
Promote and apply best practices in terms of stability, resiliency, scalability, and security
Maintain clear technical documentation for tools and procedures
Contribute to system and tool evolution based on production feedback
Collaborate closely with development teams to ensure infrastructure readiness
Participate in team rituals and knowledge-sharing initiatives
About You
SOFT SKILLS:
Proactive and solution-oriented mindset
Passion for automation and continuous improvement
Strong collaboration and communication skills
Ability to work independently and in a team
Willingness to mentor and share knowledge
HARD SKILLS:
Experience with Go, Python or Rust
Strong scripting skills (Bash, Python)
Hands-on experience with Linux systems (Ubuntu/Debian)
Hands-on experience with GPU & HPC infrastructure
Knowledge of networking (TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)
Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
Experience managing relational databases (PostgreSQL)
Understanding of CI/CD pipelines (GitLab)
Comfortable with English (written and spoken)
WHAT YOU WILL FIND AT SCALEWAY
Hybrid work: We offer up to 3 days of remote work per week.
Offices: Our offices are spacious, dynamic workspaces with bold design, conveniently located near public transport. Most of our offices feature outdoor spaces (terraces) and bike parking facilities.
Dining: Our chef provides a healthy meal service at the headquarters, and breakfast is available across all our sites year-round. Scalers working from regional sites enjoy a Swile card for lunches.
Well-being commitments: Whether it’s access to a gym, daycare places, or discounted care services, Scaleway is committed to supporting Scalers in maintaining a balanced life.
International environment: With dozens of nationalities, Scaleway offers a stimulating environment where English is as widely spoken as French.
Career & Mobility: Our managers value internal mobility, and opportunities to transition to other entities within the Iliad Group are accessible to all Scalers.
Why join the Scaleway adventure?
A rich and diverse product offering: Scaleway offers over 100 public cloud products in IaaS, PaaS, and AI.
A cutting-edge technical environment: Scaleway provides modern infrastructures, including high-performance bare metal servers, to tackle exciting technical challenges.
Commitment to responsible cloud: Scal
About the Company
European. Cloud. AI.
Founded in 1999, Scaleway is a pioneer in European cloud computing, providing a complete ecosystem to build, train, deploy, and scale AI models and cloud-native applications. We offer sovereign, sustainable, and high-performance infrastructure designed for today's tech-driven world.
Key figures:
- 20+ years of expertise in cloud infrastructure
- 80+ innovative cloud products and services
- 9 renewable energy-powered data centers
- 65 points of presence worldwide
- Industry-leading PUE of 1.15, the l...