Job Application for Senior Site Reliability Engineer at SSV Labs

About SSV Labs

SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. We’re building tools, protocols, and standards to make staking more secure, scalable, and trustless.

Our flagship upgrade, SSV 2.0, brings that vision to life. It’s a fully modular, next-gen staking layer that enables anyone to run validators with decentralized, non-custodial node operators. With built-in MEV support, advanced slashing protection, and performance at scale, SSV 2.0 is the foundation for Ethereum’s future.

Why join our team?

We are a global team looking for people with diverse professional backgrounds to contribute their ideas and help drive the company's success. You'll work in a transparent environment with innovative technologies, be constantly challenged, and learn from like-minded professionals and industry leaders. We offer a platform for you to develop and achieve your career goals. We look forward to hearing from passionate, goal-oriented applicants ready to make their mark in the blockchain space.

As a Senior Site Reliability Engineer, you'll work at the intersection of cloud infrastructure and blockchain, building the platform that our product teams deploy to. You'll work closely with product teams to define the tooling and abstractions that let them iterate fast, while keeping everything reliable underneath. The stack spans multiple clouds and Kubernetes clusters, and supports everything from APIs to full Ethereum testnets, so deep Kubernetes experience is a must.

You'll also own bringing AI into our engineering workflows. We want someone who can build and deploy autonomous agents and LLM-powered tooling that makes the whole engineering org more productive, not just use off-the-shelf copilots.

This role comes with real ownership and room to shape how we operate as we grow.

Responsibilities:

Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
Work closely with product teams on crucial initiatives such as production deployments, release management, and incident handling, aiming for seamless operations.
Offer technical expertise and input to support the continual adoption and modernization of our platform and infrastructure.
Build and deploy AI-powered tooling (autonomous coding agents, LLM-assisted CI/CD, automated incident triage) that makes the engineering org more productive. Think: sandboxed environments where agents can write, test, and verify code without human babysitting.
Foster a culture of continuous learning and improvement, encouraging constructive review and adaptation processes.

Your Experience & Qualifications:

Kubernetes expertise, with a strong understanding of its core concepts and the ability to manage and maintain clusters.
Expertise with modern cloud-native tools, e.g. ArgoCD for GitOps, Terraform/Crossplane for IaC, and the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) for observability.
3-5 years of experience with Infrastructure as Code and cloud provisioning tools - Must
3-5 years of experience developing and scripting in languages such as Go or Python - Must
Proficient in both written and spoken English, with exceptional communication abilities.
Expertise in Linux environments, containerization, and cloud technologies.
Comprehensive knowledge of production management concepts for distributed systems.
3-5 years of experience in operational roles, running production environments.
AI fluency. You use AI coding tools daily and have opinions about what works. More importantly, you can build and deploy LLM-powered developer tooling and autonomous agents, not just consume them. We want someone who thinks about how to make an entire engineering team more productive with AI.
Networking knowledge: bonus points for service mesh experience, platform engineering and cross-cloud networking.
Familiarity with the Ethereum ecosystem, staking, and blockchain technologies - Advantage

*We offer equal opportunities and ensure inclusive recruitment for our global teams, without regard to race, gender, culture, or sexual orientation.
