
Senior Technical Product Manager, Fleet Operations

London

About Nscale

Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.

About the role

Technical Product Managers at Nscale own the definition, delivery, and ongoing evolution of a slice of the Nscale platform. You partner closely with engineering, design, research, and go-to-market teams to translate customer problems and operational realities into shippable product outcomes. As a Senior Technical Product Manager for Fleet Operations, you own the product strategy for the day 0–2+ operational software that runs our global GPU fleet: the systems that bring capacity online, keep it healthy, and restore it quickly when things go wrong. You partner daily with Fleet Software engineering teams, SRE, and Support to turn operational pain into durable product across provisioning and bring-up (day 0), testing and deployment (day 1), and the full lifecycle of monitoring, incident response, repair, RMA, firmware, and decommissioning (day 2+). You operate at team scope, owning a major product area and driving multi-quarter initiatives that directly improve fleet availability, utilisation, and time-to-recover.

What you'll be doing

  • Own the strategy and roadmap for a significant Fleet Operations product area, such as provisioning and bring-up, fleet health and telemetry, incident and repair workflows, firmware and lifecycle management, or capacity and inventory.
  • Lead multi-sprint, cross-functional initiatives from problem framing through rollout across live GPU clusters, working hand-in-hand with Fleet Software, SRE, data centre operations, and Support.
  • Turn operational ambiguity into product: shadow on-call rotations, ride along with support and repair workflows, and translate recurring toil into tooling, automation, and platform capabilities. 
  • Define the metrics that matter for a GPU fleet — availability, utilisation, MTTR, time-to-bring-up, hardware failure rates, support ticket deflection — and drive the roadmap against them.
  • Partner with engineering on architecture and trade-offs for systems that span bare metal, orchestration, observability, and control planes.
  • Drive incident reviews and postmortems into product commitments; close the loop so the same class of issue doesn't recur.
  • Mentor junior product managers and raise the quality bar for PRDs, reviews, and product decisions across the team. 
  • Represent Fleet Operations in planning, reviews, and leadership updates.

What you need

  • 5–8 years of product management experience in software or technology, with a track record of owning significant product areas in infrastructure, platform, or operations-facing products.
  • Strong technical fluency in large-scale systems: you can lead discussions with engineering on architecture, trade-offs, and feasibility across provisioning, orchestration, observability, and control-plane design.
  • Experience building products for operators — SREs, NOC/support teams, data centre technicians, or similar — and a genuine appetite for understanding their workflows.
  • Demonstrated ability to move from an ambiguous operational problem space to shipped product outcomes that measurably improve reliability, efficiency, or time-to-recover.
  • Experience mentoring or informally leading peers. Excellent written and verbal communication; you can make complex product decisions legible to engineers, operators, and executives alike.

Nice to haves

  • Degree in computer science, engineering, or a related field, or prior experience as an engineer or SRE.
  • Hands-on background in cloud infrastructure, bare-metal provisioning, fleet or hardware lifecycle management, observability/monitoring platforms, or incident management tooling.
  • Experience with bare-metal provisioning systems such as OpenStack Ironic (or equivalents like MAAS, Tinkerbell, or in-house provisioning stacks).
  • Experience with DCIM tools such as NetBox (or equivalents like Device42 or Nautobot) for inventory, cabling, and rack/asset management.
  • Experience with ITSM and ticketing platforms such as Jira Service Management (or equivalents like ServiceNow, Zendesk, or Freshservice) for support, incident, and RMA workflows.
  • Experience with observability and monitoring platforms such as Grafana, Prometheus, Datadog, or equivalents — ideally including defining SLOs, dashboards, and alerting for large fleets.
  • Familiarity with GPU or accelerated compute environments, data centre operations, or hyperscaler-style fleet management.
  • Experience operating in high-growth or early-stage environments where the product is being built alongside the fleet itself.

Join Nscale as we build a world-class AI cloud platform. If you're excited about owning the software that keeps a global GPU fleet running — and raising the bar for the team around you — we'd love to hear from you!

At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.

If there’s anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.


Nscale uses AI-powered tools to assist in reviewing and prioritising applications against the requirements of this role. All final hiring decisions are made by humans. To learn more about how AI is used and your rights, see the "Learn more" link on the application page.