Staff HPC Systems Software Engineer

US

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.

About the Role

We’re hiring a Staff HPC Systems Software Engineer to define the technical direction and evolution of a core HPC platform domain at Nscale.

In this role, you will operate beyond a single team, shaping how multiple teams build, automate, and run Slurm-based capabilities within Nscale’s wider cloud-native platform. You’ll work across engineering boundaries to bring coherence to architecture, interfaces, lifecycle models, and operational approaches, while partnering closely with teams working on platform tooling, infrastructure APIs, identity systems, and Kubernetes-adjacent systems.

This is a high-impact staff-level role for someone who combines deep hands-on software engineering with strong systems judgement. Your work will help ensure Nscale’s HPC services are robust, supportable, and maintainable, while creating leverage through shared patterns, reusable implementations, and clear technical direction across ambiguous, business-critical problem spaces.

What you'll be doing

Domain Architecture & Technical Direction

  • Own and evolve the technical direction for a defined HPC systems domain, such as Slurm platform architecture, scheduler integrations, cluster lifecycle, workload environments, or service automation.
  • Make architectural decisions that balance software quality, operational realities, customer needs, and long-term maintainability.
  • Define how proven Slurm implementations should be packaged, automated, and exposed as a service.
  • Resolve ambiguity around ownership, interfaces, lifecycle boundaries, and operating models across teams.
  • Act as the technical escalation point for the most complex issues within the domain.

Cross-Team Engineering Leverage

  • Establish shared patterns and standards for automation, service lifecycle management, observability, reliability, and supportability across the HPC platform.
  • Drive cross-team design for integrations between Slurm, Kubernetes-adjacent systems, infrastructure APIs, identity systems, and platform tooling.
  • Create reusable modules, automation, deployment patterns, and reference implementations that increase engineering leverage.
  • Identify and correct avoidable technical divergence, duplicated effort, and fragile operating models.
  • Ensure domain designs reflect the realities of GPU scheduling, HPC networking, performance isolation, and production operations.

Delivery, Reliability & Influence

  • Lead technically critical initiatives spanning 2–4 teams or a defined HPC platform area.
  • Unblock delivery by clarifying technical direction and reducing ambiguity in complex system design problems.
  • Contribute hands-on where needed to de-risk or accelerate critical work.
  • Influence engineering teams without formal authority through strong judgement, design clarity, and practical solutions.
  • Partner with adjacent cloud-native software engineers so HPC implementations build on shared platform patterns rather than separate ones.

KPIs

  • Technical direction across a defined HPC domain
  • Delivery of critical initiatives across 2–4 teams
  • Reduction in technical divergence and duplicated effort
  • Reliability and supportability of Slurm-based HPC services

About You

  • Extensive experience designing and building production software and automation for HPC systems, especially Slurm-based environments.
  • Strong track record of writing maintainable, testable, and resilient software in Go, Python, or similar languages.
  • Proven ability to define technical direction across a domain spanning multiple teams or services.
  • Strong understanding of Slurm internals, scheduler behaviour, cluster lifecycle concerns, and operational trade-offs.
  • Strong practical understanding of GPU-backed infrastructure and HPC networking, including InfiniBand, RoCE, RDMA, and performance-sensitive workload characteristics.
  • Experience integrating HPC systems with cloud-native platforms, APIs, or service delivery models.
  • Experience creating engineering leverage through standards, reusable patterns, shared tooling, and architectural clarity.
  • Strong judgement in balancing short-term delivery with long-term platform health and supportability.
  • Strong written and verbal communication skills, with the ability to align multiple teams around a coherent technical direction.
  • Experience with other schedulers or batch systems such as Kueue is valuable.

What we can offer you

At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.

  • Highly competitive US compensation package (base + bonus + equity), with performance reviews every 12 months. 🚀
  • Join one of the fastest-growing AI infrastructure companies — your chance to directly shape how global AI capacity is planned and deployed. ✨
  • Expect a dynamic progression plan tailored to your ambitions. Grow by leading critical cross-functional initiatives and shaping capital strategy — always with our full support.
  • Human-First Flexibility: We treat Nscalers as humans first. 🫶🏽 Our flexible workplace trusts you to deliver, giving you the autonomy to shape your day around life's moments.

Equal Opportunities Statement

We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.

If there’s anything we can do to accommodate your specific situation, please let us know.

The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice: Here.

Salary Range

The range below reflects the base salary for the position. Actual compensation may vary based on job-related factors such as skill set, experience, education, and location. In addition to base salary, this role may be eligible for bonus, equity, and/or commission programs. Nscale may offer a competitive benefits package including medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.

$225,000 - $275,000 USD

Nscale uses AI-powered tools to assist in reviewing and prioritising applications against the requirements of this role. All final hiring decisions are made by humans.