Data Centre Infrastructure Quality & Risk Lead
About Nscale
Nscale is the GPU cloud engineered for AI. We deliver cost-effective, high-performance infrastructure that enables AI-first companies to scale rapidly while reducing complexity across design, build, and operations. Our platform supports strategic business outcomes across performance, cost efficiency, and sustainability.
We operate with a culture of ownership, accountability, and relentless improvement. As an Nscaler, you’ll work alongside high-performing teams to build the systems, processes, and infrastructure that power the future of AI at scale.
The Role
The Data Centre Infrastructure Quality & Risk Lead is responsible for establishing and operating Nscale’s risk and quality assurance framework across all enterprise and business-as-usual (BAU) deployments.
This role requires deep, practical experience working within live or deployed data centre environments, including exposure to design, build, commissioning, and operational phases. You will bring real-world understanding of how data centres are delivered and run, enabling you to provide meaningful challenge, identify risk early, and ensure quality standards are upheld in complex, high-density GPU environments.
The role provides independent oversight, structured challenge, and hands-on support to ensure that complex GPU and data-centre programmes are delivered safely, predictably, and to defined standards.
Key Responsibilities
Enterprise Risk Framework Ownership
- Own and continuously improve the enterprise risk management framework across all GPU and data-centre deployments.
- Define and maintain standardised risk processes, templates, thresholds, and escalation paths for Enterprise and BAU programmes.
- Facilitate structured risk identification workshops grounded in real delivery risks seen in data centre environments (e.g. power, cooling, rack integration, commissioning delays).
- Ensure every programme maintains a live risk register with clear ownership and mitigations.
Portfolio-Level Risk Management
- Maintain a consolidated, portfolio-level view of risks across all deployments, including capacity, power availability, cooling, supply chain constraints, network fabric, and multi-DC dependencies.
- Identify cross-programme and systemic risks typical of data centre build and operations and escalate critical items to senior leadership.
- Support leadership decision-making by providing clear risk insights, mitigation options, and contingency scenarios.
Quality Assurance & Project Assurance
- Define and own quality standards covering data centre design, build, installation, testing, commissioning, and handover.
- Establish checklists, acceptance criteria, and documentation requirements aligned to Nscale’s reference architectures (e.g. 10K / 20K / 50K GPU designs).
- Plan and conduct independent assurance reviews at key delivery gates, including design freeze, pre-build, pre-commissioning, pre-go-live, and post-implementation.
- Ensure on-site realities and operational constraints are reflected in all quality standards.
Support to PMs, Architects & Engineering Teams
- Act as a subject matter partner to delivery teams, bringing practical DC experience to guide risk identification and quality expectations.
- Coach project teams on the application of risk and quality standards within live data centre build and operational environments.
- Support PMs in structuring effective risk registers and mitigation plans.
- Lead or support root-cause analysis of delivery or operational issues (e.g. installation defects, commissioning failures, infrastructure constraints).
- Ensure lessons learned are captured and fed back into standards, playbooks, and reference designs.
Compliance, Monitoring & Reporting
- Monitor adherence to defined methods, standards, and risk processes across all programmes.
- Track exceptions, trends, and recurring issues, particularly those arising from data centre delivery and operations, and report them to the Head of Enterprise Deployments.
- Define, track, and report KPIs related to risk and quality performance (e.g. unmanaged high risks, assurance findings, defect rates, audit outcomes).
Key Deliverables
- Risk Management Framework & Playbook – Tailored to GPU and data-centre deployments, reflecting real-world delivery and operational risks.
- Programme & Portfolio Risk Register – Including DC-specific risks such as power, cooling, capacity, and commissioning dependencies.
- Quality Management Plan & Standards – Covering engineering work packs, installation standards, test plans, and handover documentation.
- Assurance Review Reports – Including findings from design, build, and commissioning phases.
- Compliance & Audit Evidence – Alignment with internal policies, safety, and relevant DC standards.
- Lessons Learned & Continuous Improvement Log – Feeding back into improved delivery of future data centre deployments.
Qualifications & Experience
- 7+ years of experience working within data centre, critical infrastructure, or high-availability environments (essential).
- Proven experience in live data centre environments, including exposure to one or more of design, construction, commissioning, and operations.
- Background in quality assurance, risk management, programme assurance, or PMO within large-scale infrastructure or DC deployments.
- Strong understanding of programme and portfolio risk management frameworks.
- Experience defining and enforcing quality standards across complex engineering or infrastructure programmes.
- Ability to engage credibly with engineers, construction teams, and senior stakeholders.
- Strong analytical, facilitation, and communication skills, with the ability to challenge constructively and escalate when necessary.
- Experience in GPU, hyperscale, or cloud infrastructure environments strongly preferred.
At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there’s anything we can do to accommodate your specific situation, please let us know.
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
