VP of Support
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
At Nscale, our Support and Operations team plays a critical role in maintaining service availability, driving service reliability, and responding rapidly to customer tickets globally.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.
About the Role (Job Purpose)
Nscale is seeking a Vice President of Support to lead the daily operations and support of our global datacenter infrastructure. This role owns the teams responsible for monitoring, troubleshooting, and incident response across mission-critical GPU, networking, and storage systems spanning multiple datacenters worldwide.
You will ensure infrastructure health is continuously monitored, incidents are managed and resolved with speed and rigor, and support processes are executed consistently and at scale. As a senior leader, you will define and evolve operational standards, escalation models, and on-call practices that underpin reliability across Nscale’s growing footprint.
Partnering closely with infrastructure, SRE, hardware, and datacenter operations teams, you will translate complex technical issues into clear operational outcomes, drive post-incident learning, and continuously raise the bar on service quality. This role is central to delivering operational excellence, customer confidence, and dependable AI infrastructure as Nscale scales globally.
What You'll Be Doing (Responsibilities)
- Own global datacenter support operations for Nscale’s AI infrastructure, ensuring 24/7 coverage and consistent operational standards across regions.
- Build, scale, and lead a high-performing infrastructure support organisation, including senior ICs, managers, and regional leads.
- Define and enforce incident management, escalation, and severity frameworks for mission-critical GPU, networking, and storage platforms.
- Act as the executive owner for major incidents, leading cross-functional response, decision-making, and external communications where required.
- Establish and continuously improve operational runbooks, on-call models, and support playbooks aligned with hyperscale reliability expectations.
- Partner with Infrastructure Engineering, SRE, Hardware, and Datacenter Build teams to eliminate recurring failure modes and influence design-for-supportability.
- Drive root cause analysis, post-incident reviews, and corrective action tracking, ensuring lessons learned translate into measurable reliability improvements.
- Define, track, and report support and reliability KPIs (SLAs, uptime, MTTR, incident trends, capacity risk) to executive leadership and the board.
- Lead adoption of monitoring, observability, automation, and alerting strategies to reduce manual intervention and improve mean time to resolution.
- Own support readiness for new datacenter launches, hardware rollouts, and platform changes, ensuring operational stability from day one.
- Develop talent through mentorship, succession planning, and leadership development, creating a strong pipeline of future support leaders.
- Ensure compliance with security, safety, and operational policies across all supported environments.
- Define and manage the support operating model and budget, balancing cost efficiency with reliability and scale.
- Serve as a senior voice in availability, reliability, and risk discussions, representing operational reality in strategic decision-making.
About You (Skills / Qualifications)
- Significant leadership experience owning datacenter infrastructure support or operations at scale, ideally across multiple global sites.
- Deep understanding of GPU, compute, networking, and storage platforms sufficient to challenge design decisions, assess risk, and guide teams through complex incidents.
- Proven track record of leading 24/7, mission-critical operations, including executive-level incident management and stakeholder communication.
- Strong grasp of observability, alerting, and operational telemetry, with the ability to set strategy rather than just use tools.
- Experienced in building and operating incident, problem, and change management frameworks appropriate for hyperscale or high-growth infrastructure.
- Demonstrated ability to partner with infrastructure engineering, SRE, and hardware teams to improve reliability through design-for-operations and prevention.
- Credible, calm leader in high-pressure situations, capable of making decisive calls during major incidents.
- Excellent communicator who can translate technical failure modes into business impact for executives, customers, and partners.
- Experience scaling, mentoring, and retaining senior engineering and management talent across regions.
- Comfortable owning operational metrics, risk reporting, and executive accountability for availability and reliability.
Nice to have
- Experience supporting AI, ML, or high-performance computing platforms at scale.
- Familiarity with GPU scheduling, orchestration, and containerized environments (e.g. Kubernetes-based platforms).
- Exposure to automation and Infrastructure-as-Code strategies to reduce operational toil.
- Awareness of datacenter efficiency, power, and sustainability considerations in large-scale infrastructure environments.
What We Can Offer You
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
- Highly competitive package (base + equity) with reviews every 12 months. 🚀
- Join the fastest-growing tech startup: this is your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI. ✨
- Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
- Human-First Flexibility: We treat you as humans first. 🫶🏽 Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.
Equal Opportunities Statement
We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there’s anything we can do to accommodate your specific situation, please let us know.
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.