Principal Program Manager, Support Strategy & Operations

US

Role Overview
The AI Cloud Support Strategy & Operations Program Manager will play a critical role in designing and operationalizing the next-generation Support Delivery Model for a rapidly growing AI Infrastructure organization. This role partners closely with Support Leadership, Engineering, Customer Success, and Datacenter Operations to define and implement scalable support frameworks aligned to ITIL best practices.
The Program Manager will lead strategic initiatives that establish operational excellence across 24x7 Incident Management, Problem Management, Change Management, and Customer Success Management, ensuring the organization can deliver reliable, enterprise-grade support for AI infrastructure platforms.
This individual will manage cross-functional programs, timelines, and budgets while driving process maturity, operational visibility, and service quality across global teams.

Key Responsibilities
Support Delivery Model Development

  • Lead the design and implementation of a scalable AI Cloud Support Delivery Model aligned with ITIL service management principles.
  • Establish operational frameworks for 24x7 Incident Management, Problem Management, Change Management, and Customer Success Management.
  • Define global support workflows, escalation paths, service ownership models, and operational governance structures.
  • Develop documentation and operational playbooks for new support processes and delivery models.
  • Partner with Support Leadership to define organizational structure, role definitions, and operational responsibilities.
Program & Initiative Management

  • Lead complex cross-functional programs that improve operational efficiency, service reliability, and customer experience.
  • Develop and manage program roadmaps, milestones, deliverables, and budgets for strategic support initiatives.
  • Coordinate execution across multiple teams including Engineering, Datacenter Operations, Customer Success, and Support Engineering.
  • Identify risks, dependencies, and operational gaps, and proactively drive mitigation plans.
  • Track program health and progress using defined metrics and reporting frameworks.
Operational Excellence & Process Maturity

  • Drive adoption of ITIL-aligned service management processes across the organization.
  • Establish and monitor key operational metrics including MTTR, SLA adherence, incident trends, problem resolution effectiveness, and change success rates.
  • Implement continuous improvement frameworks to mature operational processes and service reliability.
  • Facilitate post-incident reviews and drive root cause analysis improvements through Problem Management practices.
Cross-Functional Collaboration

  • Act as a strategic liaison between Support, Engineering, Product, and Datacenter Operations to ensure alignment on operational priorities.
  • Partner with engineering teams to improve service observability, incident response automation, and operational readiness.
  • Work with Customer Success and account teams to ensure support delivery aligns with enterprise customer expectations and service commitments.
Governance, Reporting, and Executive Visibility

  • Develop executive-level dashboards and reporting mechanisms to communicate program progress, operational health, and service performance.
  • Provide leadership with insights and recommendations based on operational metrics and program outcomes.
  • Establish governance structures to ensure consistent execution of support initiatives and operational standards.
Strategic Program Leadership
  • Translate strategic objectives for AI infrastructure support into actionable programs and operational frameworks.
  • Lead the planning and execution of large-scale initiatives that enable the organization to support rapid customer growth and platform expansion.
  • Drive alignment between technical teams and operational teams to ensure scalable service delivery.
  • Identify opportunities to improve operational efficiency through automation, tooling, and process optimization.
  • Champion a culture of operational excellence, accountability, and continuous improvement across the support organization.
Education & Experience
  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field (master’s degree preferred).
  • 7–10+ years of experience in program management, operations strategy, or service delivery within cloud infrastructure, AI infrastructure, or large-scale distributed systems environments.
  • Experience working within technical support organizations, cloud operations, or site reliability engineering environments.
  • Proven experience leading cross-functional programs involving engineering, infrastructure, and operations teams.
  • Experience implementing or operating within ITIL-based service management frameworks.
  • PMP, PgMP, or ITIL certification is a plus.
Knowledge & Skills
Technical & Operational Knowledge
  • Strong understanding of cloud infrastructure operations, AI/ML infrastructure environments, or hyperscale platforms.
  • Familiarity with ITIL service management practices including Incident, Problem, and Change Management.
  • Experience working with operational tooling such as ServiceNow, Jira, observability platforms, or incident management systems.
Program Management & Leadership
  • Exceptional ability to manage complex programs across multiple teams and stakeholders.
  • Strong project planning skills including roadmap development, milestone tracking, and risk management.
  • Ability to balance strategic thinking with hands-on execution.
Communication & Collaboration
  • Excellent communication skills with the ability to influence both technical teams and executive leadership.
  • Strong stakeholder management and cross-functional collaboration skills.
  • Ability to translate technical operational challenges into business insights.
Analytical & Strategic Thinking
  • Strong analytical skills, with the ability to interpret operational data and derive actionable insights.
  • Ability to identify systemic operational issues and implement long-term improvements.
  • Strategic mindset with the ability to build scalable operational frameworks.

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
