Infrastructure Support Lead

US

What We Can Offer You

At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.

What You'll Be Doing

People Management & Team Leadership

  • Own day-to-day people management for the Infrastructure US team
  • Manage, coach, and mentor team members to deliver high-quality support and right-first-time resolution
  • Conduct regular 1:1s, performance reviews, and development planning
  • Set and monitor individual and team objectives, driving accountability and continuous improvement
  • Manage shift planning, rota coverage, and on-call scheduling
  • Identify skills gaps and drive upskilling through training, mentoring, and knowledge sharing across teams
  • Ensure roles, responsibilities, and expectations are clearly understood and consistently applied

Ticket & Service Management

  • Own ticket queue management, ensuring accurate prioritisation and timely resolution
  • Monitor team productivity and workload trends, addressing bottlenecks to maintain service levels
  • Ensure adherence to ITIL processes across incidents, requests, changes, and problem management
  • Maintain accurate reporting on ticket status, SLA adherence, and team performance for the Infrastructure Support Manager

Operational Excellence & Continuous Improvement

  • Identify regional risks and escalate priorities to the Infrastructure Support Manager where required
  • Improve dashboards, alerting, and runbooks to reduce repeat incidents and drive self-service resolution
  • Maintain consistent standards, processes, and documentation across the regional teams
  • Ensure compliance with audit, security, and operational requirements

Technical Contribution & Support

  • Work alongside Senior Engineers on complex incidents, technical improvements, and operational tooling
  • Provide hands-on support across compute, storage, and networking layers
  • Support Kubernetes environments and Linux-based infrastructure at scale
  • Contribute to scripting and automation to improve operational workflows
  • Travel to Nscale or customer sites when needed to provide technical support

Incident Management & Stakeholder Interaction

  • Act as the escalation point for complex or high-impact incidents, including regional on-call duties
  • Lead post-incident reviews, identify recurring patterns, and ensure follow-up actions are tracked and delivered
  • Contribute to readiness and support planning for new services, deployments, and projects

About You

  • Adaptable to customer-driven demands, including out-of-hours support and travel for onsite technical work
  • Disciplined, organised, and self-motivated, with the ability to lead, mentor, and support engineers in a fast-paced environment
  • Strong leadership mindset with a bias for decisive action, accountability, and continuous improvement
  • Experience leading or managing engineers in an operational support environment, including performance, development, and day-to-day team oversight
  • Experience owning team workload, prioritisation, and service delivery to meet SLAs
  • Excellent communication and interpersonal skills, able to work effectively across all levels of the organisation
  • Solid understanding of datacentre technologies (servers, networking, storage, virtualisation) within an operational support context
  • Strong Linux systems engineering experience, with proven troubleshooting across compute, storage, and network layers in production
  • Experience operating and debugging Kubernetes environments and distributed systems
  • Strong networking fundamentals (L2/L3, routing, VLANs, load balancing), with awareness of high-performance fabrics (RDMA/NVLink)
  • Experience with observability, monitoring, and incident response, including driving issues to resolution and contributing to post-incident improvements
  • Familiarity with SRE practices, including runbooks, process improvement, and reducing manual intervention
  • Experience with scripting and automation (Bash, Python, or similar) and Infrastructure as Code tools (e.g. Ansible, Terraform)
  • Strong analytical and problem-solving skills, with the ability to perform deep-dive investigations and root cause analysis
  • Familiarity with cloud infrastructure and virtualisation technologies; OpenStack experience preferred
  • Understanding of ITIL processes (incident, problem, and change management)

Nice to Have

  • Experience with GPU platforms (NVIDIA/AMD) and performance diagnostics (e.g. nvidia-smi, NCCL)
  • Exposure to HPC or distributed workloads (e.g. RDMA, InfiniBand, MPI)
  • Experience with CI/CD or GitOps tooling
  • Experience working in multi-region environments

At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.

  • Highly competitive package (base + equity) with reviews every 12 months.
  • Join the fastest-growing tech startup: your chance to push boundaries, collaborate with brilliant minds, and make your mark on cutting-edge AI. ✨
  • Expect a dynamic progression plan tailored to your ambitions. Grow by trying new things, leading, challenging the status quo, and owning your impact, always with our full support.
  • Human-First Flexibility: We treat you as humans first. 🫶🏽 Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.

Join our thriving remote-first team. Geography is no barrier to impact or connection. We build seamless virtual collaboration, empowering you, wherever you work.

Equal Opportunities Statement

At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.

If there’s anything we can do to accommodate your specific situation, please let us know.

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
