Infrastructure Operations Operator
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
At Nscale, our Operations team plays a critical role in maintaining service availability, driving operational excellence, and delivering exceptional customer experiences across our AI infrastructure platform.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you'll be contributing to building the technology that powers the future.
About the Role
We are hiring an Infrastructure Operations Operator to support the day-to-day operation of Nscale's AI infrastructure environments.
This role is ideal for an experienced operator who combines strong technical aptitude with excellent operational judgment and stakeholder management skills. You will play a critical role in maintaining service availability, responding to operational incidents, supporting customers, and ensuring the smooth running of our data centre environments.
As a senior member of the operations team, you will take ownership of complex operational issues, mentor junior operators, and contribute to the continuous improvement of operational processes and service delivery. You'll be comfortable working in high-pressure situations, making decisions with incomplete information, and collaborating across multiple teams to achieve the best outcomes for customers and the business.
Flexibility to work shifts, participate in on-call rotations, and travel to support operations across multiple data centre locations is essential.
What you'll be doing
Operational Support & Service Delivery
- Monitor and maintain the operational health of Nscale's data centre infrastructure and supporting systems.
- Ensure operational activities are performed in accordance with established procedures and service level agreements (SLAs).
- Support day-to-day service delivery activities, maintaining high levels of reliability and customer satisfaction.
- Participate in shift rotations and provide on-call support as required.
- Support operational activities across multiple data centre locations when needed.
Incident Management & Troubleshooting
- Lead incident response efforts and coordinate resolution activities during service-impacting events.
- Diagnose and resolve hardware, software, networking, and infrastructure-related issues.
- Escalate complex operational and technical issues appropriately and coordinate with specialist engineering teams.
- Conduct root cause analysis and contribute to post-incident reviews and corrective action plans.
- Support service restoration efforts during critical incidents and high-priority operational events.
Customer Support & Escalation Management
- Handle advanced customer support requests and operational escalations.
- Act as a technical point of contact during incidents and customer-impacting events.
- Ensure timely communication and resolution of customer issues.
- Maintain a customer-first mindset while balancing operational priorities and business needs.
Team Leadership & Knowledge Sharing
- Mentor and support junior operators, sharing operational expertise and technical knowledge.
- Assist with onboarding and training of new team members.
- Promote operational excellence and continuous learning across the team.
- Contribute to building a strong operational culture focused on ownership and accountability.
Infrastructure Operations & Vendor Coordination
- Coordinate with vendors and suppliers for hardware replacements, maintenance activities, and operational support.
- Support asset management processes and maintain accurate infrastructure inventory records.
- Assist with hardware deployment, installation, and lifecycle management activities.
- Ensure operational readiness of infrastructure and supporting systems.
Process Improvement & Documentation
- Contribute to the development and optimisation of operational processes and workflows.
- Create and maintain operational documentation, runbooks, and standard operating procedures.
- Identify opportunities for automation and efficiency improvements.
- Support operational readiness initiatives, training programmes, and business improvement projects.
About You
Required Experience
- 2+ years of experience in data centre operations, infrastructure operations, technical operations, or a similar operational environment.
- Strong understanding of data centre operations, server hardware, and networking fundamentals.
- Proven ability to diagnose and resolve complex operational and technical issues.
- Experience working within operational environments governed by SLAs and service management processes.
- Strong analytical and problem-solving capabilities.
- Excellent communication and stakeholder management skills.
- Ability to work independently and make sound decisions in fast-paced environments.
Technical Knowledge
- Understanding of server hardware, networking, storage, and infrastructure components.
- Experience troubleshooting hardware and software issues in production environments.
- Familiarity with incident management, change management, and operational support processes.
- Ability to quickly understand and apply technical concepts within complex infrastructure environments.
Preferred Experience
- Previous experience working in a data centre environment.
- Hands-on server deployment, maintenance, or break-fix experience.
- Exposure to cloud infrastructure, AI platforms, HPC environments, or GPU-based infrastructure.
- Industry certifications related to infrastructure, networking, operations, or hardware support.
Personal Attributes
- Highly organised, diligent, and detail-oriented.
- Self-starter with a strong sense of ownership and accountability.
- Curious and eager to learn new technologies and operational processes.
- Calm under pressure and able to make sound decisions during incidents.
- Strong customer service ethic and commitment to operational excellence.
- Collaborative team player who can influence and build relationships across teams.
What we can offer you
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
Highly competitive package (base + equity) with reviews every 12 months. 🚀
Join one of the fastest-growing AI infrastructure companies — your chance to work with cutting-edge GPU infrastructure and help power the future of AI. ✨
Expect a dynamic progression plan tailored to your ambitions. Grow by taking ownership, solving complex operational challenges, and expanding your technical expertise — always with our full support.
Learning & Development: Access to training resources, certifications, and opportunities to deepen your expertise across infrastructure operations and AI technologies.
Equal Opportunities Statement
We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there’s anything we can do to accommodate your specific situation, please let us know.
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
