Senior Cloud Native Platform Engineer
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale enables AI-focused companies to achieve superior results by reducing the complexity of AI development. Our GPU cloud bolsters technical capabilities and directly supports strategic business outcomes, including cost management, rapid innovation, and environmental responsibility.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency, where everyone is inspired to do their best work. If you join our team, you’ll be contributing to building the technology that powers the future.
About the Role
We’re hiring a Senior Cloud Native Platform Engineer to build, operate, and improve the cloud-native platform foundations that support AI applications and services at scale.
In this hands-on platform engineering role, you’ll work on shared Kubernetes-based platforms, deployment patterns, observability foundations, infrastructure automation, and operational tooling that help internal teams run services safely and efficiently on GPU-backed infrastructure. You’ll partner closely with software engineering, infrastructure, and SRE teams to ensure platform capabilities meet real developer and operational needs.
This role is important to the reliability, scalability, and usability of Nscale’s platform. You’ll take ownership of significant platform components, deliver complex technical work independently, and raise the quality of operations and engineering through practical improvements, sound technical judgement, and mentoring.
What you'll be doing
Platform Operations & Engineering
- Build and improve shared cloud-native platform capabilities used by internal engineering teams to run AI applications and services.
- Own significant parts of the platform, such as Kubernetes cluster operations, workload runtime configuration, deployment workflows, observability foundations, or environment automation.
- Improve the reliability, scalability, and supportability of platform services through practical engineering and operational enhancements.
- Develop automation, tooling, and configuration that reduce manual effort, improve consistency, and make the platform easier to use and operate.
- Apply software engineering where it creates leverage, including scripts, services, CI/CD automation, operational tooling, and platform integrations.
Reliability, Operability & Automation
- Improve incident prevention, detection, response, and recovery across the platform areas you support.
- Build and refine observability for platform services, including metrics, logs, tracing, dashboards, alerts, and other useful operational signals.
- Strengthen rollout safety, capacity awareness, failure handling, and recovery procedures for production environments.
- Debug and resolve complex issues spanning Kubernetes, Linux, networking, storage, workload runtime behaviour, and cloud or datacentre infrastructure dependencies.
- Enhance operational playbooks, runbooks, and engineering practices to reduce toil and increase service resilience.
Team Technical Contribution
- Contribute to design discussions, code reviews, and operational standards within the platform engineering team.
- Collaborate with software engineering, infrastructure, and SRE teams to deliver platform capabilities that are practical, supportable, and aligned to operational needs.
- Define sensible defaults, paved roads, and supportable patterns for service deployment and runtime operations.
- Mentor less experienced engineers in platform engineering fundamentals, operational judgement, and good automation practices.
KPIs
- Platform reliability and service resilience
- Reduction in manual operational toil
- Incident detection, response, and recovery effectiveness
- Observability and operational readiness of platform services
About You
- Strong hands-on experience operating and improving Kubernetes-based platforms in production.
- Solid experience with infrastructure automation, CI/CD, configuration management, or GitOps-style workflows.
- Strong understanding of reliability engineering principles, including observability, incident response, failure analysis, and operational readiness.
- Experience writing production-quality automation, tooling, or backend code in Go, Python, Bash, or similar languages.
- Good Linux fundamentals, including processes, filesystems, cgroups, service behaviour, and system debugging.
- Good networking fundamentals, including TCP/IP, DNS, routing, load balancing, and container or overlay networking concepts.
- Experience debugging complex production issues across multiple system layers.
- Ability to work independently on substantial technical problems while collaborating effectively with adjacent teams.
- Experience mentoring or supporting less experienced engineers through practical technical guidance.
What we can offer you
At Nscale, you'll find a collaborative, supportive, and innovative environment where your contributions spark real impact. We're building something extraordinary, and we want you at the core.
- Highly competitive US compensation package (base + bonus + equity), with performance reviews every 12 months. 🚀
- Join one of the fastest-growing AI infrastructure companies — your chance to directly shape how global AI capacity is planned and deployed. ✨
- Expect a dynamic progression plan tailored to your ambitions. Grow by leading critical cross-functional initiatives and shaping capital strategy — always with our full support.
- Human-First Flexibility: We treat you as a human first. 🫶🏽 Our flexible workplace trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Equal Opportunities Statement
We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
If there’s anything we can do to accommodate your specific situation, please let us know.
The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.
For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
Salary Range
The range below reflects the base salary for the position. Actual compensation may vary based on job-related factors such as skill set, experience, education, and location. In addition to base salary, this role may be eligible for bonus, equity, and/or commission programs. Nscale may offer a competitive benefits package including medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.
$200,000 - $225,000 USD
