Principal Observability Platform Engineer

US

Principal Observability Platform Engineer – Nscale

About Nscale

Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale simplifies AI development while enabling superior results, supporting strategic business outcomes such as cost management, rapid innovation, and environmental responsibility.

We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency while contributing to the technology that powers the future.

About the Role

Nscale is seeking a Principal Observability Platform Engineer to join our Global CISO Organization. This role is critical in designing, implementing, and scaling observability and security solutions for our GPU cloud infrastructure. You will drive platform security and operational excellence across compute, networking, storage, and control plane systems.

What You’ll Be Doing

  • Lead security and observability engineering initiatives across distributed, multi-tenant infrastructure.

  • Identify architectural and systemic risks, and design solutions that are scalable and resilient.

  • Harden Kubernetes, virtualization layers, GPU workloads, and platform services.

  • Strengthen identity, authentication, authorization, and secrets management systems.

  • Partner with Networking teams on secure segmentation and traffic isolation strategies.

  • Embed automated security validation and guardrails into CI/CD pipelines.

  • Conduct deep technical design reviews and threat modeling exercises.

  • Mentor and develop junior engineers, raising the technical bar across the team.

  • Partner with the CISO to shape long-term platform security and observability strategy.

  • Represent Nscale externally as a subject matter expert in infrastructure, observability, and cloud security.


About You

Required:

  • 10+ years of hands-on security or observability engineering experience in cloud, hyperscale, or large distributed systems.

  • Strong software engineering skills (Go, Python, Rust, or similar).

  • Deep expertise in:

    • Linux systems internals

    • Kubernetes and container security

    • Infrastructure-as-Code (Terraform or equivalent)

    • Cloud-native architectures

    • Network security and segmentation

    • Identity and access management

  • Proven experience securing multi-tenant environments at scale.

Nice to Have:

  • Experience building observability platforms and telemetry pipelines.

  • Familiarity with GPU cloud infrastructure or AI workloads.

  • Exposure to distributed tracing, metrics, and log aggregation tools.

What We Can Offer You

  • Highly competitive package (base + equity) with annual reviews. 

  • Join one of the fastest-growing AI infrastructure startups—push boundaries, collaborate with brilliant minds, and make an outsized impact. ✨

  • Dynamic growth and career progression, tailored to your ambitions.

  • Human-first flexibility with remote-first work; autonomy to shape your day around life’s moments.

  • Collaborative, supportive, and innovative environment where your contributions spark real impact.

Equal Opportunities Statement

We strongly encourage applications from people of color, the LGBTQ+ community, people with disabilities, neurodivergent individuals, parents, carers, and people from lower socio-economic backgrounds.

If there’s anything we can do to accommodate your specific situation, please let us know.

Note: Responsibilities outlined are not exhaustive and may evolve as business needs change.

 

For information on how Nscale handles candidate personal data, please see our Employee & Candidate Privacy Notice.
