
Lead MLOps Engineer


Location: Bangladesh, South Asia (Remote)

Department: Software Engineering

ABOUT NEXGEN CLOUD: 

NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers from AI researchers to enterprises running the world's most compute-intensive workloads. We deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature.

We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure. We practice what we preach, equipping our people with AI at every level so we can solve harder problems, ship faster, and keep raising the bar for what enterprise GPU infrastructure looks like.

THE ROLE: Lead MLOps Engineer

This role exists because Hyperstack is scaling its AI cloud platform and building out the infrastructure that powers production ML workloads for thousands of customers. As AI Studio capabilities grow and the platform takes on increasingly complex training, fine-tuning, and inference workloads, we need someone to own the MLOps layer — the systems, tooling, and practices that make large-scale AI workloads reliable, observable, and repeatable in production. You’ll have direct ownership over ML platform reliability, deployment workflow engineering, and the operational standards that underpin how AI workloads run on Hyperstack — end to end.

Role positioning:

This is a lead individual contributor role. You’ll set the technical direction for MLOps on the platform, work directly with Product and Engineering, and take end-to-end ownership of the systems that make AI workloads run in production. No hand-holding, lots of impact.

WHAT YOU’LL BE DOING

Rather than a long checklist, here’s what success in this role looks like:

  • Own the design, implementation, and evolution of core MLOps systems across Hyperstack — including the infrastructure and workflows that underpin AI Studio
  • Build and improve systems that orchestrate model training, fine-tuning, evaluation, and deployment — engineered for long-running, resource-intensive, GPU workloads
  • Own production readiness across ML infrastructure — monitoring, alerting, incident response, and continuous improvement based on real-world usage
  • Define and embed strong MLOps practices across teams — model versioning, reproducibility, deployment safety, rollback strategies, and environment management
  • Provide technical leadership through architecture decisions, implementation guidance, and shared standards — working closely with Product, Engineering, and cross-functional teams

ABOUT YOU:

We’re more interested in how you think and work than in a perfect CV. You’ll likely bring a combination of the following:

Essential

  • Proven experience designing, building, and operating production ML infrastructure, platform systems, or MLOps workflows in cloud environments
  • Hands-on Python development skills, with experience building backend systems, automation, and developer or platform tooling
  • Experience supporting LLM, generative AI, or fine-tuning workflows in production — including training, evaluation, deployment, inference, and lifecycle management
  • Production-grade experience with Docker, Kubernetes, CI/CD, and infrastructure-as-code in real, operational environments
  • Experience owning complex, asynchronous, or resource-intensive workloads end to end — including orchestration, reliability, observability, and incident response
  • Ability to work cross-functionally and provide technical leadership through influence — shaping standards, direction, and ways of working across engineering teams

Nice to Have

  • Exposure to GPU-intensive, distributed, or performance-sensitive ML workloads
  • Experience building internal developer platforms or tooling that improve experimentation, reproducibility, and delivery speed for ML teams
  • Background in cloud infrastructure, platform products, or technically complex B2B software

WHAT WE OFFER 

  • Competitive salary and annual discretionary bonus scheme
  • Employee wellbeing benefits
  • 25 days of holiday, plus public holidays
  • Flexible working arrangements (remote or hybrid, depending on role and location)
  • Real ownership and autonomy, with the trust to take initiative and experiment
  • The opportunity to make a visible, meaningful impact as we scale
  • Clear career progression and growth opportunities in a fast-growing company
  • A collaborative, international culture built on trust, transparency, and ownership
  • The chance to help shape NexGen Cloud’s team, culture, and future alongside ambitious, mission-driven colleagues

MORE INFORMATION

Head over to our NexGen Cloud careers page to view current openings, and follow us on LinkedIn and X to learn more about our journey, our newest releases, and the latest news in the neocloud space.

GDPR NOTICE

We take your privacy seriously. The information you provide in your application will be used only for recruitment purposes and processed in line with the General Data Protection Regulation (GDPR).

By submitting my application, I agree that NexGen Cloud will process my personal data for recruitment purposes, in line with the GDPR. This includes reviewing my application, contacting me, and, if applicable, progressing my candidacy. My data will be retained for up to 12 months unless I request otherwise. I understand I can withdraw consent or exercise my rights (access, rectification, erasure, objection) at any time by contacting careers@nexgencloud.com. Full details are available in our Privacy Policy - https://www.nexgencloud.com/privacy-policy