
Lead MLOps Engineer


Location: Bangladesh, South Asia (Remote)

Department: Software Engineering

ABOUT NEXGEN CLOUD: 

NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers from AI researchers to enterprises running the world's most compute-intensive workloads. We deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature.

We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure. We practice what we preach, equipping our people with AI at every level so we can solve harder problems, ship faster, and keep raising the bar for what enterprise GPU infrastructure looks like.

THE ROLE: Lead MLOps Engineer

This role exists because Hyperstack is scaling its AI cloud platform and building out the infrastructure that powers production ML workloads for thousands of customers. As AI Studio capabilities grow and the platform takes on increasingly complex training, fine-tuning, and inference workloads, we need someone to own the MLOps layer — the systems, tooling, and practices that make large-scale AI workloads reliable, observable, and repeatable in production. You’ll have direct ownership over ML platform reliability, deployment workflow engineering, and the operational standards that underpin how AI workloads run on Hyperstack — end to end.

Role positioning:

This is a lead individual contributor role. You’ll set the technical direction for MLOps on the platform, work directly with Product and Engineering, and take end-to-end ownership of the systems that make AI workloads run in production. No hand-holding, lots of impact.

WHAT YOU’LL BE DOING

Rather than a long checklist, here’s what success in this role looks like:

  • Own the design, implementation, and evolution of core MLOps systems across Hyperstack — including the infrastructure and workflows that underpin AI Studio
  • Build and improve systems that orchestrate model training, fine-tuning, evaluation, and deployment — engineered for long-running, resource-intensive, GPU workloads
  • Own production readiness across ML infrastructure — monitoring, alerting, incident response, and continuous improvement based on real-world usage
  • Define and embed strong MLOps practices across teams — model versioning, reproducibility, deployment safety, rollback strategies, and environment management
  • Provide technical leadership through architecture decisions, implementation guidance, and shared standards — working closely with Product, Engineering, and cross-functional teams

ABOUT YOU:

We’re more interested in how you think and work than in a perfect CV. You’ll likely bring a combination of the following:

Essential

  • Proven experience designing, building, and operating production ML infrastructure, platform systems, or MLOps workflows in cloud environments
  • Hands-on Python development skills, with experience building backend systems, automation, and developer or platform tooling
  • Experience supporting LLM, generative AI, or fine-tuning workflows in production — including training, evaluation, deployment, inference, and lifecycle management
  • Production-grade experience with Docker, Kubernetes, CI/CD, and infrastructure-as-code in real, operational environments
  • Experience owning complex, asynchronous, or resource-intensive workloads end to end — including orchestration, reliability, observability, and incident response
  • Ability to work cross-functionally and provide technical leadership through influence — shaping standards, direction, and ways of working across engineering teams

Nice to Have

  • Exposure to GPU-intensive, distributed, or performance-sensitive ML workloads
  • Experience building internal developer platforms or tooling that improve experimentation, reproducibility, and delivery speed for ML teams
  • Background in cloud infrastructure, platform products, or technically complex B2B software

WHAT WE OFFER 

  • Competitive salary and annual discretionary bonus scheme
  • Employee wellbeing benefits
  • 25 days of holiday, plus public holidays
  • Flexible working arrangements (remote or hybrid, depending on role and location)
  • Real ownership and autonomy, with the trust to take initiative and experiment
  • The opportunity to make a visible, meaningful impact as we scale
  • Clear career progression and growth opportunities in a fast-growing company
  • A collaborative, international culture built on trust, transparency, and ownership
  • The chance to help shape NexGen Cloud’s team, culture, and future alongside ambitious, mission-driven colleagues

MORE INFORMATION

Head over to our NexGen Cloud careers page to view current openings, and follow us on LinkedIn and X to learn more about our journey, our newest releases, and the latest news in the neocloud space.

GDPR NOTICE

We take your privacy seriously. The information you provide in your application will be used only for recruitment purposes and processed in line with the General Data Protection Regulation (GDPR).

By submitting my application, I agree that NexGen Cloud will process my personal data for recruitment purposes, in line with the GDPR. This includes reviewing my application, contacting me, and, if applicable, progressing my candidacy. My data will be retained for up to 12 months unless I request otherwise. I understand I can withdraw consent or exercise my rights (access, rectification, erasure, objection) at any time by contacting careers@nexgencloud.com. Full details are available in our Privacy Policy - https://www.nexgencloud.com/privacy-policy