Senior DevOps/MLOps Engineer - Digital Health
Irembo is a technology company that designs and develops digital products to ease the accessibility of services in users’ everyday lives worldwide, starting with Rwanda. Our pioneer products, IremboGov and IremboPay, have enabled Rwandan citizens and foreigners to access and pay for over 150 public services online through our one-stop-shop e-governance and payment platforms. To date, we have facilitated over 30 million transactions through our platforms and have ambitious goals to scale our technology worldwide to enable more governments and institutions to serve their citizens better. More information is available on irembo.com.
Location: Kigali, Rwanda.
Duration: 24 months.
Terms of Reference - Senior DevOps/MLOps Engineer
The Opportunity
Irembo is transforming the healthcare landscape by launching a national-scale telemedicine platform, building on our success in service management (IremboGov) and payment solutions (IremboPay). This project directly leverages the seven-year legacy of Babyl Rwanda, which pioneered telemedicine and delivered over 3.5 million consultations.
We are seeking a highly specialized DevOps/MLOps Engineer to design, implement, and manage the critical, resilient on-premise infrastructure for the Irembo TeleClinic platform. Your primary mission is to ensure high availability, security, and performance for a hybrid workload: supporting cutting-edge AI workloads for improved diagnosis and personalized care, while handling massive user traffic across multiple channels (Web, Mobile, and legacy USSD/Voice).
You will build the robust infrastructure needed to securely run high-impact digital health services within a national data environment. You will sit at the intersection of Telecommunications, High-Performance Computing, and Healthcare Compliance.
Key Responsibilities:
- Core Infrastructure Management and Operations:
- Infrastructure Management (On-Premise): Co-design, set up, maintain, and upgrade on-premise infrastructure (compute, storage, network) to reliably support Digital Health traffic and ensure high availability for clinical services.
- CI/CD Automation: Design, implement, and maintain robust, automated CI/CD pipelines for microservices (backend) and mobile/web applications, ensuring rapid, safe, and reliable feature deployment.
- Edge/API Layer Optimization: Engineer the API layer for environments with unstable networks (3G/4G). This includes implementing high-efficiency binary protocols (e.g., gRPC/Protobuf) and aggressive edge caching strategies to minimize bandwidth consumption for citizens, moving beyond standard REST/JSON architectures.
- Immutable Audit and Compliance Logging: Establish a centralized, tamper-proof logging architecture that correlates all infrastructure events with AI decisions, ensuring full traceability for medical audits and regulatory compliance.
- Machine Learning Operations:
- Provision, configure, and manage dedicated on-premise GPU clusters optimized for low-latency AI model serving, real-time triage, and advanced diagnostic engines.
- Collaborate with the data team to design and secure efficient data pipelines that feed high-quality clinical data to the training and inference environments.
- Manage MLOps tools (e.g., MLflow, Kubeflow, KServe) or comparable alternatives to streamline the lifecycle of AI models, including tracking, versioning, testing, and serving models in high-volume production environments.
- Continually optimize AI serving infrastructure for cost, latency, and throughput, essential for improving diagnosis and personalizing care at a national scale.
- Security and Compliance:
- Implement stringent security controls and compliance checks, adhering to national health data regulations and international best practices for data protection and security audits.
- Establish and regularly test comprehensive disaster recovery and backup strategies for all patient data and core service components to ensure business continuity for critical healthcare services.
Qualifications:
Required Skills & Experience
- 4+ years of experience in a dedicated DevOps, SRE, or Platform Engineering role.
- Low-Latency Protocol Mastery (gRPC/Protobuf): Deep experience in the design, setup, maintenance, and troubleshooting of gRPC and Protobuf for mobile and web application communication. A solid understanding of HTTP/2 multiplexing and demultiplexing, and of strategies for minimizing bandwidth usage on unstable 3G/4G networks, is required.
- Telecom Protocols: Experience with telecom protocols and technologies (SMPP, USSD gateways, and SIP) for delivering services via USSD and Voice/IVR channels.
- MLOps Production Deployment: In-depth experience with MLOps toolchains (e.g., MLflow, Kubeflow, KServe) or comparable alternatives for deploying and serving machine learning models in high-volume production environments.
- Mandatory experience with on-premises infrastructure management.
- Deep AI Observability: Proven ability to design and operate full-stack monitoring solutions from scratch, moving beyond simple "uptime checks" to complex SLO/SLI (Service Level Objective/Service Level Indicator) definitions for AI workloads.
- RAG System Scaling: Proficiency in scaling Vector Databases and building robust data ingestion pipelines for Retrieval-Augmented Generation (RAG) systems.
- Deep Linux System Mastery: Proven ability in Linux kernel tuning, networking stack optimization, and storage performance management.
- Container and Orchestration Proficiency: Strong, mandatory knowledge of Kubernetes and Docker.
Preferred Skills
- Experience with setting up and optimizing GPU clusters for inference workloads.
- Security-First Monitoring: Experience implementing "Privacy-Preserving Telemetry", ensuring that logs and traces never accidentally capture PII or PHI.
- Certifications: Relevant certifications (e.g., CKS, NCA-AIIO, HCISPP) are a strong plus.
Why Join This Project?
This is a unique opportunity to apply cutting-edge DevOps and MLOps practices to a project with a profound social impact. You will not only manage the infrastructure but also be a critical force in expanding access to efficient, high-quality, and convenient digital health services for every Rwandan. You will directly build the resilient foundation for running cutting-edge AI services securely within a national data environment.
Please note that the salary for this position is commensurate with experience and qualifications and will be discussed during the interview process.
Application Deadline
- January 9, 2026
We are an equal opportunity employer and are committed to providing a positive interview experience for every candidate. We're on a mission to change our continent through technology. We are committed to a diverse and inclusive workplace and strongly encourage applicants from all backgrounds, nationalities, and walks of life.
Our head office is based in Kigali, Rwanda.