MLOps Engineer - Data Platform Team

Tel Aviv

Surf AI is the agentic operations platform for enterprise security teams. We don't just surface risk; we close it. Our platform connects context across identity, cloud, HR, IT, and SaaS systems, and uses specialized AI agents to drive remediation end-to-end, with human oversight at every step.

We're backed by Accel, Cyberstarts, and Boldstart Ventures, and trusted by Fortune 500 enterprises already deploying Surf in production.

Our team is small and senior, with deep roots in identity, security, and enterprise infrastructure. We work at the intersection of agentic AI and applied security - and we take seriously what it means to build systems that act in real enterprise environments.

Who are we looking for?

As an MLOps Engineer, you will architect and operate scalable, production-grade ML systems. You’ll build cloud-native infrastructure that supports reproducible experimentation, reliable deployment, and continuous monitoring.
You will also help shape the company’s ML architecture, contributing to foundational design decisions and long-term platform strategy.

What you'll do

  • Build and operate end-to-end ML pipelines across training, validation, deployment, and monitoring.
  • Design reproducible training workflows with experiment tracking and dataset versioning.
  • Ensure data and feature consistency between training and production using feature stores and versioned data pipelines.
  • Implement CI/CD for ML systems, including automated testing, evaluation gates, model packaging, and staged promotion.
  • Manage model versioning, registry workflows, and artifact management.
  • Deploy and operate models using scalable inference patterns (e.g., batch, real-time, async).
  • Monitor model performance, data drift, concept drift, and infrastructure health with automated alerting and retraining triggers.
  • Optimize ML infrastructure for performance, cost efficiency, and scalability.
  • Collaborate with ML engineers to productionize research models into robust, observable systems.

Required Skills & Experience

  • 3+ years building and operating distributed systems in production.
  • 2+ years hands-on MLOps experience across the full model lifecycle.
  • Strong experience with Kubernetes in production (deployments, autoscaling, observability, troubleshooting).
  • Experience building CI/CD pipelines (GitOps preferred) for ML or data systems.
  • Solid understanding of model lifecycle management, reproducibility, and artifact/version control.
  • Experience with monitoring stacks (e.g., Prometheus/Grafana, OpenTelemetry) and ML observability.
  • Experience with cloud platforms (AWS/GCP/Azure) and infrastructure-as-code (e.g., Terraform).

Nice to Have

  • Experience with Databricks or similar ML platforms.
  • Experience with orchestration frameworks such as Dagster or Airflow.
  • Experience with feature stores and model registries (e.g., MLflow).
  • BSc or MSc in Computer Science, Mathematics, or Engineering.

Why Join Us?

If you want to work on foundational systems, ship AI into production, and help define how agentic security actually operates, this is an opportunity to do it early and with real ownership.
