
Site Reliability Engineer

Tel Aviv-Yafo, Israel

Optimove is a global marketing tech company, recognized as a Leader by Forrester and a Challenger by Gartner. We work with some of the world's most exciting brands, such as Sephora, Staples, and Entain, who love our thought-provoking combination of art and science. With a strong product, a proven business, and the DNA of a vibrant, fast-growing startup, we're on the cusp of our next growth spurt. It's the perfect time to join our team of ~500 thinkers and doers across NYC, LDN, TLV, and other locations, where 2 of every 3 managers were promoted from within. Growing your career with Optimove is basically guaranteed. 

Are you passionate about ensuring system reliability, scalability, and performance? Do you thrive in a dynamic environment where automation and operational excellence are key?
Optimove is looking for a Site Reliability Engineer (SRE) to join our team and play a crucial role in designing, implementing, and maintaining our cloud-based infrastructure. In this role, you will collaborate across teams to drive automation, improve system resilience, and optimize performance while fostering a culture of reliability.

Responsibilities:

  • System Reliability – Ensure high availability and performance of services through effective monitoring, incident management, and root cause analysis.
  • Automation & Tooling – Develop and maintain automation for infrastructure provisioning, configuration management, and application deployment.
  • Performance Optimization – Analyze and enhance system performance, including load balancing, caching, and database tuning. Conduct regular capacity planning.
  • Incident Response & Troubleshooting – Lead incident response efforts, participate in on-call rotations, and troubleshoot complex infrastructure issues.
  • Security & Compliance – Collaborate with security teams to implement best practices and ensure compliance with relevant standards (ISO 27001, SOC 2, etc.).
  • Collaboration & Mentorship – Work closely with developers, DevOps, Support, and product teams to enhance application reliability and implement SRE best practices.

Requirements:

  • 5+ years in site reliability engineering, DevOps, or related roles.
  • Proven experience managing large-scale, cloud-based infrastructure in GCP, AWS, or Azure.
  • Expertise in containers and container orchestration (Docker, Kubernetes), and in microservices architecture.
  • Strong proficiency in scripting and programming languages (Python, Go, Bash, etc.).
  • Experience with CI/CD pipelines, infrastructure as code (Terraform, CloudFormation), and configuration management (Ansible, Puppet, Chef).
  • Hands-on experience with monitoring and observability tools (Datadog, Prometheus, Grafana, ELK Stack).
  • Deep understanding of networking concepts, DNS, load balancing, and distributed systems.
  • Strong problem-solving skills, excellent communication, and a proactive mindset.

Advantages:

  • Certifications – AWS Certified Solutions Architect, GCP Professional Cloud Architect, or Kubernetes certifications (CKA, CKAD).

Why Join Us?

In this role, you will have the opportunity to work on cutting-edge technology, solve challenging problems, and make a tangible impact on the reliability and scalability of our systems. Join a team that values collaboration, innovation, and continuous learning, and be part of an exciting journey as we scale our platform to new heights!

 
