AI Engineer (AI Tests Generation Startup)

Limassol, Cyprus; Paphos, Cyprus

We’re a JetBrains-backed incubator startup building an AI-powered Chrome extension that turns manual web test journeys into clean, production-ready end-to-end test code right within your repo. We’re a small team, with real users and zero bureaucracy. Our processes are built for shipping value fast, not for the sake of ceremony.

Your role 

As part of our team, you’ll own the experimentation, evaluation pipeline, and metrics for our code generation agent. While our whole team contributes, you will be our expert on continuous evaluation and advanced LLM techniques.

Your focus in the first six months:

  • Refining our evaluation pipeline. You’ll audit our metrics, ensure they correlate with user value, and rebuild the system for a tighter feedback loop.
  • Shipping core agent improvements. You'll prototype, evaluate, and productionize features like agent planning, context reduction, and RAG.
  • Building deterministic feedback loops. You’ll guide our team’s experts in code analysis and test execution to create systems that allow agents to generate better tests.

Why you should join us 

  • We’re a small team that ships fast and skips bureaucracy. You get real ownership and quick feedback from users.
  • We’re backed by JetBrains, financially secure, and have no reliance on external VC funding.
  • We work on practical multi-agent LLM systems for web testing. This is a growing space with real problems to crack, where you’ll have the freedom to do your best work.

Who we’re looking for 

A Senior AI Engineer with hands-on experience in building, evaluating, and shipping LLM or ML systems. What matters most:

  • Engineering pragmatism. You cut unnecessary complexity to find the most effective solution – whether that’s a deterministic parser or a complex LLM agent.
  • A shipping mindset. You break down ambiguous problems into small, iterative experiments that deliver value to users, fast.
  • True ownership. You see your work through from the initial idea to production, constantly measuring its impact. "Done" means it's working for our users.

Required experience 

Tech stack: Python, LangGraph, Langfuse, and PydanticAI (ongoing experiment).

  • You have built and shipped at least one significant LLM-powered feature, owning it from initial concept to production users.
  • You've designed and built evaluation pipelines for ML systems, defining metrics that measure real user value and using them to drive improvements.
  • Your engineering skills go beyond models and notebooks; you have practical experience writing production-quality Python for things like data pipelines, backends, or internal tools.
  • You have at least one full year of experience in a startup or fast-moving team.

Nice-to-haves 

We’d be especially excited if you have experience with any of the following (side projects count!):

  • Building LLM agents with tool-calling, planning, or multi-step reasoning.
  • Code generation benchmarks like SWE-bench or HumanEval.
  • Web UI automation frameworks like Selenium or Playwright (web-based end-to-end testing, web parsing, etc.).
  • Static program analysis tasks, such as parsing code and working with ASTs.

Many teams at JetBrains are distributed across our core locations. If you are not based in one of the locations below, we are happy to support you with relocation. Please discuss this in the initial conversation with your recruiter, and they will find the best way to support you.