Machine Learning Evaluation Engineer (Agentic Mobile App Generator)

Amsterdam, Netherlands; Berlin, Germany; Limassol, Cyprus; Munich, Germany; Paphos, Cyprus; Prague, Czech Republic; Warsaw, Poland; Yerevan, Armenia

Are you passionate about high-quality AI developer tools? We’re looking for a Machine Learning (ML) Evaluation Engineer to join our agentic mobile app generator project. This role focuses on building evaluation pipelines for AI code generation within the Compose Multiplatform ecosystem.

About the team and role 

Our internal accelerator team is developing AI agents to generate Kotlin Multiplatform (KMP) apps with fully working navigation, data persistence, and access to remote data sources. As an ML Evaluation Engineer, you will ensure our AI features work correctly and facilitate continuous improvement of the generated apps. If you’re excited about improving the quality of AI-generated code, skilled in building evaluation systems, and interested in new projects, this role is for you.

In this role, you will: 

  • Design, build, and maintain evaluation pipelines.
  • Work with AI engineers, mobile experts, and product designers to establish quality standards and test plans.
  • Develop ways to assess the completeness, security, and performance of generated apps.
  • Analyze evaluation results, identify areas to improve code generation, and give feedback to the development team.
  • Develop and run test plans, test cases, and automated tests.
  • Improve our testing methods and QA processes in an agile environment.
  • Join discussions, design reviews, and brainstorming sessions around AI tools.

We’d like you to join the team if you:

  • Have at least three years of QA or evaluation engineering experience in commercial software, with a strong background in testing complex systems.
  • Have experience or interest in building and maintaining evaluation pipelines for AI-assisted development, prompt engineering, or ML-based code generation.
  • Have experience with data analysis tools or ML experiment tracking platforms.
  • Understand software testing methods, including functional, performance, and integration testing.
  • Are proficient in Python, Kotlin, Java, Swift, or similar languages.
  • Work well in distributed, cross-functional teams.
  • Have strong English communication skills and can explain complex ideas clearly.

We’ll be especially thrilled if you: 

  • Are always looking for ways to improve developer workflows and productivity.
  • Have experience with low-code and/or no-code platforms or tools.
  • Are familiar with Compose Multiplatform, building KMP libraries or frameworks, or IDE and plugin development.

Why work at JetBrains? 

  • Impactful work: Directly influence how future mobile applications are built and tested globally by millions of developers.
  • Innovative culture: Work in an environment that values innovation, creativity, open communication, and respect.
  • Cutting-edge tech: Use new technology stacks with minimal bureaucracy, and focus on developing ideas.
  • Professional growth: Grow as an evaluation engineer through mentorship, teamwork, and learning about AI research and trends.
  • Work-life balance: Enjoy flexible work and a good work-life balance in a developer-focused culture.

Want to help shape AI-driven development? 

Apply now for JetBrains’ new agentic mobile app generator project. Tell us about yourself and why you want to build the next generation of agentic AI. We look forward to hearing from you!
