AI Evaluation Lead (JetBrains AI)
At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the world’s most robust and effective developer tools. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.
The JetBrains AI team focuses on bringing advanced AI capabilities to JetBrains products. This includes supporting the internal AI platform used across JetBrains and conducting long-term R&D in AI and machine learning. We collaborate closely with product teams to brainstorm and prioritize AI-driven features, and we also support product marketing and release planning. Our team includes about 50 people working on everything from classical ML algorithms and code completion to agents, retrieval-augmented generation, and more.
We’re looking to strengthen our team with an AI Evaluation Lead who will help define and execute our strategy for evaluating AI-powered features and LLMs. In this role, you will be instrumental in ensuring our models deliver meaningful value to users by shaping evaluation pipelines, influencing model development, collaborating with product and research teams across the company, and publishing your work as open source.
We value engineers who:
- Plan their work and make decisions independently, consulting with others if needed.
- Follow the latest advances in AI and ML, think long-term, and take ownership of their scope of work.
- Prefer simplicity, opting for sound, robust, and efficient solutions.
In this role, you will:
- Design and develop rigorous offline and online evaluation benchmarks for AI features and LLMs.
- Manage the team, prioritize tasks, and mentor teammates.
- Define evaluation methodology and benchmarks for our open-source models and public releases.
- Communicate your findings and best practices across the organization.
We’ll be happy to have you on our team if you have:
- Expertise in evaluating generative AI methods.
- A strong understanding of statistics and data analysis.
- Excellent management and communication skills.
- Solid practical experience with Python and evaluation frameworks.
- Attention to detail in everything you do.
We’d be especially thrilled if you have experience with:
- Preparing public evaluation reports for feature or model releases.
- Managing data annotation efforts, including crowdsourcing and in-house labeling.
- CI systems, workflow automation, and experiment tracking.
- The Kotlin programming language.
To develop JetBrains AI, we use:
- Weights & Biases and Langfuse for experiment tracking and reporting.
- ZenML for ML workflow automation.
- AWS and GCP for infrastructure.
- Git for source code management.
- TeamCity for continuous integration.