Skip to main content

AI Evaluation Made Easy

Discover how your AI performs with Zeno.
Explore your data, uncover failures, and create beautiful, interactive charts.


Trending Projects and Reports

Explore popular public projects and reports on Zeno Hub



Testimonials

Thanks to the help of the Zeno team, our investigations of Open LLM Leaderboard evaluations were much faster.

Clémentine, Research Scientist

Demystifying benchmarkings and evaluation practices, and communicating them to a wider audience effectively is very difficult. Zeno does a great job of encouraging taking a critical look at your metrics and your datasets, which is much needed in the field!

Hailey, Research Scientist


Example image from ImagenetteExample image from ImagenetteExample image from ImagenetteExample image from Imagenette

Screenshot of radar chart

Error Discovery

Discover where your models fail with automated error analysis.

Zeno includes advanced error discovery techniques such as slice finder to automatically surface your models' systematic failures.


Chart Building

Create beautiful, interactive visualizations to compare model performance across slices and models

Use Zeno's drag-and-drop interface to create interactive charts. Create a radar chart comparing multiple models on different slices of your data, or a beeswarm plot to compare hundreds of models.

Screenshot of radar chart
Screenshot of bar chartScreenshot of beeswarm chart

Report Authoring

Create reports with interactive visualizations and markdown

Visualizations can be combined with rich markdown text to share insights and tell stories about your data and model performance.

Reports can be authored collaboratively and published broadly.

Getting Started

Upload your results to Zeno in just a few lines of code


1pip install zeno-client
1from zeno_client import ZenoClient, ZenoMetric
2
3# Initialize a client with our API key.
4client = ZenoClient(YOUR_API_KEY)
5
6# Create a project.
7project = client.create_project(name="Writing Assistant", view="text-classification")
8
9# Upload the data.
10df = ... # Your DataFrame
11project.upload_dataset(df, id_column="id", data_column='text', label_column="label")
12system_df = ... # Your system outputs
13proj.upload_system(system_df, name="GPT-4", id_column="id", output_column="output")

Sponsors and Organizations

Mozilla logoNSF logoCMU logo