Getting Started

Zeno is an interactive AI evaluation platform for exploring, discovering and reporting the performance of your models. Use it for any task and data type with Zeno's modular views for everything from object detection to audio transcription and chatbot conversations. Zeno helps you move beyond aggregate metrics and spot-checking model outputs to develop a deep and quantitative understanding of how your model behaves.

To get a sense of Zeno's features, explore the public projects and reports in Zeno Hub. For example, take a look at a report evaluating how well LLMs like GPT-4 do on language translation, along with the underlying data used to create the report.

Tip

Head over to Zeno Hub to explore public Zeno projects and reports.

Check out the Using Zeno docs for more information on how to use the Zeno interface.

Creating a Project

To create your own projects and reports, first create an account on Zeno Hub. After logging in to Zeno Hub, generate your API key by clicking on your profile at the bottom left to navigate to your account page.
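
Rather than pasting the key directly into a script, one option is to read it from an environment variable. The snippet below is a minimal sketch of that approach; the variable name `ZENO_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something zeno-client defines or reads on its own.

```python
import os


def load_api_key(env_var: str = "ZENO_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    `ZENO_API_KEY` is a hypothetical name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Zeno API key."
        )
    return key


# Usage, once zeno-client is installed (see the next step):
# from zeno_client import ZenoClient
# client = ZenoClient(load_api_key())
```

Keeping the key out of the source file makes it harder to commit it to version control by accident.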

Next, install the zeno-client Python package, which is used to upload new datasets and AI system outputs:

pip install zeno-client

We can now create a client with our API key and use it to create a project and upload data. The client API works with Pandas DataFrames, so we can create a sample DataFrame for text sentiment classification:

from zeno_client import ZenoClient
import pandas as pd

client = ZenoClient("YOUR API KEY HERE")

df = pd.DataFrame(
    {
        "text": [
            "I love this movie!",
            "I hate this movie!",
            "This movie is ok.",
        ],
        "label": ["positive", "negative", "neutral"],
    }
)

# Explicitly save the index as a column to upload.
df["id"] = df.index

# Add any additional columns you want to do analysis across.
df["input length"] = df["text"].str.len()

Let's create a project for this task. A Zeno project consists of a base dataset and any number of AI system outputs, and is used to evaluate and compare model performance. Here we create a project and upload our base dataset.

Info

The view option can take a string for one of the standard views or a dict with a custom view specification.

If you just want to look at tabular metadata, you can omit view from create_project and data_column from upload_dataset.

from zeno_client import ZenoClient, ZenoMetric

...

project = client.create_project(
    name="Sentiment Classification",
    view="text-classification",
    metrics=[
        ZenoMetric(name="accuracy", type="mean", columns=["correct"]),
    ],
)

project.upload_dataset(df, id_column="id", data_column="text", label_column="label")

We named our project "Sentiment Classification" and specified that it is a text classification task. Check out all supported data types and tasks here. We also added an initial accuracy metric, which takes the mean of the correct column. That column will be present in the system outputs we upload later.
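
To make the "mean of the correct column" concrete, here is a small plain-Python sketch of the arithmetic. It only illustrates what a mean over per-instance 0/1 values yields; it is not Zeno's internal implementation, and the label and output values are illustrative.

```python
# Illustrative labels and model outputs for three instances.
labels = ["positive", "negative", "neutral"]
outputs = ["positive", "negative", "negative"]

# One 0/1 flag per instance: 1 when the output matches the label.
correct = [int(output == label) for output, label in zip(outputs, labels)]

# A "mean" metric over these flags is the fraction of correct instances.
accuracy = sum(correct) / len(correct)

print(correct)             # [1, 1, 0]
print(round(accuracy, 3))  # 0.667
```

Because the metric is a mean over a per-instance column, the same value can be recomputed for any subset of instances.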

Next, we can upload some system outputs to evaluate. Here we'll upload some fake predictions from a model:


...

df_system = pd.DataFrame(
    {
        "output": ["positive", "negative", "negative"],
    }
)

# Create an id column to match the base dataset.
df_system["id"] = df_system.index

# Measure accuracy for each instance, which is averaged by the ZenoMetric above.
df_system["correct"] = (df_system["output"] == df["label"]).astype(int)

project.upload_system(df_system, name="System A", id_column="id", output_column="output")
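
The comparison of df_system["output"] with df["label"] above works because both DataFrames share the same default index. If system outputs come from a separate file or arrive in a different order, joining on the id column first is a safer way to compute correctness. The following is a sketch using only pandas; the shuffled sample data and the name `scored` are made up for illustration.

```python
import pandas as pd

# Base dataset with explicit ids, as in the example above.
df = pd.DataFrame(
    {"id": [0, 1, 2], "label": ["positive", "negative", "neutral"]}
)

# Hypothetical system outputs that arrive in a different row order.
df_system = pd.DataFrame(
    {"id": [2, 0, 1], "output": ["negative", "positive", "negative"]}
)

# Join labels onto outputs by id; validate guards against duplicate ids.
scored = df_system.merge(
    df[["id", "label"]], on="id", how="left", validate="one_to_one"
)

# Per-instance correctness, computed on rows that are now aligned by id.
scored["correct"] = (scored["output"] == scored["label"]).astype(int)

# Drop the label column so only system columns remain.
scored = scored.drop(columns=["label"])
```

The resulting `scored` frame has the id, output, and correct columns and could be passed to upload_system in place of df_system.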

You can now navigate to the project URL in Zeno Hub to see the uploaded data and metrics and start exploring your AI system's performance!

Complete Example

from zeno_client import ZenoClient, ZenoMetric
import pandas as pd

client = ZenoClient("YOUR API KEY HERE")

df = pd.DataFrame(
    {
        "text": [
            "I love this movie!",
            "I hate this movie!",
            "This movie is ok.",
        ],
        "label": ["positive", "negative", "neutral"],
    }
)

# Explicitly save the index as a column to upload.
df["id"] = df.index

# Add any additional columns you want to do analysis across.
df["input length"] = df["text"].str.len()

project = client.create_project(
    name="Sentiment Classification",
    view="text-classification",
    metrics=[
        ZenoMetric(name="accuracy", type="mean", columns=["correct"]),
    ],
)

project.upload_dataset(df, id_column="id", data_column="text", label_column="label")

df_system = pd.DataFrame(
    {
        "output": ["positive", "negative", "negative"],
    }
)

# Create an id column to match the base dataset.
df_system["id"] = df_system.index

# Measure accuracy for each instance, which is averaged by the ZenoMetric above.
df_system["correct"] = (df_system["output"] == df["label"]).astype(int)

project.upload_system(df_system, name="System A", id_column="id", output_column="output")

Quickstart with Zeno Build

Zeno Build is a Python library that makes it easy to set up Zeno projects for common AI and ML tasks. Check out some common Zeno Build notebooks: