
Evaluation for Modern AI

Zeno is a machine learning evaluation framework for exploring your data, debugging foundation models, and tracking and comparing model performance.


Explore data and model outputs with customizable views for any data type


Interactively discover, test and save model behavior for analysis and updates


Create exportable visualizations and charts comparing models and slices

Explore your data

Zeno's modular instance view can be extended to render any data type and model output

Image Classification
Audio Transcription
Activity Recognition
Your custom data type

Create interactive reports

Track and compare performance across slices and models

Slices created in the Exploration page can be used to build interactive visualizations for deeper analyses of model behavior. Visualizations include bar charts for comparing slice performance across models and trend tables for detecting regressions in slice performance.

Zeno charts can be exported as PDFs or PNGs for sharing with other stakeholders, or shared as links for live views of model performance.
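The slice comparison and regression detection described above can be sketched in plain Python. This is an illustration of the idea only, not Zeno's implementation or API: the records, the slice names, the `model_a`/`model_b` columns, and the helper functions are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical data: each instance has a label, one metadata value,
# and one output column per model.
records = [
    {"label": "cat", "brightness": 0.2, "model_a": "cat", "model_b": "dog"},
    {"label": "dog", "brightness": 0.3, "model_a": "dog", "model_b": "dog"},
    {"label": "cat", "brightness": 0.8, "model_a": "dog", "model_b": "cat"},
    {"label": "dog", "brightness": 0.9, "model_a": "dog", "model_b": "dog"},
]

# A slice is a named predicate over instances.
slices: Dict[str, Callable[[dict], bool]] = {
    "all": lambda r: True,
    "dark": lambda r: r["brightness"] < 0.5,
    "bright": lambda r: r["brightness"] >= 0.5,
}


def accuracy(rows: List[dict], model: str) -> Optional[float]:
    """Fraction of rows where the model output matches the label."""
    if not rows:
        return None  # an empty slice has no defined accuracy
    return sum(r[model] == r["label"] for r in rows) / len(rows)


def slice_table(rows: List[dict], models: List[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Accuracy of every model on every slice (the data behind a bar chart)."""
    table = {}
    for name, predicate in slices.items():
        subset = [r for r in rows if predicate(r)]
        table[name] = {m: accuracy(subset, m) for m in models}
    return table


def regressions(table: dict, old: str, new: str, tolerance: float = 0.0) -> List[str]:
    """Slices where the new model scores lower than the old one."""
    flagged = []
    for name, scores in table.items():
        if scores[old] is None or scores[new] is None:
            continue
        if scores[new] < scores[old] - tolerance:
            flagged.append(name)
    return flagged


table = slice_table(records, ["model_a", "model_b"])
regressed = regressions(table, old="model_a", new="model_b")
```

Both models have the same accuracy on the "all" slice here, so only the per-slice view shows that `model_b` is worse on dark images. That is the kind of difference a slice-level chart or trend table is meant to surface.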

Extend Zeno with the Python API

Add new models, metrics, and metadata columns with the Python API

The Python API is used to add models, metrics, and new metadata columns to Zeno.

@model functions wrap Python libraries such as PyTorch, TensorFlow, Keras, and Hugging Face to return model predictions. @metric functions calculate metrics on slices of data. @distill functions derive new metadata columns from your data instances.

Audio transcription using the OpenAI Whisper model
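import os

import whisper
from zeno import ZenoOptions, model  # assumed import path for the decorator and options


@model
def load_model(model_path):
    # This example always loads the "tiny" Whisper checkpoint; model_path is unused
    whisper_model = whisper.load_model("tiny")

    def pred(df, ops: ZenoOptions):
        # Get a list of paths for each audio file
        files = [os.path.join(ops.data_path, f) for f in df[ops.data_column]]
        return [whisper_model.transcribe(f)["text"] for f in files]

    return pred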
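For a transcription model like the one above, a metric and a derived metadata column might compute something like the following. This is a minimal sketch in plain Python and deliberately does not use Zeno's @metric or @distill decorators, whose signatures are not shown in this text. The function names `word_error_rate`, `mean_wer`, and `transcript_length` are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention chosen for this sketch: an empty reference scores 0
        # only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0

    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(references: List[str], hypotheses: List[str]) -> float:
    """Metric-style function: average WER over a slice of instances."""
    scores = [word_error_rate(r, h) for r, h in zip(references, hypotheses)]
    return sum(scores) / len(scores) if scores else 0.0


def transcript_length(reference: str) -> int:
    """Distill-style function: a new metadata value derived per instance."""
    return len(reference.split())
```

A derived column such as `transcript_length` could then be used to define slices (for example, short versus long utterances), and a metric such as `mean_wer` evaluated on each of them.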