Zeno is an interactive platform for AI evaluation. With Zeno, you can discover, explore, and analyze the performance of your models across diverse use cases. Zeno works with any data type or task, with modular views for everything from object detection to audio transcription and chatbot conversations.
See the Quickstart to set up Zeno and learn about the core concepts.
Explore Model Performance
Zeno's Exploration UI is the main interface for evaluation. It lets you slice your data to quickly see how your model performs on different types of instances. You can select different models and metrics to view the corresponding performance.
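To make the idea of slicing concrete, here is a minimal sketch in plain Python, independent of Zeno itself. It groups instances by a metadata value and computes a metric per group. The records, the `lighting` column, and the helper names `accuracy` and `accuracy_by_slice` are all hypothetical and chosen only for illustration; Zeno's UI does this interactively rather than through these functions.

```python
from collections import defaultdict

# Hypothetical evaluation records: one dict per instance, holding a metadata
# value, the ground-truth label, and one model's output.
records = [
    {"lighting": "day", "label": "cat", "output": "cat"},
    {"lighting": "day", "label": "dog", "output": "dog"},
    {"lighting": "night", "label": "cat", "output": "dog"},
    {"lighting": "night", "label": "dog", "output": "dog"},
    {"lighting": "night", "label": "cat", "output": "dog"},
]


def accuracy(rows):
    """Percentage of rows whose output matches the label."""
    if not rows:
        return 0.0
    correct = sum(1 for r in rows if r["output"] == r["label"])
    return 100.0 * correct / len(rows)


def accuracy_by_slice(rows, column):
    """Group rows by a metadata column and compute accuracy per group."""
    slices = defaultdict(list)
    for r in rows:
        slices[r[column]].append(r)
    return {value: accuracy(group) for value, group in slices.items()}


overall = accuracy(records)
per_slice = accuracy_by_slice(records, "lighting")
```

In this toy data the aggregate accuracy hides a gap between the two slices, which is the kind of difference slicing is meant to surface.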
The modular instance view supports any data type or task. Existing views can be found here, and new views can be written for unsupported data types.
Interactive, Shareable Reports
Zeno lets you create interactive visualizations and reports that summarize and share insights into model performance. Zeno charts update live with your data and model outputs, so they always reflect current model performance. Interactive reports can be shared publicly, exported as PDFs, and easily reproduced by others.
The Python API consists of four core decorator functions that you can use to plug in your models and generate information for evaluation. The @model functions return model outputs for any Python-based model or API, which can then be evaluated with @metric functions. To test more diverse use cases, @distill functions can be used to create new metadata columns based on raw data instances.
@metric
def accuracy(df, ops):
    ...

@distill
def brightness(df, ops):
    ...
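The two function signatures above could be fleshed out roughly as follows. This is a sketch under several assumptions, not Zeno's implementation: the `metric` and `distill` decorators here are local stand-ins so the example runs without Zeno installed, `df` is a plain dict of columns standing in for a data frame, and the `ops` fields `data_column`, `label_column`, and `output_column` are hypothetical names for where the raw data, labels, and model outputs live.

```python
from types import SimpleNamespace

# Stand-in registry and decorators so this sketch runs on its own.
# They are NOT Zeno's decorators; they only record the decorated functions.
REGISTRY = {"metric": {}, "distill": {}}


def metric(fn):
    REGISTRY["metric"][fn.__name__] = fn
    return fn


def distill(fn):
    REGISTRY["distill"][fn.__name__] = fn
    return fn


@metric
def accuracy(df, ops):
    """Percentage of instances where the model output equals the label."""
    labels = df[ops.label_column]
    outputs = df[ops.output_column]
    if not labels:
        return 0.0
    matches = sum(1 for lab, out in zip(labels, outputs) if lab == out)
    return 100.0 * matches / len(labels)


@distill
def brightness(df, ops):
    """New metadata column: mean pixel intensity of each image."""
    return [sum(pixels) / len(pixels) for pixels in df[ops.data_column]]


# Assumption: a dict of equal-length lists stands in for a data frame,
# and each "image" is a flat list of grayscale pixel values.
df = {
    "image": [[0, 0, 10, 30], [200, 220, 240, 255], [90, 110, 100, 100]],
    "label": ["dark", "bright", "dark"],
    "output": ["dark", "bright", "bright"],
}
ops = SimpleNamespace(
    data_column="image", label_column="label", output_column="output"
)

df["brightness"] = brightness(df, ops)  # adds a metadata column
score = accuracy(df, ops)               # evaluates model outputs
```

The distilled `brightness` column can then serve as metadata for slicing, for example to check whether accuracy drops on darker images.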
Zeno helps you move beyond relying on aggregate metrics and spot-checking model outputs. Instead, it allows you to develop a deep and quantitative understanding of how your model behaves.
Zeno supports your workflow: it is model and data agnostic, and requires just a single Python function to get started. It also lets you test diverse model use cases, from potential fairness concerns to robustness checks. Lastly, Zeno lets you compare your models and detect potential regressions as you release updates.
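As a rough illustration of what detecting a regression between model versions can mean, the sketch below compares per-slice metric values for two models and flags slices where the newer model is worse by more than a tolerance. The function `find_regressions`, the slice names, and all numbers are hypothetical and made up for the example; they do not come from Zeno.

```python
def find_regressions(old_scores, new_scores, tolerance=1.0):
    """Return slices where the new score is lower than the old score
    by more than `tolerance` points, mapped to (old, new) pairs."""
    regressions = {}
    for slice_name, old in old_scores.items():
        new = new_scores.get(slice_name)
        if new is not None and old - new > tolerance:
            regressions[slice_name] = (old, new)
    return regressions


# Illustrative per-slice accuracies for two model versions.
old = {"all": 91.0, "night": 80.0, "blurry": 75.5}
new = {"all": 92.5, "night": 71.0, "blurry": 75.0}

flagged = find_regressions(old, new)
```

Here the aggregate score improves while one slice gets noticeably worse, which is the kind of change an aggregate metric alone would miss.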