
Demos

Try out some of our hosted demos to get a feel for Zeno.

Image Classification with Imagenette

Open With Zeno


For this classic image classification task on Imagenette, a subset of the full ImageNet dataset, we use Zeno to compare two CNN models. The demo includes multiple distill functions for image features such as brightness and the presence of certain colors. It also includes model projections, which can be used to find potential model errors.
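To make the idea of a brightness feature concrete, here is a minimal standalone sketch of the kind of per-image computation such a distill function might perform. It does not use Zeno's API: the names `brightness` and `distill_brightness`, the input layout (a dict mapping image names to lists of RGB tuples), and the choice of Rec. 601 luma weights are all assumptions made for illustration.

```python
from statistics import mean


def brightness(pixels):
    """Return the mean brightness (0-255) of an image.

    `pixels` is an iterable of (R, G, B) tuples with 0-255 channels.
    The Rec. 601 luma weights are an illustrative choice, not
    necessarily what the demo uses.
    """
    return mean(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)


def distill_brightness(images):
    """Map each image name to its brightness value.

    `images` is a hypothetical dict of {name: list of (R, G, B) tuples}.
    """
    return {name: brightness(pixels) for name, pixels in images.items()}


if __name__ == "__main__":
    sample = {
        "dark.png": [(10, 10, 10), (20, 20, 20)],
        "bright.png": [(250, 250, 250), (240, 240, 240)],
    }
    for name, value in distill_brightness(sample).items():
        print(f"{name}: {value:.1f}")
```

A per-image value like this can then be used to slice the data, for example to check whether a model performs worse on dark images.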

Explore the code

Audio Transcription

Open With Zeno


We wanted to compare OpenAI's Whisper model with an existing off-the-shelf audio transcription model, in this case Silero. For our evaluation dataset we used the Speech Accent Archive, a collection of audio clips of people from around the world saying the same phrase.

Can you find differences in model performance across geographic regions and other speaker features?
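One way to approach this question outside the Zeno UI is to compute an error metric per group of speakers. The sketch below assumes word error rate (WER) as the metric and a hypothetical record layout with `reference`, `transcript`, and a grouping field such as `region`. None of these names come from the demo code.

```python
from collections import defaultdict


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance over words,
    # keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def mean_wer_by_group(records, key):
    """Average WER for each distinct value of `key` in the records."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(
            word_error_rate(rec["reference"], rec["transcript"])
        )
    return {group: sum(vals) / len(vals) for group, vals in groups.items()}


if __name__ == "__main__":
    # Hypothetical records; real transcripts would come from the models.
    records = [
        {"region": "A", "reference": "ask her to bring these things",
         "transcript": "ask her to bring these things"},
        {"region": "B", "reference": "ask her to bring these things",
         "transcript": "ask her to ring these thing"},
    ]
    print(mean_wer_by_group(records, "region"))
```

Running the same grouping once per model gives a simple table of per-region error rates to compare.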

Explore the code

Auditing Image Generation Models

Open With Zeno


Zeno can also be used to analyze generative models. In this example we explore the DiffusionDB dataset. Instead of a typical aggregate metric, we measure the average NSFW level of the prompts and images.

Can you find potential biases in diffusion models, e.g. different levels of NSFW for different types of prompt keywords?
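As a rough illustration of this kind of slice-level comparison, the sketch below averages a per-row score for prompts containing each keyword. The row layout, the `prompt` and `image_nsfw` field names, and the whole-word matching rule are assumptions for the example, not taken from the demo code.

```python
def mean_score_by_keyword(rows, keywords, score_field="image_nsfw"):
    """Average `score_field` over the rows whose prompt contains each keyword.

    Keywords are matched as whole, lowercase, whitespace-separated words.
    Keywords that match no prompt are omitted from the result.
    """
    result = {}
    for keyword in keywords:
        scores = [
            row[score_field]
            for row in rows
            if keyword.lower() in row["prompt"].lower().split()
        ]
        if scores:
            result[keyword] = sum(scores) / len(scores)
    return result


if __name__ == "__main__":
    # Hypothetical rows; real scores would come from the dataset metadata.
    rows = [
        {"prompt": "a portrait of a knight", "image_nsfw": 0.05},
        {"prompt": "portrait of a dancer", "image_nsfw": 0.30},
        {"prompt": "a mountain landscape", "image_nsfw": 0.01},
    ]
    print(mean_score_by_keyword(rows, ["portrait", "landscape"]))
```

Large gaps between keyword averages are a starting point for closer inspection, not evidence of bias on their own.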

Explore the code

Q&A Chatbots

Open With Zeno


Chatbots like ChatGPT are an increasingly popular application of language models, and libraries like LangChain are making it much easier to implement LLM-based applications. In this demo we use Zeno to explore how well a LangChain model performs at answering questions over a Notion database.

Explore the code

Sensor Data Exploration

Open With Zeno


Zeno can also be used for unstructured data exploration. In this demo, we explore the MotionSense dataset of iPhone sensor data. This demo could be extended to include activity classification models.

Can you find interesting sensor patterns between the different activities?

Explore the code

Translation

Open With Zeno


Explore the performance of a dozen different translation models on a subset of the WMT 2014 English-German dataset. Compare them across different metrics, including BLEU and BERTScore.

Explore the code