Defining Metrics in Experiments

Confident AI allows anyone, including non-technical users such as domain experts or human reviewers, to define, select, and configure metrics by creating experiments on the platform, without writing a single line of code. This includes LLM system metrics, conversational metrics, as well as custom metrics.

info

An experiment in Confident AI is a collection of metrics that you can use to benchmark your LLM in a contained way. Running an experiment produces a test run, which contains the evaluation results of your LLM app's test cases.

Setting Up

Log in to Confident AI by heading to the platform or running the following command in your CLI:

deepeval login

1. Creating your Custom Metrics

To create a complete experiment, you'll first need to define your custom metrics, if applicable. In our medical chatbot use case, we'll be defining two: Diagnosis Specificity and Overdiagnosis. Start by navigating to the Metrics page, selecting the Custom Metrics tab, and clicking Create Metric.

Specify the custom metric's name, criteria, or evaluation steps (we recommend defining evaluation steps for granular control), and the parameters the metric will use to evaluate your test case. Once you've finalized the details, click Create New Metric.

tip

To learn more about test cases and their parameters in DeepEval, visit this section.

Once you've finished defining all your custom metrics, they'll be listed in the Custom Metrics tab.

2. Creating an Experiment

Next, head to the Evaluation & Testing page and click Create New Experiment, where you'll be presented with all the metrics available in DeepEval as well as the custom ones you've defined.

We'll name our experiment Test Medical Chatbot and select all the relevant metrics: the 5 RAG metrics, Hallucination, Tool Correctness, as well as our 2 custom metrics (Diagnosis Specificity and Overdiagnosis). Click Create Experiment.