Using an AI to Validate Another AI’s Output
Generative AI can be unreliable at times. Use this shortcut to quickly validate the quality of AI outputs
Have you ever asked ChatGPT the same question twice?
Did you get the same answer?
The answer’s probably no, so long as the question wasn’t too basic.
This is because outputs from generative AI are “non-deterministic”. In other words, answers will vary, which means we need some way to assess the quality of any given output.
There are frameworks available for evaluating AI outputs, but they frequently require a great deal of industry-specific knowledge.
Or you can do what we did at AiSDR: take a shortcut and use one generative AI model to check the output of a different model.
TLDR:
- The goal: Confirm whether an AI’s output is sufficiently good
- The tactic: Use a different AI to check the output of the original AI (e.g. use Gemini to check ChatGPT’s output)
- The result: Speed up the process of checking AI outputs
Step 1: Generate an Output
Let’s say you want to take a closer look at ChatGPT’s outputs.
Before you can test an output’s quality, you first need to generate one, whether that’s text, an image, audio, or any other type of content.
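If you’re generating the output programmatically rather than through a chat window, this step can be a single API call. Below is a minimal sketch using the OpenAI Python SDK; the model name and the prompt are placeholders we’ve assumed for illustration, so swap in whatever you actually use.

```python
# Step 1 sketch: generate the output you want to evaluate.
# Assumes the official `openai` Python SDK (v1+) and an OPENAI_API_KEY
# environment variable; the model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you're testing
    messages=[
        {
            "role": "user",
            "content": "Write a short follow-up email for a prospect who attended our demo.",
        }
    ],
)

original_output = response.choices[0].message.content
print(original_output)
```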
Step 2: Reformat the Output
Depending on the AI models you use, you might have to “reformat” the output so that the second AI’s able to work with it.
This step is more commonly known as preprocessing.
Common examples of preprocessing include:
- Adding or removing text elements like HTML tags
- Breaking larger texts into smaller parts (e.g. essays into individual paragraphs or paragraphs into individual sentences)
- Identifying named entities such as people, organizations, locations, or dates
- Converting data into a format that’s compatible with the second model
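To give a rough idea of what this looks like in practice, here’s a small sketch covering two of the steps above (stripping HTML tags and splitting a text into paragraphs) using only Python’s standard library. The sample text is invented, and the exact preprocessing you need will depend on your own models and pipeline.

```python
# Step 2 sketch: light preprocessing before handing the text to a second AI.
# Only plain-Python examples here; real pipelines may need much more.
import re

def strip_html(text: str) -> str:
    """Remove simple HTML tags (a very naive approach, fine for a sketch)."""
    return re.sub(r"<[^>]+>", "", text)

def split_paragraphs(text: str) -> list[str]:
    """Break a larger text into individual paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Invented sample standing in for the output generated in Step 1.
raw_output = "Hi Jane,<br>\n\nThanks for joining the demo yesterday.\n\nBest,\nAlex"

cleaned = strip_html(raw_output)
chunks = split_paragraphs(cleaned)
print(chunks)  # ['Hi Jane,', 'Thanks for joining the demo yesterday.', 'Best,\nAlex']
```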
Step 3: Evaluate the Output
Once the output’s been preprocessed, you can feed it to the second AI. To systematize your evaluation, create a set of criteria you can use to compare AI models.
The set of criteria you create will vary by situation, but common criteria include:
- Computational efficiency – Is the model efficient in terms of resource usage, speed, scalability, and performance?
- Currency – Is the content reflective of new information?
- Relevancy – Is the content on topic?
- Authority – Is the content backed by reputable sources or evidence?
- Accuracy – Is the content accurate and free of errors or misleading information?
- Purpose – Is the content in line with the intended purpose?
You can then use the results to determine whether or not the original AI is effective and suitable for your requirements, which could be anything from model distillation to carrying out simple day-to-day work tasks.
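To make the evaluation step concrete, here’s a rough sketch of what it might look like with Gemini acting as the second AI, using Google’s `google-generativeai` package. The criteria list, the 1–5 scale, the prompt wording, and the model name are all assumptions for illustration; adapt them to your own checklist and scoring rules.

```python
# Step 3 sketch: ask a second model to grade the first model's output.
# Assumes the `google-generativeai` package and a GOOGLE_API_KEY env var;
# the criteria, scale, and model name are illustrative, not prescriptive.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
evaluator = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

CRITERIA = ["currency", "relevancy", "authority", "accuracy", "purpose"]

def evaluate(output_text: str) -> str:
    """Return the second model's written assessment of the first model's output."""
    prompt = (
        "You are reviewing text produced by another AI.\n"
        f"Score it from 1 to 5 on each of these criteria: {', '.join(CRITERIA)}.\n"
        "Give a one-sentence reason per score, then an overall pass/fail verdict.\n\n"
        f"Text to review:\n{output_text}"
    )
    return evaluator.generate_content(prompt).text

print(evaluate("Hi Jane, thanks for joining the demo yesterday..."))
```

Keep in mind that the evaluator’s answer is itself non-deterministic, so it can help to run the check a couple of times, or to ask for a structured response you can parse and compare.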
The Result
Using a second AI to validate another AI has delivered several benefits for us:
Resource efficiency – AI’s capacity to process large amounts of data faster and more thoroughly than humans has helped us allocate people and resources to strategic-level tasks. It’s also allowed us to quickly fine-tune our AI’s ability to flag potential issues and non-standard sales emails, as well as recognize common types of messages like auto-replies.
- How you can use it – Delegate to AI the high-volume, low-level tasks that can be completed with minimal training. Tell the AI when it makes a mistake, and provide examples of what you expect. The more data the AI works with, the better its performance. It’s not much different from onboarding a new teammate.
Scalability – Our AI works with thousands of emails each day. If our team were stuck manually checking and classifying every email, we wouldn’t have much time to focus on product development.
- How you can use this – If you’re uncertain about how effective AI is at classification or other high-volume, low-level tasks, you can use a different AI to test its ability. This will help you build confidence in your AI or provide insight into where improvements are needed.
Redundancy – Learning how to interact with different AI models has helped us prepare for situations when a major AI model like ChatGPT goes down. If this happens, we can simply switch the AI engine, since we developed AiSDR to be model agnostic.
- How you can use this – Design your processes and product development to be model agnostic as well. This limits your exposure to an AI going down, since your processes will continue to operate as expected.
However, there is one caveat to using generative AI.
Specifically, you’ll have to get used to, and work around, the AI’s occasionally unpredictable behavior.