Using Generative AI to Clean a CSV File
Ever cleaned a CSV? It’s a headache, which is exactly why AiSDR hands the job to AI. Our CTO explains how we did it.
With generative AI, you can clean and process massive lead lists in minutes.
But before you can do this, you’ll need to do a bit of prep work.
Don’t worry though. The time you invest now will pay huge dividends in the long run.
After all, manually cleaning a lead list can take you hours, and you could still have mistakes and typos at the end of the day.
At AiSDR, we’ve delegated CSV cleaning completely to generative AI. This saves us considerable time, which we put back into product development, and it cuts down on the number of innocent mistakes and typos.
TLDR:
- The goal: Clean a messy CSV file
- The tactic: Set up generative AI to clean the CSV file
- The result: Get a CSV that contains your data in a clean and readable format
Step 1: Design a Prompt for Generative AI
This is the hardest and most time-intensive step, as it takes some testing and trial and error.
Essentially, your prompt tells generative AI what to do and what not to do.
For example:
You are a csv parser and cleaner that follows these rules:
- Never change the values of fields
- Output columns must be in the following order: First name, Last name, Email
- No other columns must be present
Here we’ve told generative AI:
- What it’s doing (i.e. “You are a csv parser and cleaner”)
- Rules (e.g. “Never change the values of fields”)
- Expected output (e.g. “Output columns must be in the following order”)
This is just the start of the prompt though. You’ll need to add further instructions based on your testing results.
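Since the prompt will grow as you test, it can help to build it from a list of rules rather than editing one long string. Here is a minimal sketch of that idea. The helper name build_prompt and the extra date rule are assumptions for illustration, not part of AiSDR's actual prompt.

```python
# Hypothetical helper: assembles the prompt from a fixed role line plus a list of rules.
EXPECTED_COLUMNS = ["First name", "Last name", "Email"]


def build_prompt(columns, extra_rules=()):
    rules = [
        "Never change the values of fields",
        "Output columns must be in the following order: " + ", ".join(columns),
        "No other columns must be present",
        *extra_rules,  # rules you add after each round of testing
    ]
    lines = ["You are a csv parser and cleaner that follows these rules:"]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)


# The date rule below is an assumed example of an instruction added after testing.
prompt = build_prompt(
    EXPECTED_COLUMNS,
    extra_rules=["Dates must use the format YYYY-MM-DD"],
)
print(prompt)
```

Keeping the rules in a list makes it easy to add or remove one instruction at a time and see how the output changes.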
Step 2: Feed Common Mistakes and Typos into Generative AI
People make mistakes, which means data entry can get a bit messy.
If you’ve collected enough lead data and consistently worked with CSVs, chances are you’ve noticed certain errors occur with some degree of regularity.
Consider these examples:
- John Smith, Ph.D.
- Nike, Inc.
- 11/12/2024
Each example is written correctly. However, each one can cause an error during CSV import and cleaning.
CSV files separate values with commas, so unless a value that contains a comma is wrapped in quotes, the comma is read as a column break and “Ph.D.” and “Inc.” end up in the wrong column. You can tell generative AI that these cases can happen and what to do when it comes across them.
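To see the comma problem concretely, here is a small sketch using Python's standard csv module. The email addresses are made-up sample data.

```python
import csv
import io

# The same lead, once without quotes and once with the name wrapped in quotes.
unquoted = "John Smith, Ph.D.,john@example.com\n"
quoted = '"John Smith, Ph.D.",john@example.com\n'

unquoted_row = next(csv.reader(io.StringIO(unquoted)))
quoted_row = next(csv.reader(io.StringIO(quoted)))

print(unquoted_row)  # ['John Smith', ' Ph.D.', 'john@example.com']
print(quoted_row)    # ['John Smith, Ph.D.', 'john@example.com']

# csv.writer adds the quotes automatically when a value contains a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(["Nike, Inc.", "info@example.com"])
print(buffer.getvalue())  # "Nike, Inc.",info@example.com
```

The unquoted line produces three fields instead of two, which is exactly the kind of case worth describing in your prompt.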
As for the date, how it gets read depends on your location. If you’re in the United States, then “11/12” means “November 12”, but in most of Europe, “11/12” is “December 11”. Accordingly, you’ll want to tell generative AI which date format you expect.
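The date ambiguity is easy to reproduce. This short sketch parses the same string with both conventions using Python's datetime module. The variable names are just for illustration.

```python
from datetime import datetime

raw = "11/12/2024"

# Same text, two valid readings depending on the assumed format.
us_reading = datetime.strptime(raw, "%m/%d/%Y")  # month first
eu_reading = datetime.strptime(raw, "%d/%m/%Y")  # day first

print(us_reading.date().isoformat())  # 2024-11-12
print(eu_reading.date().isoformat())  # 2024-12-11
```

Neither reading raises an error, so the mistake stays silent unless the expected format is stated up front.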
Step 3: Provide an Example of a Good CSV
Generative AI models function best when they’re able to work from an example.
If you already have a CSV lead list, then you’re set and you can use it as the example. Alternatively, you can create a small CSV with 5-10 rows of imaginary lead data. This should be enough to get the point across to generative AI.
However, keep in mind that the example should be “perfect”. Otherwise, generative AI could build errors into future files.
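One way to check that an example file is clean before handing it over is a quick structural test. The sketch below is only an illustration: the function name find_problems and the checks it runs (matching header, consistent field count, no empty values) are assumptions about what “perfect” means for a three-column lead list.

```python
import csv
import io

EXPECTED_COLUMNS = ["First name", "Last name", "Email"]


def find_problems(csv_text, expected_columns=EXPECTED_COLUMNS):
    """Return a list of human-readable problems found in the example CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != expected_columns:
        problems.append(f"header is {rows[0]}, expected {expected_columns}")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected_columns):
            problems.append(
                f"row {number}: {len(row)} fields instead of {len(expected_columns)}"
            )
        elif any(not value.strip() for value in row):
            problems.append(f"row {number}: empty value")
    return problems


# Imaginary lead data, as suggested above.
example = (
    "First name,Last name,Email\n"
    "Jane,Doe,jane@example.com\n"
    "John,Smith,john@example.com\n"
)
broken = "First name,Last name,Email\nJohn,Smith, Ph.D.,john@example.com\n"

print(find_problems(example))  # []
print(find_problems(broken))   # ['row 2: 4 fields instead of 3']
```

An empty list means the file passed these particular checks. It does not prove the data itself is correct, so a quick read-through is still worthwhile.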
The Result
If everything works as expected, you should be able to:
- Upload any CSV file containing lead data to generative AI
- Have generative AI clean the file for you based on your criteria
- Download the resulting file
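Before trusting the downloaded file, it can be worth confirming that the “Never change the values of fields” rule was actually followed. The following is a rough sketch under the assumption that every value in the cleaned file should also appear somewhere in the original. The function name unexpected_values and the sample rows are hypothetical.

```python
import csv
import io


def unexpected_values(original_text, cleaned_text):
    """Return values from the cleaned data rows that never appear in the original."""
    original_values = {
        value.strip()
        for row in csv.reader(io.StringIO(original_text))
        for value in row
    }
    cleaned_rows = list(csv.reader(io.StringIO(cleaned_text)))
    return [
        value
        for row in cleaned_rows[1:]  # skip the header row
        for value in row
        if value.strip() not in original_values
    ]


original = "Email,First name,Last name,Phone\njane@example.com,Jane,Doe,555-0100\n"
cleaned_ok = "First name,Last name,Email\nJane,Doe,jane@example.com\n"
cleaned_bad = "First name,Last name,Email\nJane,Doe,jane@exmaple.com\n"

print(unexpected_values(original, cleaned_ok))   # []
print(unexpected_values(original, cleaned_bad))  # ['jane@exmaple.com']
```

If your prompt deliberately asks for changes, such as a new date format, those values will show up in this list and need a separate check.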