Which tool to choose

It depends on how much data you have and how often you need to clean it.

  • A little data, to fix once (a contact list, a poorly copied list): paste it directly into the conversational assistant and have it reordered. Fast, no tool to learn.
  • A lot of data in a sheet, to clean repeatedly: ask the AI for a formula (for Excel or Google Sheets) or a macro that performs the cleanup, so you can rerun it whenever new data arrives.
  • Very dirty or very large amounts of data (thousands of rows from multiple sources): ask for a script in Python, which handles data files without the limits of a sheet.

Data cleaning means standardizing information that was written in different ways: "Mario Rossi," "rossi mario," "ROSSI M." are the same person to you, but not to the computer.

How to do it

From a browser, a sheet, or a terminal, the principle is the same.

  1. Define what the clean data should look like. Decide the target format: names with an initial capital, dates as 2026-06-15, numbers without dots as thousands separators, one column per piece of information. Without a clear target, the cleanup runs in circles.

  2. Show the AI a sample. Paste a few dirty rows and one row the way you want it cleaned: the example is worth more than a thousand explanations.

    An example prompt:

    These are names and cities written in a messy way:
    mario rossi - MILAN
    Rossi, Mario (milan)
    LUCIA BIANCHI – Rome
    
    Reorder them into a table with three columns: First name, Last name, City. First and last name with an initial capital, City with an initial capital. Return a clean table and flag the rows you can't interpret with confidence.
    
  3. Check the result on the sample. Verify that the AI interpreted the sample correctly before handing it everything. If it gets it wrong, adjust the example and repeat.

  4. Extend to the rest. For a little data, paste it all. For a large sheet, ask for the formula or script that applies the same cleanup to all the rows.

  5. Keep the doubtful rows in sight. The data the AI can't interpret should be checked by hand, not guessed at. Always ask it to flag them instead of taking a guess.
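For the large-data case, the kind of script you might get back could look like the minimal sketch below. It is an illustration, not the definitive cleanup: it assumes the two layouts seen in the sample prompt ("name - city" and "name (city)"), assumes that a name without a comma is written first name first, and uses a hypothetical function name, clean_row. Anything it cannot split is flagged rather than guessed.

```python
import re

# Two layouts from the sample: "name - city" (hyphen or en dash) and "name (city)".
DASH_FORM = re.compile(r"^(.+?)\s+[-\u2013]\s+(.+)$")
PAREN_FORM = re.compile(r"^(.+?)\s*\((.+)\)$")

def clean_row(raw):
    """Return (first, last, city, flag); a non-empty flag means 'check by hand'."""
    text = raw.strip()
    match = DASH_FORM.match(text) or PAREN_FORM.match(text)
    if not match:
        return ("", "", "", "CHECK: no city found")
    name, city = match.group(1).strip(), match.group(2).strip()
    if "," in name:
        # "Last, First" order
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        if len(parts) != 2:
            return ("", "", city.title(), "CHECK: cannot split name")
        first, last = parts  # assumption: first name is written first
    if not first or not last:
        return ("", "", city.title(), "CHECK: incomplete name")
    return (first.title(), last.title(), city.title(), "")

rows = ["mario rossi - MILAN", "Rossi, Mario (milan)", "LUCIA BIANCHI – Rome", "ROSSI M."]
results = [clean_row(row) for row in rows]
for result in results:
    print(result)
```

The last row has no recognizable city, so it comes back flagged instead of filled in. Note that str.title() is a blunt tool: it would turn "McDonald" into "Mcdonald", so names like that still deserve a look.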

A concrete example

Elena has collected event sign-ups from three different channels: the names are written every which way, the cities sometimes capitalized and sometimes not, some rows have first and last name swapped. She pastes a sample into the AI with the instruction from the example. The assistant returns a tidy table in three columns and flags two ambiguous rows. Elena checks those two by hand, then asks the AI to clean the entire list the same way. From three hundred chaotic rows to a table ready for labels, in half an hour instead of an afternoon.

When it does NOT work (and how to fix it)

If the AI guesses the ambiguous rows wrong

Faced with confusing data, the AI tends to fill the gap with an assumption, which can be wrong. Fix: in the instruction explicitly ask it to "flag the uncertain rows instead of guessing." You sort out the flagged rows yourself: better ten honest doubts than ten hidden errors.

If there's too much data to paste into the chat

The assistant has a limit on how much text it accepts at once. Fix: for large amounts, don't paste the data; ask for the tool instead. Have it give you a sheet formula or a Python script that applies the cleanup to all the rows, so the data stays on your computer.
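As a rough idea of what such a script can look like, here is a minimal sketch using Python's standard csv module. The column names (name, city) and the function name clean_table are assumptions made for the illustration, and it assumes every row has as many cells as the header. It tidies whitespace and capitalization in every cell, which only suits columns like names and cities.

```python
import csv
import io

def clean_table(source, target):
    """Copy a CSV from source to target, tidying every cell; return the row count."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        # Collapse repeated spaces and apply initial capitals to each cell.
        writer.writerow({key: " ".join((value or "").split()).title()
                         for key, value in row.items()})
        count += 1
    return count

# Demo on in-memory text so the sketch runs as-is.
dirty = io.StringIO("name,city\n  mario   rossi ,MILAN\nLUCIA BIANCHI,rome\n")
clean = io.StringIO()
row_count = clean_table(dirty, clean)
print(clean.getvalue())
```

With real files you would pass objects opened with open(path, newline="", encoding="utf-8") in place of the in-memory demo, and write to a new file so the original stays untouched.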

If the cleanup loses or alters good data

A rule that's too aggressive (removing all duplicates) can delete legitimate rows that resemble each other. Fix: work on a copy, and ask the AI to show you what it would remove before removing it. For duplicates, have it give you the list of suspects to check, not automatic deletion.
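A "list of suspects" can be sketched in a few lines of Python. The version below is only one possible approach, with a hypothetical function name: it treats rows as suspected duplicates when they contain the same words regardless of case, punctuation, and word order, and it deletes nothing.

```python
from collections import defaultdict

def duplicate_suspects(rows):
    """Group rows that contain the same words; return only groups of two or more."""
    groups = defaultdict(list)
    for number, row in enumerate(rows, start=1):
        # The key ignores case, punctuation, and word order.
        words = "".join(c if c.isalnum() else " " for c in row.casefold()).split()
        groups[" ".join(sorted(words))].append((number, row))
    return [group for group in groups.values() if len(group) > 1]

rows = ["Mario Rossi", "rossi mario", "Lucia Bianchi", "ROSSI, Mario", "Luca Bianchi"]
suspects = duplicate_suspects(rows)
for group in suspects:
    print(group)
```

Each group keeps the original row numbers, so you can open the sheet and decide case by case. "Lucia Bianchi" and "Luca Bianchi" are not grouped, which is the point: rows that merely resemble each other stay where they are.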

If every time new data arrives you have to redo everything

Cleaning by hand at every update is unsustainable. Fix: turn the cleanup into a reusable tool. Ask the AI for a formula or a macro that cleans the column automatically, so when you paste new data the cleanup starts on its own.

A tip from someone who actually uses it

The example beats the explanation, always. When you ask the AI to clean data, don't describe in words how you want it: show it a dirty row and next to it the same row as it should become. The AI learns the pattern from that example better than from any list of rules. Two or three "before and after" rows and it understands exactly what you mean. It's the fastest way to get the right result on the first try, on any kind of messy data.

Frequently asked questions

How reliable is data cleaned by the AI?

Reliable on the clear cases, to be verified on the ambiguous ones. The AI standardizes reliably whatever follows a pattern; on confusing data it can misinterpret. That's why it's worth having it flag the doubts and checking them, instead of blindly trusting the whole table.

Can I clean data that contains personal information?

With caution. Pasting names, emails, or sensitive data into an online assistant means sending them to an external server. For personal data, consider a tool that runs on your computer (a Python script) so the data doesn't leave your machine, and always respect the privacy rules.

Does it also work with data in different languages or strange characters?

Yes, the AI handles multiple languages and accented characters. The difficulties come with corrupted encodings (characters that become unreadable symbols): in that case flag the problem to the AI and ask it to handle the encoding, or fix the source.
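One common kind of corruption is text saved as UTF-8 but read as Latin-1, which turns "ò" into "Ã²". The sketch below, with a hypothetical function name, reverses that specific mistake and leaves everything else untouched. It is an assumption that this is the corruption you have; other encoding mix-ups need a different fix, ideally at the source.

```python
def repair_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as Latin-1; otherwise return it unchanged."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit this pattern, so do not touch it.
        return text

print(repair_mojibake("NicolÃ²"))  # the corrupted form of "Nicolò"
print(repair_mojibake("Nicolò"))   # already correct, returned as-is
```

Run it on a copy first and compare a few rows before and after, as with any other cleanup rule.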

Is it better to clean the data with AI or to fix the source that produces the dirty data?

If the data always arrives dirty in the same way, the real solution is upstream: fixing the form, the sheet, or the process that produces the mess. Cleaning every time treats the symptom; a well-made input field (dates with a calendar, cities from a list) prevents the mess from being created. The AI is excellent for recovering from an existing mess, but the mess that never gets created is the one that saves you the most work.