Use ChatGPT to Clean Up Messy Data

Updated July 2nd, 2024.

Sometimes you need data arranged in a particular way, but it comes to you in a completely different layout. You can spend hours rearranging things yourself, or you can use ChatGPT to do it. Here’s how.

Example 1: Creating a table of data from a PDF

Every year, I make a calendar with all of the NFL prime-time games on it. This calendar can be viewed along with my regular iPhone and Mac calendar, and I share the calendar so others can use it too. Making the calendar is easy once I have the data in rows and columns, like this:

Numbers spreadsheet with the NFL prime-time games listed in rows and columns
This is how I want the data

The problem is, I can’t get the data like that from anywhere. I always have to take what I can get and then rearrange it myself. Typically I get the data from the NFL website, which looks like this:

The NFL.com webpage showing the first five Monday Night Football games.

You might think a quick copy-and-paste job would be the way to start. But it isn’t, because the NFL doesn’t include the team names as text. The page shows only the logos! Also, notice that the dates aren’t written like full dates (they don’t include the year). You can see that better in the copied-and-pasted data below:

Pasted text from NFL.com: no team names!

Had the team names been in the data, I could have pasted the data into ChatGPT, asked it to add the year to the dates, and then asked it to arrange everything in rows and columns. But as you can see, the pasted data didn’t include the team names.

One way around this is to export the web page as a PDF, and then ask ChatGPT to analyze it. That’s what I did. The PDF looked just like the web page, complete with the team logos (and with dates that didn’t include the year). I was hoping that ChatGPT would recognize the logos and substitute the names.

Giving ChatGPT a PDF to analyze

Note: analyzing a PDF or an image requires at least GPT-4 (GPT-3.5 can’t do it). Be sure you’re using a GPT-4 model or higher when you ask ChatGPT to analyze a PDF or image.

ChatGPT first reported what it thought the PDF contained:

ChatGPT reports what it thinks is in the PDF
After analyzing the PDF’s content

ChatGPT refers here to my “preferred format” because I’ve asked for this kind of thing before, and I told ChatGPT to remember what I like. I literally typed “Remember what I like.”

Scrolling down, we see that ChatGPT describes the first bit of information it found, and then formats it as an example.

ChatGPT showing how it intends to format things

Scrolling down some more, we find that ChatGPT is ready to tackle the entire PDF, and it does.

ChatGPT preparing to analyze and format the entire PDF

In moments, the data is presented in a table, with a down arrow in the upper right corner that I can use to download it.

ChatGPT produced a table with columns the way I wanted.
The finished table, and a download arrow

The downloaded file was a CSV, which isn’t exactly what I wanted, so I asked for an Excel sheet. As usual, I didn’t restate the entire question. I just asked a follow-up question.

Asking ChatGPT for an Excel spreadsheet, and getting it

The downloaded spreadsheet looked like this:

ChatGPT’s finished spreadsheet

That spreadsheet doesn’t look like a calendar, and a calendar is the end goal, but I wrote an AppleScript that turns columnar data into a calendar (as long as the columns are in that order and formatted as shown). So I’ve taken care of that part myself. ChatGPT has done its job, and it’s a job that used to take me a lot of work. Thanks, ChatGPT!
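
The date cleanup in this example (adding the missing year) can also be sketched in a few lines of Python. This is only an illustration, not what ChatGPT actually ran. The function name `add_year`, the "Sep 9" date style, and the rule that January through July belong to the following calendar year are all assumptions made for the example.

```python
from datetime import date, datetime


def add_year(text: str, season_start_year: int) -> date:
    """Turn a yearless date such as 'Sep 9' into a full date.

    Assumptions (not from the NFL page): dates use an abbreviated English
    month name followed by the day, and the season starts in the fall, so
    January through July fall in the year after the season starts.
    """
    # Parse with a placeholder year; %b expects English month abbreviations
    # when Python runs in an English (or default C) locale.
    parsed = datetime.strptime(f"{text} 2000", "%b %d %Y")
    if parsed.month >= 8:
        year = season_start_year
    else:
        year = season_start_year + 1
    return date(year, parsed.month, parsed.day)


print(add_year("Sep 9", 2024).isoformat())  # 2024-09-09
print(add_year("Jan 5", 2024).isoformat())  # 2025-01-05
```

A rule like this is worth spelling out when you ask ChatGPT to add the year, because a season that runs past December needs two different years.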

Example 2: Simplifying data found on a webpage

The MLB.com site has a ton of data. Too much, actually. See below.

MLB.com statistics page

So I copied a big bunch of it and asked ChatGPT to make it prettier, and to pay attention to only a few of the many columns.

Asking ChatGPT to make the data prettier and simpler
Asking to make it prettier, and simpler

This time, all I needed to do was drag through the data to select it, copy it, and paste it. Paste right into the box where you’ve been typing, down at the bottom of the ChatGPT window.

Afterwards, I asked ChatGPT to simplify things and to make an Excel spreadsheet and a graph.

Asking for, and getting, an Excel spreadsheet and a column graph

Scrolling down, we see that ChatGPT has made the spreadsheet and the graph for me, with a link to each.

Excel sheet and chart, courtesy of ChatGPT.
ChatGPT coming through with the Excel sheet and the chart.

Here’s the spreadsheet:

Simplified, sorted spreadsheet, courtesy of ChatGPT
Simplified, sorted spreadsheet

Just what I wanted. Now I can do something with this data (get averages, make charts, etc.).
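
For the curious, here is roughly what "pay attention to only a few columns" amounts to, as a small Python sketch. It assumes the pasted text is tab-separated with a header row. The function name `keep_columns`, the column names, and the sample rows are all invented for the illustration and are not real MLB data.

```python
import csv
import io


def keep_columns(pasted: str, wanted: list[str], sort_by: str) -> list[dict[str, str]]:
    """Keep only the wanted columns and sort rows by one numeric column."""
    reader = csv.DictReader(io.StringIO(pasted), delimiter="\t")
    # Build a smaller row for each input row, dropping every other column.
    rows = [{name: row[name] for name in wanted} for row in reader]
    # Largest value first, comparing as numbers rather than as text.
    rows.sort(key=lambda r: float(r[sort_by]), reverse=True)
    return rows


# Made-up sample in the assumed tab-separated layout.
SAMPLE = (
    "PLAYER\tTEAM\tG\tAB\tHR\n"
    "Player A\tAAA\t80\t300\t12\n"
    "Player B\tBBB\t82\t310\t20\n"
)

rows = keep_columns(SAMPLE, ["PLAYER", "HR"], "HR")
for row in rows:
    print(row["PLAYER"], row["HR"])
```

Real pasted web data is rarely this tidy, which is exactly why handing it to ChatGPT is so convenient.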

Example 3: Converting messy data into rows and columns

Here’s another situation where the data we want exists, but not in the form we want. It’s a calendar with the high and low temperatures for Santa Monica, California, for June 2024. But I don’t want a calendar; I want rows and columns.

Web page showing a calendar with high and low temperatures.
High and Low temperatures, in a calendar format

I highlighted the data by dragging through it, copied it, pasted it into ChatGPT, and said I wanted it in rows and columns. You can see how ChatGPT has started to clean up the data already.

ChatGPT converting the calendar data to rows and columns
Asking ChatGPT to convert the data to rows and columns

Right away ChatGPT gives me the table I’m looking for.

The table of data, thanks to ChatGPT doing the work.
The converted data

The smart extra step was to ask for an Excel spreadsheet, which ChatGPT is very good at making. Here it is:

Final spreadsheet, courtesy of ChatGPT
Excel spreadsheet with the data from the calendar layout
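
If you wonder what the calendar-to-rows conversion involves, here is a minimal Python sketch. It assumes the copied calendar text comes through as a repeating run of day number, high, and low, which may not match what your browser actually copies. The function name `calendar_to_rows` and the sample temperatures are made up for the example.

```python
import re
from datetime import date


def calendar_to_rows(pasted: str, year: int, month: int) -> list[tuple[str, int, int]]:
    """Group the numbers in pasted calendar text into (date, high, low) rows.

    Assumption: the numbers appear in the order day, high, low, repeated
    once per day, with nothing else numeric in the text.
    """
    numbers = [int(n) for n in re.findall(r"\d+", pasted)]
    rows = []
    # Step through the numbers three at a time; an incomplete trailing
    # group is ignored.
    for i in range(0, len(numbers) - 2, 3):
        day, high, low = numbers[i:i + 3]
        rows.append((date(year, month, day).isoformat(), high, low))
    return rows


# Made-up sample in the assumed layout.
SAMPLE = "1\n70°\n61°\n2\n72°\n62°\n"

for row in calendar_to_rows(SAMPLE, 2024, 6):
    print(row)
```

ChatGPT handles the guesswork about layout for you, which is the part a fixed script like this cannot do.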

So there you are: three examples of data in the wrong format, easily converted to something useful in a matter of seconds. You should try it!


Copyright 2008-2024 Christian Boyce. All rights reserved.

Sincerely,
Christian Boyce

Christian Boyce is a Mac and iPhone expert with over 30 years' experience in the field. His specialty is teaching people how to get more out of their Macs and iPhones using the software and apps already installed. He is the author of several books, a guest speaker for Mac and iPhone user groups worldwide, and a former rocket scientist. He splits time between homes in Santa Monica, California and Round Rock, Texas.
