In late April of this year (2018) Tableau released a new tool called Tableau Prep.
So what is Tableau Prep?
You know all that very unsexy data prep stuff you have to do before you can create anything interesting?
The stuff you do to get the dataset ready for analysis in Excel, or maybe in SPSS, R, SAS, or <insert one of many other tools here>.
Tableau has now given you another option. Not only will Tableau prep give you a different, more visual, way to prepare data. You’ll also end up with systematic (and easy to alter) change history covering the full preparation process.
But is it actually worth trying out? Let’s give it a test.
We’ll start with some rough data.
Back when I worked with early childhood data in North Carolina, one of the key pieces of data we would examine was child care quality. When I was on the inside I had a data warehouse login, but the data is also available publicly through a search form on the NC DHHS website.
So let’s use the basic search form to pull the data.
Don’t enter anything, and just go down to the bottom of the page and click submit on the blank form. This will give you data for the whole state with the contact information and license ratings.
If you’re a programmer you might see this as an opportunity to scrape some data. But that’s more work than it’s worth right now. Instead I went ahead and just copy/pasted all the data from the page into an empty Excel spreadsheet.
The last time I worked with this data using this technique, I prepped the file within Excel. A series of sorts, a few formulas, and some row deletes brought me to something I could use in Tableau.
It wasn’t exactly systematic and I didn’t write down all the steps, so probably wouldn’t be the best approach if this data was super important to your job.
Bringing the Data into Tableau Prep
I started by just opening up Tableau Prep and dropping in the Excel spreadsheet.
Then I went ahead and clicked on Sheet1 in Tableau Prep to start cleaning the data.
Here are some of basic operations I ended up doing during the data cleaning step. I screen capped them after finishing the cleaning.
So I let Tableau’s Data Interpreter give my data a quick clean. This basically distributed the merged cells and put in 3 lines of data for each facility. Not where we will want to end up but something perfectly helpful at this early stage.
There were a couple of variables I really wanted to recode. Instead of a license type field I wanted the star rating for each facility (1, 2, 3, 4, 5, or other). I also wanted to separate the center and home variables. Both of these tasks were a cinch by just duplicating the license type field and then using group and replace.
Tripling the data because of the address field doesn’t really make sense. Really I only need the second row (which includes city and zip code information). In order to do that I started by excluding all the contact information phone number rows by searching for any cell that starts with a “(“. Then I excluded any address contact information rows that start with a number.
Next I used automatic split to break up city and zip into their own variables.
Outputting the Data
After I was done with all the cleaning steps, I setup the Prep file to export the data. This gave me a Tableau Data Extract to import right into Tableau Desktop.
Playing with the Data
Now let’s give the data a test.
I went ahead and created a nice little map with the average star rating for facilities by zip code. I set the color cutoff at 4 stars (as this is what Smart Start would usually consider the minimum for good quality childcare). This gives a nice map that’s pretty easy to explore.
It also hinted me into an issue with the zip level data (165 rows have the extra 4 digits which were not auto picked up by Tableau Desktop). If this were a real project I would go back to prep and make the change to 5 digits for everyone.
Using a simple dot plot lets me sort the data by average star rating to get a different view of the data.
Then splitting the data by home licenses and center licenses shows pretty quickly one of the biggest differentiators in level of care. Home based child care, no matter the zip code, is by and large going to show as lower quality.
This makes sense but also shows why counties without a lot of centers (and more home care settings) would show indicators of lower quality care.
I have to say, I was really impressed by Tableau Prep.
I spent a good half of my career in data prep and even though I know how to code and write syntax, the workflow in Tableau Prep was so smooth and useful that I could see myself switching to Prep for most of my work.
The natural audit trail offers a clear advantage to cleaning in Excel. All in all, it’s easy to use, visual, and makes for a replicable data preparation process.
Want to see my finished product in action?