Getting Started with Tableau Prep
Many data scientists have heard the line, “People spend 80% of their time prepping data and only 20% of their time analyzing it.” As ever more data becomes available, dealing with untidy data becomes increasingly difficult. Cleaning data is widely regarded as one of the most tedious parts of a data scientist’s job.
What does Tableau Prep do?
But what does Tableau have to do with this? Tableau is one of the leading data visualization tools in the industry, and with Tableau Prep it has added data preparation, a capability that was missing from its offerings. Tableau Prep makes routine tasks such as joins, unions, and pivots much easier. And, of course, you don’t need any programming skills to do this, which is a major reason for Tableau’s success.
Let’s get started!
Building a Flow
- Connecting to Data
First, we’ll connect to our data set.
Once you’ve opened the file, you will see this screen.
We’ll drag out a sheet, and this becomes the first step in the flow.
- The Input Step
The input step can be configured in the pane below the flow. We can bring in every sheet in the connected file by using a wildcard union.
On the right, you can see a list of the fields you’ll bring in from the table. You can select which fields you want to keep and which ones you don’t.
- Renaming Steps
You can rename a step by double-clicking and typing a name.
To add another step in the flow, we’ll click the plus button. Since we want just a basic cleaning step to start, we’ll select ‘Add Step’.
- Cleaning Steps
Cleaning steps help us see the state of our data and what we need to do to clean it. Below the flow pane is the profile pane, which shows a card for each field in the data set.
The cards display the values as well as distribution information about how frequently each value appears. By clicking on a bar, we can highlight related values in other fields.
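For readers who also work in code, the distribution on a profile card is essentially a frequency count of a field's values. The following is a rough Python sketch of that idea, not a description of how Tableau Prep works internally. The sample rows, the field names, and both helper functions are made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the field names are invented for this sketch.
rows = [
    {"Store": "North", "Format": "Hardcover"},
    {"Store": "North", "Format": "Paperback"},
    {"Store": "South", "Format": "Paperback"},
    {"Store": "North", "Format": "Paperback"},
]


def value_counts(rows, field):
    """Count how often each value of `field` appears, like one profile card."""
    return Counter(row[field] for row in rows)


def related_values(rows, field, value, other_field):
    """Count values of `other_field` in rows where `field` equals `value`,
    loosely mirroring the highlighting you get after clicking a bar."""
    return Counter(row[other_field] for row in rows if row[field] == value)


store_counts = value_counts(rows, "Store")
north_formats = related_values(rows, "Store", "North", "Format")
```

Here `store_counts` plays the role of the bars on one card, and `north_formats` corresponds to what gets highlighted on another card when you click the "North" bar.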
The Info field holds more than one kind of data in a single column. If we look at the data grid, we can see that each value contains the name of the book, the author name, the dollar price, and the ISBN, each separated by a pipe.
We can split these values as needed for analysis. Click on the card and open the menu. There are multiple cleaning options here, but we’ll select automatic split, because Tableau is smart enough to recognize common delimiters, even when they differ, and will split the values out into separate fields.
To rename the fields, just double-click on the name and type a new one.
As we no longer need the original, un-split field, we can delete it.
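If you are curious what the split amounts to, here is a minimal Python sketch of the same operation on a pipe-delimited value like the Info field. It assumes a single, known delimiter, whereas Tableau Prep detects delimiters on its own. The sample values, the field names, and the `split_info` helper are hypothetical.

```python
# Hypothetical sample values shaped like the Info field described above.
info_values = [
    "Book A|Author A|$12.99|0000000000001",
    "Book B|Author B|$8.50|0000000000002",
]

# Field names chosen for this sketch; in Tableau Prep you rename them by hand.
FIELD_NAMES = ["Title", "Author", "Price", "ISBN"]


def split_info(value, delimiter="|"):
    """Split one Info value into named fields, dropping the original string."""
    parts = [part.strip() for part in value.split(delimiter)]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} parts, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


books = [split_info(value) for value in info_values]
```

Each resulting dictionary has one entry per new field, and the un-split string is not carried along, which matches deleting the original field after the split.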
- Changing Data Types
Price is currently a string, but it should be a decimal number. We can click the data type icon on the card and select Number (decimal).
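In code, the same change is a string-to-decimal conversion. The sketch below assumes prices look like "$12.99" and strips the currency symbol first. The `parse_price` helper is hypothetical, and returning `None` for values that cannot be parsed is this sketch's own choice rather than a statement about Tableau Prep's behavior.

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a price string such as '$12.99' to a Decimal.

    Returns None when the text is not a valid number (a choice made
    for this sketch).
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


prices = [parse_price(text) for text in ["$12.99", "8.50", "n/a"]]
```

`Decimal` is used instead of `float` so that currency amounts keep their exact decimal representation.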
We have a few more weeks’ worth of data, so let’s add it. Add a new connection, bring in a table, and select wildcard union.
To combine two steps in the flow, drag one onto the other and select join or union. Since both have the same column structure, we want a union.
To make sure the cleaning we did is applied to the union and not only to the first data set, we can click on the line connecting the steps, select remove, and then drag the union step onto the cleaning step.
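The reason for rewiring the flow is easier to see in code: if the cleaning runs after the union, every row passes through it, not just the rows from the first input. The following Python sketch illustrates that ordering under simple assumptions. The two weekly tables, the `union` helper, and the `clean` helper are all hypothetical.

```python
# Hypothetical weekly inputs that share the same column structure.
week1 = [{"Info": "Book A|Author A|$12.99|0000000000001"}]
week2 = [{"Info": "Book B|Author B|$8.50|0000000000002"}]


def union(*tables):
    """Stack tables row by row, insisting that all rows share the same columns."""
    expected = None
    combined = []
    for table in tables:
        for row in table:
            columns = set(row)
            if expected is None:
                expected = columns
            elif columns != expected:
                raise ValueError(f"column mismatch: {sorted(columns)}")
            combined.append(dict(row))
    return combined


def clean(row):
    """Split the Info value into fields and drop the dollar sign from the price."""
    title, author, price, isbn = row["Info"].split("|")
    return {"Title": title, "Author": author, "Price": price.lstrip("$"), "ISBN": isbn}


# Union first, then clean, so the cleaning covers rows from every input.
cleaned = [clean(row) for row in union(week1, week2)]
```

Had `clean` been applied to `week1` only, the rows from `week2` would still carry the raw Info string, which is the situation the rewiring avoids.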
- Output Step
Our data is prepped and ready for use, so we’ll add an output step. We’ll click the plus and select ‘Add Output’. We can choose the file type, the save location, and the file name. Now, when we run the flow, it creates a new file that contains all our data, cleaned and combined.
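As a point of comparison, writing the cleaned rows to a file in code might look like the sketch below, assuming CSV is the chosen file type. The rows, the file name, and the `write_output` helper are hypothetical, and the file goes to a temporary folder so the example cleans up after itself.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical cleaned rows, standing in for the result of the flow.
cleaned_rows = [
    {"Title": "Book A", "Author": "Author A", "Price": "12.99"},
    {"Title": "Book B", "Author": "Author B", "Price": "8.50"},
]


def write_output(rows, path):
    """Write rows to a CSV file with a header row and return the path."""
    path = Path(path)
    fieldnames = list(rows[0].keys())
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Write to a temporary folder, then read the file back to inspect it.
with tempfile.TemporaryDirectory() as folder:
    out_path = write_output(cleaned_rows, Path(folder) / "books_clean.csv")
    written_text = out_path.read_text(encoding="utf-8")
```

In practice you would pass a permanent location instead of a temporary folder, just as you pick the save location in the output step.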
Prepping data in Tableau Prep can be simple or complex, and each step has robust options. Thank you for reading this article on Tableau Prep, and I hope you will continue to read the next articles in this series.