Tableau Prep – The Profile Pane and Pivot Step
Introduction and Significance
Exploring data is probably one of the first things a data analyst would do when dealing with a new data set. It is important to understand the profile or the characteristics of the data before you start analyzing and visualizing it.
For this article, let’s use a hypothetical case study of a large bank in Canada. Consider aggregating IT risk data for this bank with each of the various geographies reporting their respective data on IT risk incidents, loss events and issues. Now each geography, in turn, has different operational processes and systems of how they are managing this risk. So, for e.g. Canadian Business units are using RSA GRC System. US business units use excel file stored on SharePoint. Business Units in India use their home-grown system developed in Visual Studio .Net. Your job is to collect data from all of these different sources and aggregate them for reporting purposes to the board of directors.
As you would have figured out by now, the real challenge here it to make the data consistent for aggregation.
Understanding the relationship within the data set is a key step. You can identify patterns and outliers with the data using this step. Consider the following fields in the source data
- Finding ID
- Residual Risk Rating
- Issue Title
- Issue Summary
- Line of Business
- Issue Owner
- Created Date
- Health Status
Considering the above, it will be important to see the profile for each of the fields and how do they relate to each other.
So, for example, you could have the following situations (only for illustration purposes, not an exhaustive list):
Distribution of Data:
You can review the distribution of data – visually. Refer to the screenshot below and note for e.g.
- Data contains issue for the Month of Jan, Feb, and March of 2019
- There are 4 lines of business in this data set (LOB1, 2, 3 and 4)
- There are only 3 owners for all the 50 issues
Identification of Errors and Relationship:
Null or Missing Values in Issue Owner Field: For legitimate business reasons it might be possible to have issues without an owner defined yet, although this is expected the situation to have as an exception, it is not expected to have a large portion of issues without owners. You can use Tableau Prep to identify patterns for such values. For e.g. Refer to the screenshot below. You can click on Null Values in the Owner Field to quickly reveal that there is one line of business (i.e. LOB 3) which is causing the problem.
To identify additional relationships
- Click on any data value, Tableau Prep will highlight the related data values in other field cards.
Tidy Data and Columns to Rows
It is important to put your data in the correct format. The tidy data philosophy will help you to understand when to pivot and unpivot your data. The above image highlights how tidy data should be.
- Variables should be a single column
- Observations should be a single row
- Each value should be in a cell.
Short and Wide Data Bad , Long Skinny Data Good
Pivot is a frequently used method in data analytics. In Tableau Prep, the key use of Pivot Step is to convert data sets that have many columns into a smaller number of columns which are easier to then analyze in Tableau. Consider the following data format.
Notice that it has 1 finding in each column (versus a row). This format is hard for Tableau to analyze and must be pivoted (or unpivoted) to make it analyzable.
Pivot to Unpivot
Using the Pivot Step in Tableau we can achieve this. See the screenshot below. Notice how we have created a column for Finding ID and another column for risk rating for those finding ID.
Data Profiling is a fundamental activity for a data analyst before they use the consume the data in Tableau for visualizations. Typically pivoting step is used to unpivot Excel data with a large number of columns which are easier to read by humans but difficult to be used by Tableau for visualization purposes, and hence, it needs to be transformed as part of preparation.