One of the key actions for any data analyst is to be able to pivot data tables. Luckily Pandas has an excellent function that will allow you to pivot. To create this spread shit style pivot table, you will need two dependencies with is Numpy and Pandas. However, in newer iterations, you don’t need Numpy. Let’s open up Jupyter Notebooks using the Google analytics as dataset. You can find this dataset on the quickly improve your skills in Python by taking Python for Data Science Bootcamp Course.

The essential parts of a pivot table are the following arguments

- data table
- index
- values
- aggregation

We will explore to access these features using the Pandas library. The first step is to load the dataset. I am using my Google Analytics data for demo purposes.

### Import the Libraries:

To use the Pandas pivot table you will need Pandas and Numpy so let’s import these dependencies. We need Pandas to use the actual pivot table and Numpy will be used to handle the type of aggregation we want for the values in the table.

import pandas as pd

import numpy as np

We are using the standard aliases for both Pandas and Numpy which are pd and np.

### Load your Data

Load the data set. My data is stored on a CSV file named categories which I exported from my Google Analytics. I will load this data and store in a variable called df using the Pandas read_csv function.

df = pd.read_csv('categories.csv')

We can check the columns in the data by using the attribute columns at the end of our new data frame

df.columns

your output will appear below.

Using a single value in the pivot table.

pd.pivot_table(df,index="Gender",values='Sessions", aggfunc = np.sum)

Let’s take a look at the output.

### Multi-Index Pandas Pivot Table

You can make multi-index pivot by just simply passing a list into the index parameter.

pd.pivot_table(df,index=['Default Channel Grouping', 'Gender'])

Multi Aggregation Types in a Pivot Table

You can achieve this by simply adding a dictionary to your aggfunc parameter in the pivot table. Remember that dictionaries a signal by curly brackets and key value pairs.

pd.pivot_table(df, index=['Gender', 'Default Channel Grouping'],values=['Sessions','Value'],aggfunc = {'Sessions':np.sum,'Value':np.median})

Percent of Grand Total

To get a percent of grand total aggregation in your pivot table you will need to use a function that iterates across each row. Luckily you can do this using a default lambda function in the aggfiunc parameter.

pd.pivot_table(df,index='Gender',values='Sessions',aggfunc = lambda x:x.sum()/df['Sessions'].sum())

## Leave a Reply

1 Comment on "Create Pivot Tables with Pandas"

how do I calculate the percentages through subtotals, for example the subtotal Default Chanel