Sales Prediction Model in Power BI

Leveraging the Python scripting option in Power BI is a powerful way to combine complex machine learning models with the interactivity of a dashboard.

For the Python model, we will use the scikit-learn library to create a Linear Regression model, with a training set for the model to learn from and a testing set to evaluate it. Then we will run the model on the total dataset.

We can derive the coefficients and rebuild the linear regression equation using What-If parameters in Power BI.
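
As a rough illustration of what rebuilding the equation means, the sketch below writes the fitted line out by hand: sales = intercept + (TV coefficient × TV) + (radio coefficient × radio) + (newspaper coefficient × newspaper). The function name and the coefficient values are made-up placeholders, not results from this dataset. In Power BI, the three spend inputs would come from What-If parameters and the coefficients from the fitted model.

```python
def predict_sales(tv, radio, newspaper,
                  intercept, coef_tv, coef_radio, coef_newspaper):
    """Evaluate the linear regression equation for one set of channel spends."""
    return (intercept
            + coef_tv * tv
            + coef_radio * radio
            + coef_newspaper * newspaper)


# Placeholder numbers for illustration only (not fitted values).
example = predict_sales(tv=10.0, radio=20.0, newspaper=30.0,
                        intercept=100.0, coef_tv=2.0,
                        coef_radio=3.0, coef_newspaper=0.5)
print(example)
```

A measure built the same way in the report recalculates the prediction whenever a What-If slider moves, without rerunning the Python script.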

TV      radio   newspaper   sales
23.01   37.8    69.2        27183
4.45    39.3    45.1        12792

This is a sample of the data set that is going to be used.

In the data above, sales is the target variable we want to predict, and the 3 channels (TV, radio and newspaper) are the predictors. The model estimates one coefficient for each channel.

When building your code, it's best to use an IDE, which will give you the ability to debug the Python script. Spyder is a good lightweight IDE that comes with the Anaconda distribution.

Get the dataset: Advertisement Dataset

This is the code:

# Load the dependencies
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Read the data
dataset = pd.read_csv('HR_comma_sep.csv')

# Encode the categorical columns as integers
le = LabelEncoder()
dataset['Departments'] = le.fit_transform(dataset['Departments'])
dataset['salary'] = le.fit_transform(dataset['salary'])

# Select the target and the feature columns
y = dataset['left']
features = ['satisfaction_level', 'last_evaluation', 'number_project',
            'average_montly_hours', 'time_spend_company', 'Work_accident',
            'promotion_last_5years', 'Departments', 'salary']
X = dataset[features]

# Standardize the features
s = StandardScaler()
X = s.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit the model on the training set, then predict for every row
log = LogisticRegression()
log.fit(X_train, y_train)
y_pred = log.predict(X)
y_prob = log.predict_proba(X)[:, 1]

# Add the predictions back to the dataframe
dataset['predictions'] = y_pred
dataset['probabilities'] = y_prob
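
The script above reads an HR file and fits a logistic regression classifier. For the advertising data described in this post, a linear regression version might look like the sketch below. It is an outline under stated assumptions, not the author's script: the column names TV, radio, newspaper and sales are taken from the sample table, the small DataFrame is synthetic stand-in data so the block runs by itself, and the test_size and random_state values are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): replace with the real advertising table.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "TV": rng.uniform(0, 30, 200),
    "radio": rng.uniform(0, 50, 200),
    "newspaper": rng.uniform(0, 100, 200),
})
# Made-up relationship plus noise, used only so the example has something to fit.
dataset["sales"] = (3000 + 800 * dataset["TV"] + 150 * dataset["radio"]
                    + 10 * dataset["newspaper"] + rng.normal(0, 500, 200))

features = ["TV", "radio", "newspaper"]
X = dataset[features]
y = dataset["sales"]

# Hold out part of the data to check the fit on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R^2 on the held-out rows

# Coefficients and intercept needed to rebuild the equation elsewhere.
coefficients = dict(zip(features, model.coef_))
intercept = model.intercept_

# Run the model on the full dataset and add the result as a new column.
dataset["predicted_sales"] = model.predict(X)
```

When the script runs as a Run Python script step in Power Query, the input table is already available as a DataFrame named dataset, so the synthetic block at the top would be removed. The values in coefficients and intercept are the numbers to carry into the What-If equation.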

Please review the video and the code above. Feel free to ask questions in the comment section below.

Gaelim Holland

Saurabh Mishra
7 months ago

Can you please attach the sample dataset in CSV format?

Sandeep M
1 month ago

Thanks, Gaelim Holland. Just a small finding: the code you mentioned in the blog is for the HR data set. Where can I find that data set? Thanks.

Wally
18 days ago

Great work! Can you share the pbix file?