Sales Prediction Model in Power BI
Leveraging the Python scripting option in Power BI is a powerful way to build complex machine learning models with the interactivity of a dashboard.
For the Python model, we will use the scikit-learn library to create a linear regression model, splitting the data into a training set for the model to learn on and a testing set to check it against. Then we will run the model on the total dataset.
We can derive the coefficients and rebuild the linear regression equation using What-If parameters in Power BI.
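As a rough sketch of what rebuilding the equation means: a linear regression prediction is just the intercept plus each coefficient multiplied by its channel's value. The function below mirrors that arithmetic in plain Python. The intercept and coefficient values are hypothetical placeholders, not results from the real model, and the name predict_sales is only for illustration. In Power BI, each argument would correspond to one What-If parameter.

```python
# Hypothetical values for illustration only; the real numbers come from
# the fitted model's intercept and coefficients.
INTERCEPT = 3.0
COEFFICIENTS = {"TV": 0.05, "radio": 0.2, "newspaper": 0.01}


def predict_sales(tv, radio, newspaper,
                  intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Rebuild the linear regression equation:
    sales = intercept + b_tv * TV + b_radio * radio + b_newspaper * newspaper
    """
    return (
        intercept
        + coefficients["TV"] * tv
        + coefficients["radio"] * radio
        + coefficients["newspaper"] * newspaper
    )


# Each argument plays the role of one What-If parameter slider.
print(predict_sales(tv=100.0, radio=20.0, newspaper=10.0))
```

Moving one slider while holding the others fixed changes the prediction by that channel's coefficient per unit, which is what makes the What-If view easy to interpret.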
TV | radio | newspaper | sales |
23.01 | 37.8 | 69.2 | 27183 |
4.45 | 39.3 | 45.1 | 12792 |
This is a sample of the data set that is going to be used.
In the data above, sales is the target we want to predict, and the three channels (TV, radio, and newspaper) are the predictors. The model learns one coefficient for each channel.
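A minimal sketch of that setup with scikit-learn might look like the following. It assumes the column names TV, radio, newspaper, and sales from the sample above, and it uses synthetic stand-in data generated from made-up coefficients rather than the real advertising file, so the numbers it prints are not real results. Inside a Power BI Python script step the table is normally already available as a DataFrame named dataset, so the data-generation part would not be needed there.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real advertising dataset), generated
# from known made-up coefficients so the fit can be sanity-checked.
rng = np.random.default_rng(0)
n_rows = 200
dataset = pd.DataFrame({
    "TV": rng.uniform(0, 300, n_rows),
    "radio": rng.uniform(0, 50, n_rows),
    "newspaper": rng.uniform(0, 100, n_rows),
})
dataset["sales"] = (
    3.0
    + 0.05 * dataset["TV"]
    + 0.2 * dataset["radio"]
    + 0.01 * dataset["newspaper"]
    + rng.normal(0, 0.5, n_rows)
)

# Sales is the target; the three channels are the predictors.
features = ["TV", "radio", "newspaper"]
X = dataset[features]
y = dataset["sales"]

# Hold out 25% of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out rows.
r2_test = model.score(X_test, y_test)

# Run the model on the total dataset and add the result as a new column.
dataset["predicted_sales"] = model.predict(X)

# One coefficient per channel, plus the intercept, for the What-If equation.
coefficients = dict(zip(features, model.coef_))
intercept = model.intercept_
print(coefficients, intercept, r2_test)
```

The coefficients dictionary and intercept are the values you would copy into Power BI to rebuild the equation with What-If parameters.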
When building your code, it's best to use an IDE, which gives you the ability to debug the Python script. Spyder is a good lightweight IDE that comes with the Anaconda distribution.
Get the dataset: Advertisement Dataset
This is the code:
# Note: this script is the HR churn example. It reads HR_comma_sep.csv and
# fits a logistic regression, not the advertising sales data.

# Load in the dependencies
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

dataset = pd.read_csv('HR_comma_sep.csv')

# Let's change the text categories to numbers
le = LabelEncoder()
dataset['Departments'] = le.fit_transform(dataset['Departments'])
dataset['salary'] = le.fit_transform(dataset['salary'])

# Preprocess your data: 'left' is the target, the rest are the features
y = dataset['left']
features = ['satisfaction_level', 'last_evaluation', 'number_project',
            'average_montly_hours', 'time_spend_company', 'Work_accident',
            'promotion_last_5years', 'Departments', 'salary']
X = dataset[features]

# Let's scale the data
s = StandardScaler()
X = s.fit_transform(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Train the model on the training set, then predict on the whole dataset
log = LogisticRegression()
log.fit(X_train, y_train)
y_pred = log.predict(X)
y_prob = log.predict_proba(X)[:, 1]  # probability of class 1 (left)

# Let's add the columns back to the dataframe
dataset['predictions'] = y_pred
dataset['probabilities'] = y_prob
Please review the video and the code above. Feel free to ask questions in the comment section below.
Can you please attach the sample dataset in CSV format?
Thanks, Gaelim Holland. Just a small finding: the code you mentioned in the blog is for the HR dataset. Where can I find that dataset? Thanks.
Sorry, you should find the dataset on this page for the churn model: https://www.absentdata.com/power-bi/python-machine-learning-in-power-bi/
Great work! Can you share the pbix file?