Reference no: EM133768418
For this lab test, you are required to use some training data in the scenario below to train a classification model.
By the end of this lab test, you must submit:
A Word or PDF report document that answers the questions outlined further below
An Orange project (.ows) that includes the requirements outlined further below
You must submit these two items as separate files further below.
GetFlix is a video-on-demand streaming service.
Customers can join GetFlix on a trial basis, where the subscription cost during the trial period is slightly cheaper than the usual subscription cost. GetFlix has collected some data on its trial customers at the end of their trial period:
Trial Type: New customers can choose to trial the service for a month or for a year
Subscription Level: GetFlix has two subscription levels: Basic and Premium
Platform Usage Score: This is a score from 0 to 100 that indicates how much the customer interacted with the GetFlix service during their trial
Customer Satisfaction Score: This is a score from 0 to 100 that indicates how satisfied the customer seemed to be with the service during their trial
Support Requests: The number of times the customer made a support request during the trial
Customer Renewed: Yes or No, to indicate whether or not the customer renewed to a full subscription following the trial period
Here is the dataset of historical data that GetFlix has collected:
GetFlix_historical_customer_data.csv
And here is a small sample of data for several customers that are about to reach the end of their trial period:
GetFlix_new_customer_sample_data.csv
In this lab you will use the historical data as training data for a classification model, and use any models developed to make predictions on the sample of new customers.
Step 1 - Use the Training Data
Take a look at the historical data that GetFlix has collected, and the small sample of data for customers that are about to complete their trial period.
Questions:
What specific classification task could a classification model perform, if trained on the historical data?
What is GetFlix trying to predict for the new customers?
Use the File widget in Orange to load the historical data file into a project. This is your training data.
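For readers who want to see what loading the training data amounts to outside the Orange interface, here is a minimal Python sketch. It is not part of the required Orange workflow. The column names and the three rows are invented for illustration and are only assumed to resemble the real file; check the actual CSV header before relying on them.

```python
import csv
import io

# Hypothetical stand-in for GetFlix_historical_customer_data.csv.
# The header names and the three rows are invented for illustration;
# the real file may use different column names and values.
SAMPLE_CSV = """Trial Type,Subscription Level,Platform Usage Score,Customer Satisfaction Score,Support Requests,Customer Renewed
Month,Basic,42,55,3,No
Year,Premium,88,91,0,Yes
Month,Premium,67,73,1,Yes
"""

TARGET = "Customer Renewed"  # assumed name of the class label column


def load_rows(text):
    """Parse CSV text into a list of dicts, one per customer."""
    return list(csv.DictReader(io.StringIO(text)))


def split_features_target(rows, target=TARGET):
    """Separate the feature columns from the class label."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


rows = load_rows(SAMPLE_CSV)
features, labels = split_features_target(rows)
print(labels)
```

In Orange, the equivalent of choosing TARGET is setting the role of the renewal column to "target" in the File widget.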
Step 2 - Preprocessing
Question: What preprocessing do you need to conduct on this training data to help prepare it for machine learning? Be specific, indicating any specific changes or amendments that you have made to the data and why.
Use any relevant widgets from the Transform section of Orange to perform this preprocessing on your data.
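As an illustration of what typical preprocessing steps do to the data, the sketch below encodes two-valued categorical features as 0/1 and rescales numeric features to the range 0 to 1. This is only one possible set of steps, not the expected answer to the question above; the rows and helper names are hypothetical, and the preprocessing your data actually needs depends on what you find in the file.

```python
def encode_binary(value, positive):
    """Map a two-valued categorical feature to 1.0 or 0.0."""
    return 1.0 if value == positive else 0.0


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical rows (invented for illustration):
# (trial type, subscription level, usage, satisfaction, support requests)
raw = [
    ("Month", "Basic", 42, 55, 3),
    ("Year", "Premium", 88, 91, 0),
    ("Month", "Premium", 67, 73, 1),
]

trial = [encode_binary(r[0], "Year") for r in raw]
level = [encode_binary(r[1], "Premium") for r in raw]
usage = min_max_scale([r[2] for r in raw])
satisfaction = min_max_scale([r[3] for r in raw])
support = min_max_scale([r[4] for r in raw])

# One numeric feature vector per customer, all values between 0 and 1.
prepared = list(zip(trial, level, usage, satisfaction, support))
print(prepared)
```

Putting features on a common scale matters most for distance-based learners such as kNN, where a feature with a large numeric range would otherwise dominate the distance calculation.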
Step 3 - Select and Run the Learning Algorithms
Use the kNN and classification tree algorithms from the Model section of Orange to train some classification models with the training data.
Train more than one model with each algorithm, to try out different hyperparameters.
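To see why the hyperparameters are worth varying, the following sketch implements a bare-bones k-nearest-neighbours classifier in plain Python and runs it with three values of k. It is a conceptual illustration, not Orange's implementation; the training points are invented, and each is assumed to be a pair of already-scaled features.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled training points (invented for illustration).
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.5, 0.6)]
train_y = ["No", "No", "Yes", "Yes", "Yes"]
query = (0.3, 0.3)

for k in (1, 3, 5):
    print(k, knn_predict(train_x, train_y, query, k))
```

With these points the prediction is "No" for k = 1 and k = 3 but flips to "Yes" for k = 5, because the vote then includes every training point and the majority class wins. The same kind of sensitivity is what you are probing when you change the number of neighbours in the kNN widget or the depth and leaf-size limits in the Tree widget.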
Step 4 - Model Evaluation
Use the Test and Score widget to perform an evaluation of each model that you've trained.
In the Test and Score widget, use Random Sampling with the following settings:
Repeat train/test: 10
Training set size: 70%
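The two settings above describe repeated random sampling: the data is shuffled and split 70/30 ten times, and the scores are averaged over the repeats. A rough sketch of the idea is shown below. The function name, the seed, and the row count of 20 are assumptions for illustration, and Orange's own sampling may differ in detail (for example, in whether it stratifies by class).

```python
import random


def random_splits(n_rows, repeats=10, train_fraction=0.7, seed=0):
    """Yield (train_indices, test_indices) pairs for repeated random sampling."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = round(n_rows * train_fraction)
    indices = list(range(n_rows))
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Example with a hypothetical dataset of 20 rows.
splits = list(random_splits(n_rows=20))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each model is trained on the training indices and scored on the held-out test indices, so the reported scores reflect performance on customers the model has not seen.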
Get a screenshot showing the Precision and Recall of each model that you trained and paste it into your report.
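As a reminder of what the two scores mean, precision is the fraction of predicted renewals that were real renewals, and recall is the fraction of real renewals that the model found. The sketch below computes both for a single positive class. The labels are invented for illustration, and "Yes" is assumed to be the positive class; the figures Orange shows may instead be averaged over classes, depending on the target class setting in Test and Score.

```python
def precision_recall(actual, predicted, positive="Yes"):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for eight test customers (invented for illustration).
actual = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
predicted = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"]

precision, recall = precision_recall(actual, predicted)
print(round(precision, 3), round(recall, 3))
```

Here the model predicts three renewals, two of them correct (precision 2/3), and finds two of the four real renewals (recall 1/2).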
Questions:
Which algorithm/hyperparameters provide models that perform well?
Which algorithm produces the best performing model? What was the best precision and recall that you were able to achieve?
Use the Confusion Matrix widget to show the actual vs predicted labels for each model. Get a screenshot of the confusion matrix for the model that performs the best and paste it into your report.
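A confusion matrix is simply a count of every (actual, predicted) pair. The short sketch below builds one from two label lists, with rows for the actual class and columns for the predicted class. The labels are hypothetical and the row/column order is an assumption chosen for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("No", "Yes")):
    """Return a matrix of counts: rows are actual labels, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical labels for eight test customers (invented for illustration).
actual = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
predicted = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"]

matrix = confusion_matrix(actual, predicted)
for label, row in zip(("No", "Yes"), matrix):
    print(label, row)
```

Reading the result: the diagonal cells hold the correct predictions, while the off-diagonal cells show customers predicted to renew who did not (top right) and customers who renewed despite a "No" prediction (bottom left).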
Step 5 - Make Predictions
Use another File widget to load the sample of six new GetFlix customers into your project.
Use the Predictions widget and your trained models to make predictions on the new customers.
Get a screenshot of the predictions that your models make on the six new customers and paste it into your report.