Length: 1000 words
Predictive Analytics
Assignment: Building and Evaluating Predictive Models using SAS Enterprise Miner
Objectives:
- Demonstrate knowledge of building different types of predictive models using SAS Enterprise Miner
- Demonstrate skill and knowledge in applying predictive models in a real-life predictive analytics task
- Relate theoretical knowledge of predictive models and best practices to application scenarios
Business Case - Predictive Model for Vehicle Price Prediction
Alpha (Pvt) Ltd is an Australian online car sales platform for providing an effective car buying and selling service. In order to help boost sales transactions, the management of Alpha is in the process of building a car price estimation system to help second-hand car sellers to sell their cars at the best price.
Alpha management is very keen to trial predictive modeling for this task and has gathered a historic car sales dataset from a publicly available data repository. The dataset contains 21 variables describing previously sold cars. The attributes include the selling price of cars, year, odometer reading, fuel type, condition, location, etc. The list of attributes and their descriptions are given below.
Variable       Description
id             Unique ID of the record
region         Region
price          Price
year           Launch year
manufacturer   Manufacturer
model          Model
condition      Overall condition of the vehicle
cylinders      Number of cylinders
fuel           Fuel type
odometer       Odometer reading
title_status   Condition: whether the vehicle is free from accidents, repaired, rebuilt, etc.
transmission   Transmission type
vin            Vehicle identification number
drive          Drive type
size           Size of the vehicle
type           Vehicle type
paint_color    Paint color
county         County
state          State
lat            Latitude
long           Longitude
The management of Alpha (Pvt) Ltd. is considering outsourcing to you, as an external consulting group, the task of developing a reliable model to predict the selling price of cars from the historical dataset described above. Alpha has provided you with sample datasets of BMW, Mercedes, Toyota and Honda cars to build separate price-prediction models. They also wish to compare and contrast the pricing models between two car brands.
Select only one of the four datasets provided. Based on the selected vehicle dataset, build different predictive models and compare them to determine which is the best model for that dataset. You are also provided with a scoring dataset, which you need to use for price prediction.
Setting up the project and exploratory analysis
Create a new project and create a data source based on the selected dataset.
Carry out data exploration by using a StatExplore Node. Explain your findings with regard to your vehicle dataset.
Create a Data Partition with 70% of the data for training and 30% for validation.
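For intuition, the 70/30 partition can be sketched outside Enterprise Miner as a seeded random split. This is only an illustration: the function name, the seed and the toy records are assumptions, and the Data Partition node may use a different partitioning method than the simple random shuffle shown here.

```python
import random

def partition(records, train_fraction=0.7, seed=12345):
    """Shuffle a copy of the records and split it into (train, validation)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(records)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy records standing in for rows of the selected vehicle dataset (made up).
rows = [{"id": i, "price": 5000 + 100 * i} for i in range(10)]
train, validation = partition(rows)
print(len(train), len(validation))
```

The model is fitted on the training portion only; the validation portion is held back so that model quality is judged on records the model has not seen.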
Decision tree-based modeling and analysis
Carry out the following modeling tasks for the selected vehicle dataset.
a) Create two Decision Tree models, one using two-way splits and the other using three-way splits. For each decision tree:
   - How many leaves are in the optimal tree?
   - Which variable is used for the first split?
   - What are the competing splits for this first split?
b) Which of the decision tree models appears to be better? Justify your answer.
c) Refer to the decision tree model selected in part (b):
   - Identify two leaf nodes with good predictive performance and two leaf nodes with poor predictive performance.
   - Provide justifications for your selections.
   - Write down the rules for the pathways leading up to each selected leaf node.
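One possible way to judge leaf quality is to compare, per leaf, the number of validation cases and the average squared error of the leaf's predicted price. The sketch below assumes you have exported (leaf ID, actual price, predicted price) triples; the function name, leaf IDs and prices are made up for illustration and are not Enterprise Miner output.

```python
from collections import defaultdict

def leaf_summary(rows):
    """rows: iterable of (leaf_id, actual_price, predicted_price) tuples.

    Returns {leaf_id: {"n": case count, "ase": average squared error}}.
    """
    squared_errors = defaultdict(list)
    for leaf_id, actual, predicted in rows:
        squared_errors[leaf_id].append((actual - predicted) ** 2)
    return {
        leaf: {"n": len(errs), "ase": sum(errs) / len(errs)}
        for leaf, errs in squared_errors.items()
    }

# Hypothetical validation cases: leaf 4 is tight, leaf 7 is widely spread.
rows = [
    (4, 21000, 20000), (4, 19000, 20000),
    (7, 9000, 15000), (7, 21000, 15000),
]
summary = leaf_summary(rows)
for leaf, stats in sorted(summary.items(), key=lambda kv: kv[1]["ase"]):
    print(leaf, stats)
```

A leaf with a low error but very few cases is weak evidence, so the case count is worth reporting alongside the error when justifying a selection.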
Regression-based modeling and analysis
In preparation for regression, is any imputation of missing values needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?
Use an Impute node connected to Data Partition node to handle missing values. Which variables have been imputed?
Are there any ordinal variables? Use the replacement node to assign relevant values.
Conduct data exploration to select the best variables for the model. Explain your findings.
Hint: You can connect the 'Variable Selection' node in the 'Explore' tab to the data source and observe which variables have been selected. To change the variables manually, go to 'Manual Selection' in the properties panel and adjust the role of each variable.
Create a Regression model using the set of variables you identified as suitable in part (c). You can choose stepwise selection with validation error as the selection criterion.
Run the Regression node and view the results.
Which variables are included in the final model? Explain what this means to the vehicle sales organization (very briefly).
What is the validation ASE? What does this mean in a predictive model?
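The imputation and replacement steps in this section can be pictured with a small sketch. It assumes mean imputation for an interval variable and an integer ranking for an ordinal one; the condition levels and their order are hypothetical, so check the levels actually present in your dataset. The helper names are illustrative and are not part of Enterprise Miner.

```python
from statistics import mean

# Hypothetical ordering of condition levels (an assumption, not from the brief).
CONDITION_RANK = {"salvage": 0, "fair": 1, "good": 2,
                  "excellent": 3, "like new": 4, "new": 5}

def impute_mean(rows, column):
    """Return new rows with None in `column` replaced by the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def replace_ordinal(rows, column, ranking):
    """Return new rows with text levels in `column` mapped to integer ranks.

    Levels missing from `ranking` become None so they can be reviewed.
    """
    return [{**r, column: ranking.get(r[column])} for r in rows]

# Made-up records: the second one has a missing odometer reading.
cars = [
    {"odometer": 40000, "condition": "good"},
    {"odometer": None, "condition": "excellent"},
    {"odometer": 80000, "condition": "fair"},
]
cleaned = replace_ordinal(impute_mean(cars, "odometer"), "condition", CONDITION_RANK)
print(cleaned)
```

Regression cannot use a record with a missing input, whereas decision trees can typically handle missing values directly, which is relevant to the question of where imputation belongs in the flow.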
Model Comparison and Scoring
Use the model comparison to compare and contrast the results from the decision trees and regression-based analysis. Describe and justify how you ascertained the better model.
Compare and contrast the best model selection for the car brand you selected. Would it have been sufficient to use only one modeling technique (decision tree or regression)? Provide justifications for your answer.
Use the scoring dataset to generate price predictions with the best predictive model for your vehicle brand. Explain the output using plots.
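Average squared error (ASE) is the sum of squared differences between actual and predicted prices divided by the number of cases. The sketch below computes validation ASE for several models and picks the lowest, assuming ASE is the statistic used for comparison. All model names and prices are made up for illustration.

```python
def average_squared_error(actual, predicted):
    """ASE = sum((actual - predicted)^2) / N."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up validation prices and predictions from three hypothetical models.
actual = [20000, 15000, 30000]
predictions = {
    "tree_2way": [21000, 14000, 27000],
    "tree_3way": [22000, 15000, 28000],
    "regression": [20500, 15500, 29000],
}

validation_ase = {
    name: average_squared_error(actual, preds)
    for name, preds in predictions.items()
}
best = min(validation_ase, key=validation_ase.get)
for name, ase in sorted(validation_ase.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ase:,.0f}")
print("lowest validation ASE:", best)
```

Because ASE is in squared price units, its square root is often easier to interpret as a typical prediction error in dollars.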
Comparison between Car Brands
Choose another car brand and apply the SAS model flow to carry out an analysis between two car brands.
What are the factors that are significant in determining the price of the two car brands? Are those factors the same or different across the two brands? Justify with external knowledge. (e.g., What do the differences in features/variables say about the buyer's interest in different models?)
Compare the best selected predictive models for the two car brands. Do you recommend decision trees or regression? Outline reasons.
What are your suggestions to improve the car price prediction models of Alpha Management?
Assignment Submission Instructions
You are required to submit TWO (02) files to the assignment submission site in the LMS:
a) A Word or PDF document with your responses to the questions. Name this document <student_id>Assignment2_report (.doc or .pdf)
b) A SAS package file (.spk extension), which is a 'model package' of your project (please see the instructions below to create a model package of your SAS project)
How to Create a SAS Model Package
Open your Assignment 2 project and diagram
Highlight the last node in the diagram. (Although it has not been specifically mentioned, you should have created a model comparison node and linked the different models to this node. Hence, the model comparison node is the final node).
Select Actions => Create Model Package from the menu at the top. In the input dialog box, provide a name for your model package. To ensure that you have a unique filename, include your student ID as a prefix to the Assignment 2 model package name (e.g., <student_id>_Assignment2).
When the model package generation is completed, select OK.
The model package now appears under the project in the top left-hand corner of the screen.
Right-click on the model package you have created and then click 'Save As'. You will be prompted to save the model as a SAS Package File (.spk), which can be saved to your hard drive.