BUS5PA Predictive Analytics Assignment


Reference no: EM132643438

BUS5PA Predictive Analytics - La Trobe University

Assignment - Building and Evaluating Predictive Models using SAS Enterprise Miner

Objective:

a) Demonstrate knowledge of building different types of predictive models using SAS Enterprise Miner
b) Demonstrate skill and knowledge in applying predictive models in a real-life predictive analytics task
c) Relate theoretical knowledge of predictive models and best practices to application scenarios

Business Case - Predictive Model for Vehicle Price Prediction

Beta (Pvt) Ltd is an Australian online car sales platform that provides an effective car buying and selling service. To help boost sales transactions, the management of Beta is building a car price estimation system to help second-hand car sellers sell their cars at the best price.

Beta management is very keen to trial predictive modeling for this task and has gathered a historic car sales dataset from a publicly available data repository. The dataset contains 21 variables describing previously sold cars. The attributes include the selling price, year, odometer reading, fuel type, condition, location, etc. The attributes and their descriptions are listed below.

Variable	Description
id	Unique Id of the record
region	Region
price	Price
year	Launch year
manufacturer	Manufacturer
model	Model
condition	Overall condition of the vehicle
cylinders	Number of cylinders
fuel	Fuel type
odometer	Odometer reading
title_status	Condition: whether the vehicle is free from accidents, repaired, rebuilt, etc.
transmission	Transmission type
vin	Vehicle identification number
drive	Drive type
size	Size of the vehicle
type	Vehicle type
paint_color	Paint color
county	County
state	State
lat	Latitude
long	Longitude

The management of Beta.com Ltd. is considering engaging you as an external consulting group to develop a reliable predictive model that predicts the selling price of cars using the aforementioned historic dataset. Beta has provided you with sample datasets of BMW, Mercedes, Toyota and Honda cars to build separate price-prediction models. They also wish to compare and contrast the pricing models of these four car brands.

PART A

You have to select one of the four datasets provided (each group member should have a different dataset). Based on the selected vehicle dataset, you are required to build different predictive models, then compare and contrast them to determine which is the best model for the selected dataset. You are also provided with a scoring dataset which you need to use for price prediction.

1. Setting up the project and exploratory analysis
a. Create a new project and create a data source based on the selected dataset.
b. Carry out a data exploration by using a StatExplore Node. Explain your findings with regard to your vehicle dataset.
c. Create a Data Partition with 70% of the data for training and 30% for validation.
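
For readers who want to see what the 70/30 partition in step 1(c) amounts to outside SAS Enterprise Miner, here is a minimal Python sketch. It is not how the Data Partition node is implemented; the function name `partition`, the fixed seed, and the stand-in rows are all assumptions made for illustration.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Shuffle records and split them into training and validation lists."""
    shuffled = list(records)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in rows; the real rows come from the assignment datasets.
rows = [{"id": i, "price": 5000 + 100 * i} for i in range(10)]
train, valid = partition(rows)
print(len(train), len(valid))
```

The validation rows are held back from model fitting so that error measured on them reflects performance on unseen data.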

2. Decision tree-based modeling and analysis
Carry out the following modeling tasks for the selected vehicle dataset.
a. Create two Decision Tree models. Use two-way and three-way splits to create the two separate decision tree models.
For each decision tree,
I. How many leaves are in the optimal tree?
II. Which variable was used for the first split?
III. What were the competing splits for this first split?
b. Which of the decision tree models appears to be better? Justify your answer.
c. Refer to the selected decision tree model in part (b) and
I. Identify leaf nodes which have good predictive performance (two leaf nodes) and poor predictive performance (two leaf nodes).
II. Provide justifications for your selections.
III. Write down the rules for the pathways leading up to each selected leaf node.
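
The ideas of a "first split" and its "competing splits" can be illustrated with a small Python sketch. This is a simplified stand-in, not the algorithm used by the Decision Tree node in SAS Enterprise Miner, which has its own splitting criteria. Here each numeric input's best two-way split is scored by the reduction in the sum of squared errors of the target. The function names and the four toy rows are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def rank_two_way_splits(rows, inputs, target="price"):
    """Return (reduction, variable, threshold) for each input's best split, best first."""
    parent = sse([r[target] for r in rows])
    ranked = []
    for var in inputs:
        levels = sorted({r[var] for r in rows})
        best = None
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [r[target] for r in rows if r[var] < threshold]
            right = [r[target] for r in rows if r[var] >= threshold]
            reduction = parent - sse(left) - sse(right)
            if best is None or reduction > best[0]:
                best = (reduction, var, threshold)
        if best is not None:
            ranked.append(best)
    return sorted(ranked, reverse=True)

# Toy rows invented for illustration only.
cars = [
    {"year": 2005, "odometer": 150000, "price": 4000},
    {"year": 2006, "odometer": 90000, "price": 4500},
    {"year": 2015, "odometer": 120000, "price": 15000},
    {"year": 2016, "odometer": 40000, "price": 16000},
]
ranked = rank_two_way_splits(cars, ["year", "odometer"])
for reduction, var, threshold in ranked:
    print(f"{var} < {threshold}: SSE reduction {reduction:.0f}")
```

The first entry plays the role of the chosen first split and the remaining entries play the role of competing splits. A rule for a leaf is then the conjunction of the conditions along its path, for example "IF year < 2010.5 THEN predicted price = mean price of that leaf".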

3. Regression-based modeling and analysis

a. In preparation for regression, is any missing-value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?
b. Use an Impute node connected to the Data Partition node to handle missing values. Which variables have been imputed?
c. Are there any ordinal variables? Use the Replacement node to assign relevant values.
d. Conduct data exploration to select the best variables for the model. Explain your findings.
Hint: You can connect the 'Variable Selection' node in the 'Explore' tab to the data source and observe which variables have been picked as the selected variables. To manually change the variables, go to 'Manual Selection' in the properties panel and adjust the role of the variables.

e. Create a Regression model using the set of variables you identified as suitable in part (d). You can choose stepwise selection and use validation error as the selection criterion.
f. Run the Regression node and view the results.
I. Which variables are included in the final model? Explain what this means to the vehicle sales organization (very briefly).

II. What is the validation ASE? What does this mean in a predictive model?
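
The preparation steps and the fit statistic in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of the Impute, Replacement, or Regression nodes: median imputation is just one possible method, the ordering in `CONDITION_ORDER` is a hypothetical example (check the actual levels in your dataset), and all values are invented. ASE is computed as the sum of squared errors divided by the number of cases.

```python
from statistics import median

# Assumed ordering for illustration only; the real levels may differ.
CONDITION_ORDER = {"salvage": 0, "fair": 1, "good": 2,
                   "excellent": 3, "like new": 4, "new": 5}

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def encode_ordinal(labels, order):
    """Map ordered category labels to integer ranks."""
    return [order[label] for label in labels]

def average_squared_error(actual, predicted):
    """ASE: mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Invented values, used only to exercise the helpers.
print(impute_median([50000, None, 70000]))
print(encode_ordinal(["good", "new"], CONDITION_ORDER))
print(average_squared_error([10000, 12000], [9000, 12500]))
```

Because the errors are squared, ASE is expressed in squared price units; taking its square root gives a typical prediction error in the original price units. A lower validation ASE indicates predictions that are closer to the actual prices on data the model did not see during training.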

4. Model Comparison and Scoring

a. Use the Model Comparison node to compare and contrast the results from the decision tree and regression-based analyses. Describe and justify how you ascertained the better model.

b. Compare and contrast the best model selection for the car brand you selected. Would it have been sufficient to use only one modeling technique (decision tree or regression)? Provide justifications for your answer.

c. Use the best model for your vehicle brand to score the scoring dataset. Explain the output using plots.

PART B:

As an important extended step in the predictive modelling process, Beta Management is interested in comparing and contrasting different predictive models specific to different car brands.

In this exercise, your team is expected to carry out a comprehensive analysis based on the predictive models created for the four car brands provided (Toyota, Honda, BMW, Mercedes). As a team you have to compare and contrast the outcomes of predictive models for the four car brands and create a report for Beta Management.

You may consider the following points:

a. Which variables were used in the predictive models to determine the price for each of the four brands? Discuss further by comparing the feature importance across the four brands (e.g., what do the differences in features/variables say about buyers' interest in different models?).

b. Compare the best selected predictive models for the four brands. Do you recommend decision trees or regression? Outline reasons.

c. As a team of data analysts, what are your suggestions to improve the car price prediction models of Beta Management?

Attachment:- Assignment_Data.zip


