BUS5PA Predictive Analytics Assignment

Reference no: EM132643438

BUS5PA Predictive Analytics - La Trobe University

Assignment - Building and Evaluating Predictive Models using SAS Enterprise Miner

Objective:

a) Demonstrate knowledge of building different types of predictive models using SAS Enterprise Miner
b) Demonstrate skill and knowledge in applying predictive models in a real-life predictive analytics task
c) Relate theoretical knowledge of predictive models and best practices to application scenarios

Business Case - Predictive Model for Vehicle Price Prediction

Beta (Pvt) Ltd is an Australian online car sales platform that provides a car buying and selling service. To help boost sales transactions, the management of Beta is building a car price estimation system to help second-hand car sellers sell their cars at the best price.

Beta management is very keen to trial predictive modeling for this task and has gathered a historic car sales dataset from a publicly available data repository. The dataset contains 21 variables describing previously sold cars. The attributes include the selling price of cars, year, odometer reading, fuel type, condition, location, etc. The list of attributes and their descriptions are given below.

Variable        Description
id              Unique Id of the record
region          Region
price           Price
year            Launch year
manufacturer    Manufacturer
model           Model
condition       Overall condition of the vehicle
cylinders       Number of cylinders
fuel            Fuel type
odometer        Odometer reading
title_status    Condition - whether the vehicle is free from accidents/repaired/rebuilt etc.
transmission    Transmission type
vin             Vehicle identification number
drive           Drive type
size            Size of the vehicle
type            Vehicle type
paint_color     Paint color
county          County
state           State
lat             Lat
long            Long

The management of Beta.com Ltd. is considering engaging you, as an external consulting group, to develop a reliable predictive model of car selling prices using the aforementioned historic dataset. Beta has provided you with sample datasets of BMW, Mercedes, Toyota and Honda cars to build separate price-prediction models. They also wish to compare and contrast the pricing models of these four car brands.

PART A

You have to select one of the four datasets provided (each group member should have a different dataset). Based on the selected vehicle dataset, you are required to build different predictive models, then compare and contrast them to determine which is the best model for that dataset. You are also provided with a scoring dataset, which you need to use for price prediction.

1. Setting up the project and exploratory analysis
a. Create a new project and create a data source based on the selected dataset.
b. Carry out a data exploration by using a StatExplore Node. Explain your findings with regard to your vehicle dataset.
c. Create a Data Partition with 70% of the data for training and 30% for validation.
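
For readers who want to see what the 70/30 partition in step 1c amounts to outside the tool, here is a minimal sketch in standard-library Python. It assumes a simple random split with a fixed seed; the function name `partition`, the seed value, and the sample records are illustrative and not part of the assignment, and the Data Partition node may use a different sampling method.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records standing in for rows of the vehicle dataset.
rows = [{"id": i, "price": 5000 + 100 * i} for i in range(10)]
train, valid = partition(rows)
print(len(train), len(valid))
```

Every record lands in exactly one of the two sets, so the validation data stays unseen during model fitting.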

2. Decision tree-based modeling and analysis
Carry out the following modeling tasks for the selected vehicle dataset.
a. Create two Decision Tree models. Use two-way and three-way splits to create the two separate decision tree models.
For each decision tree,
I. How many leaves are in the optimal tree?
II. Which variable was used for the first split?
III. What were the competing splits for this first split?
b. Which of the decision tree models appears to be better? Justify your answer.
c. Refer to the selected decision tree model in part (b) and
I. Identify leaf nodes which have good predictive performance (two leaf nodes) and poor predictive performance (two leaf nodes).

II. Provide justifications for your selections
III. Write down the rules for the pathways leading up to each selected leaf node.
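
The idea of a first split and its competing splits can be illustrated with a small sketch. It assumes the split is chosen by minimising the sum of squared errors of the price within the two branches, which is one common criterion for an interval target; the Decision Tree node's actual criterion and settings may differ. The function names and the four sample records are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_two_way_split(rows, feature, target="price"):
    """Return (threshold, total_sse) for the best binary split on one numeric feature."""
    xs = sorted({r[feature] for r in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # candidate cut halfway between neighbouring values
        left = [r[target] for r in rows if r[feature] < threshold]
        right = [r[target] for r in rows if r[feature] >= threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical records; the real dataset has many more rows and variables.
rows = [
    {"year": 2005, "odometer": 180000, "price": 4000},
    {"year": 2008, "odometer": 150000, "price": 5000},
    {"year": 2015, "odometer": 60000, "price": 15000},
    {"year": 2018, "odometer": 30000, "price": 16000},
]
for feature in ("year", "odometer"):
    print(feature, best_two_way_split(rows, feature))
```

Ranking the candidate variables by the error left after their best split gives the winning first split and, just behind it, the competing splits. A pathway rule then reads like "IF year >= 2011.5 THEN predict the mean price of that branch".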

3. Regression-based modeling and analysis

a. In preparation for regression, is any missing-value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?
b. Use an Impute node connected to the Data Partition node to handle missing values. Which variables have been imputed?
c. Are there any ordinal variables? Use the Replacement node to assign relevant values.
d. Conduct data exploration to select the best variables for the model. Explain your findings.
Hint: You can connect the 'Variable Selection' node in the 'Explore' tab to the data source and observe which variables have been picked as the selected variables. To manually change the variables, go to 'Manual Selection' in the properties panel and adjust the role of the variables.

e. Create a Regression model using the set of variables you identified as suitable in part d. You can choose stepwise selection and use validation error as the selection criterion.
f. Run the Regression node and view the results.
I. Which variables are included in the final model? Explain what this means to the vehicle sales organization (very briefly).

II. What is the validation ASE? What does this mean in a predictive model?
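
The preparation steps in parts a to c can be sketched as follows. This is an illustration under stated assumptions, not the behaviour of the Impute or Replacement nodes: it fills missing numeric values with the training median and maps `condition` to an integer rank. The level names in `CONDITION_RANK`, their ordering, and the helper names are assumed for the example; check the actual levels in your dataset.

```python
from statistics import median

# Assumed ordering of the 'condition' levels; the levels in the real data may differ.
CONDITION_RANK = {"salvage": 0, "fair": 1, "good": 2, "excellent": 3, "like new": 4, "new": 5}

def fit_median(train_rows, column):
    """Median of the non-missing training values for one column."""
    return median(r[column] for r in train_rows if r[column] is not None)

def prepare(rows, column, fill_value):
    """Return copies of the rows with missing values filled and condition mapped to a rank."""
    prepared = []
    for r in rows:
        row = dict(r)  # copy so the original records stay untouched
        if row[column] is None:
            row[column] = fill_value
        row["condition_rank"] = CONDITION_RANK.get(row["condition"])  # None if level unknown
        prepared.append(row)
    return prepared

# Hypothetical training records with one missing odometer reading.
train = [
    {"odometer": 40000, "condition": "excellent"},
    {"odometer": None, "condition": "good"},
    {"odometer": 120000, "condition": "fair"},
    {"odometer": 80000, "condition": "like new"},
]
fill = fit_median(train, "odometer")
clean = prepare(train, "odometer", fill)
print(fill, clean[1])
```

The fill value is computed from the training rows only and then reused for validation and scoring data, so no information leaks from the held-out sets.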

4. Model Comparison and Scoring

a. Use the Model Comparison node to compare and contrast the results from the decision tree and regression-based analyses. Describe and justify how you ascertained the better model.

b. Compare and contrast the best model selection for the car brand you selected. Would it have been sufficient to use only one modeling technique (decision tree or regression)? Provide justifications for your answer.

c. Use the scoring dataset to score with the best model for your vehicle brand. Explain the output using plots.
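
As a rough illustration of the comparison in part a, the sketch below computes the average squared error (ASE), the mean of the squared differences between actual and predicted prices, for each candidate model on the validation set and picks the lowest. The model names, prices, and predictions are made up for the example, and the Model Comparison node can be configured to use other selection statistics.

```python
def average_squared_error(actual, predicted):
    """ASE: mean of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical validation prices and predictions from three candidate models.
validation_prices = [10000, 15000, 20000]
candidate_predictions = {
    "tree_two_way": [11000, 14000, 21000],
    "tree_three_way": [10500, 15500, 19000],
    "regression": [12000, 15000, 18000],
}

scores = {
    name: average_squared_error(validation_prices, preds)
    for name, preds in candidate_predictions.items()
}
best = min(scores, key=scores.get)  # lowest validation ASE wins
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.0f}")
print("selected:", best)
```

ASE is in squared price units, so taking its square root gives a typical prediction error in dollars, which is often easier to explain to management.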

PART B:

As an important extended step in the predictive modelling process, Beta Management is interested in comparing and contrasting different predictive models specific to different car brands.

In this exercise, your team is expected to carry out a comprehensive analysis based on the predictive models created for the four car brands provided (Toyota, Honda, BMW, Mercedes). As a team you have to compare and contrast the outcomes of predictive models for the four car brands and create a report for Beta Management.

You may consider the following points:

a. Which variables were used in the predictive models to determine the price for each of the four brands? Discuss further by comparing feature importance across the four brands (e.g., what do the differences in features/variables say about buyers' interest in different models?).

b. Compare the best selected predictive models for the four brands. Do you recommend decision trees or regression? Outline reasons.

c. As a team of data analysts, what are your suggestions to improve the car price prediction models of Beta Management?

Attachment:- Assignment_Data.zip




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd