Evaluating Predictive Models Using SAS Enterprise Miner

Reference no: EM133534278

Predictive Analytics

Assignment - Building and Evaluating Predictive Models using SAS Enterprise Miner

Objective:
a) Demonstrate knowledge of building different types of predictive models using SAS Enterprise Miner.
b) Demonstrate skill and knowledge in applying predictive models to a real-life predictive analytics task.
c) Relate theoretical knowledge of predictive models and best practices to application scenarios.

Business Case - Predictive Model for Property Price Prediction

A real estate company in Melbourne is updating its property (housing) price assessment method, and the company's management wants to build a property price estimation system to help sellers sell their properties at the best price.

The company's management is keen to trial predictive modeling for this task and has gathered a dataset of historical property sales. The dataset contains 18 variables describing previously sold properties, including the selling price, the year the property was built, the year it was sold, and the numbers of bedrooms, bathrooms, and car spots. The list of attributes and their descriptions is given below (a more detailed description can be found in data_description.txt).

The management of the real estate company is considering engaging you, as an external consulting group, to develop a reliable model that predicts the selling price of properties from this historical dataset. You are required to build different predictive models, then compare and contrast them to determine which model is best for the dataset. You are also provided with a dataset of new properties about to be listed (the scoring dataset), for which you must predict the selling prices.

Q1. Setting up the project and exploratory analysis
Provide a screenshot as evidence for each subsection of Q1.
a. Create a new project and create a data source based on the given datasets. Set the role of Price to Target, and make sure the Role and Level assigned to each variable are correct.
b. Explore the data using a StatExplore node. Explain your findings with regard to the property dataset.
c. Create a Data Partition with 70% of the data for training and 30% for validation.
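Inside SAS Enterprise Miner, the Data Partition node performs this split. The Python sketch below is a rough illustration of what a simple random 70/30 split does. It assumes the rows are already loaded into a Python list. The function name `partition` and the seed value are hypothetical choices, and this is not how the node is implemented. The fixed seed makes the split reproducible, so every model is judged on the same validation rows.

```python
import random


def partition(rows, train_frac=0.7, seed=12345):
    """Shuffle a copy of rows and split it into (training, validation) lists.

    Hypothetical helper: a simple random split, not SAS's own algorithm.
    """
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed: the same split is produced on every run
    shuffled = list(rows)      # copy so the caller's sequence is not reordered
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)  # 70% of the rows by default
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    train, valid = partition(range(1000))
    print(len(train), len(valid))
```

A stratified split, which keeps price bands balanced across partitions, would be a reasonable alternative. Plain random sampling is kept here for brevity.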

Q2. Decision tree-based modeling and analysis
Carry out the following modeling tasks for the selected property value dataset.
a. Create two separate Decision Tree models, one using two-way splits and one using three-way splits. Provide the relevant decision tree diagrams.
For each decision tree,
I. How many leaves are in the optimal tree?
II. Which variable was used for the first split?
III. What were the competing splits for this first split?
b. Which of the decision tree models appears to be better? Justify your answer.
c. Refer to the selected decision tree model in part (b) and
I. Identify two leaf nodes with good predictive performance and two leaf nodes with poor predictive performance.
II. Provide justifications for your selections.
III. Write down the rules for the pathways leading up to each selected leaf node.
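Parts I to III of (a) ask for the first split and its competing splits. The sketch below illustrates the idea for an interval target such as Price. For each input, it finds the binary cut that most reduces the sum of squared errors (variance reduction). It then ranks the inputs by that reduction. The top-ranked input gives the first split, and the runners-up play the role of competing splits.

This is a sketch under stated assumptions:
- Variance reduction is a common regression-tree criterion, assumed here. The Decision Tree node's own split criteria may differ.
- Three-way splits are not reproduced.
- The variable names `Rooms` and `Distance` and the toy prices are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(xs, ys):
    """Best cut on one numeric input: the threshold that minimises the
    combined SSE of the two child nodes. Returns (child_sse, threshold)."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between identical input values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child_sse = sse(left) + sse(right)
        if child_sse < best[0]:
            best = (child_sse, threshold)
    return best


def rank_splits(rows, inputs, target):
    """Rank each input's best cut by SSE reduction, largest first. The first
    entry plays the role of the chosen split; the rest are competing splits."""
    ys = [r[target] for r in rows]
    parent_sse = sse(ys)
    ranked = []
    for var in inputs:
        child_sse, threshold = best_binary_split([r[var] for r in rows], ys)
        if threshold is not None:
            ranked.append((var, threshold, parent_sse - child_sse))
    return sorted(ranked, key=lambda entry: entry[2], reverse=True)


# Hypothetical toy rows; prices are in thousands of dollars.
TOY_ROWS = [
    {"Rooms": 2, "Distance": 5, "Price": 500},
    {"Rooms": 2, "Distance": 20, "Price": 520},
    {"Rooms": 3, "Distance": 8, "Price": 900},
    {"Rooms": 4, "Distance": 15, "Price": 1300},
    {"Rooms": 4, "Distance": 3, "Price": 1280},
]

if __name__ == "__main__":
    for var, threshold, gain in rank_splits(TOY_ROWS, ["Rooms", "Distance"], "Price"):
        print(f"{var} < {threshold}: SSE reduction {gain:,.0f}")
```

A real tree applies the same search recursively to each child node until a stopping or pruning rule ends growth.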

Q3. Regression-based modeling and analysis

a. In preparation for regression, is any imputation of missing values needed? If so, should you do this imputation before generating the decision tree models? Why or why not?
b. Use an Impute node connected to the Data Partition node to handle missing values. Which variables have been imputed?
c. Are there any ordinal variables? If so, use a Replacement node to assign relevant values.
d. Use a Variable Clustering node to explore the data and select the best variables for the model. Describe and justify how you determined the best variables for the model.
e. Create a Regression model using the set of variables you identified as suitable in part (d). You can choose stepwise selection and use validation error as the selection criterion.
f. Run the Regression node and view the results.
I. Which variables are included in the final model? Briefly explain what this means for the real estate company.

II. What is the validation Average Squared Error (ASE) (or Mean Squared Error (MSE))? What does it mean for a predictive model?
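Parts (e) and (f) rest on two ideas. The first is a selection procedure that adds inputs step by step. The second is validation ASE: the sum of (actual − predicted)² over the validation rows, divided by the number of those rows.

The sketch below illustrates both under stated assumptions:
- It uses plain forward selection, a simpler relative of stepwise selection, which can also drop variables.
- It fits ordinary least squares with the standard library only.
- The candidate inputs are assumed to be numeric, already imputed, and not perfectly collinear.
- All function names are hypothetical. This is not the Regression node's implementation.

```python
def fit_ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y by
    Gauss-Jordan elimination with partial pivoting. Assumes X'X is
    non-singular, i.e. no input is a perfect linear combination of others."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]


def design(rows, cols):
    """Design matrix: a leading 1.0 for the intercept, then the chosen inputs."""
    return [[1.0] + [float(r[c]) for c in cols] for r in rows]


def ase(rows, cols, coef, target):
    """Average squared error of a fitted linear model on the given rows."""
    preds = [sum(b * x for b, x in zip(coef, xr)) for xr in design(rows, cols)]
    return sum((r[target] - p) ** 2 for r, p in zip(rows, preds)) / len(rows)


def forward_select(train, valid, candidates, target):
    """Greedy forward selection: at each step add the input that most lowers
    training ASE, then keep the step whose model has the lowest validation ASE.
    Returns (selected_inputs, validation_ase)."""
    y = [r[target] for r in train]
    selected, remaining = [], list(candidates)
    coef = fit_ols(design(train, selected), y)  # intercept-only baseline
    best_cols, best_err = [], ase(valid, selected, coef, target)
    while remaining:
        fits = []
        for c in remaining:
            trial = selected + [c]
            trial_coef = fit_ols(design(train, trial), y)
            fits.append((ase(train, trial, trial_coef, target), c, trial_coef))
        _, chosen, coef = min(fits, key=lambda f: f[0])
        selected.append(chosen)
        remaining.remove(chosen)
        err = ase(valid, selected, coef, target)  # judge each step on validation rows
        if err < best_err:
            best_cols, best_err = list(selected), err
    return best_cols, best_err
```

Because ASE is measured in squared price units (dollars squared), its square root (RMSE) restates the error in dollars. This usually makes the figure easier to explain to the company.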

Q4. Model Comparison and Scoring

a. Use a Model Comparison node to compare and contrast the results of the decision tree and regression-based analyses. Provide a summary table for the comparison. Describe and justify how you determined the better model.

b. Would it have been sufficient to use only one modeling technique (decision tree or regression)? Justify your answer using the outcome of Q4a.

c. Score the scoring dataset using the best predictive model. Explain the output using plots.
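For part (a), the comparison comes down to two steps. First, compute the same fit statistic for every model on the same validation rows. Second, rank the models by that statistic.

The sketch below assumes validation ASE is the ranking statistic. Its prices and predictions are made up and are not results from the assignment data, and the model and function names are hypothetical. The RMSE column puts the error back in dollars.

```python
from math import sqrt


def average_squared_error(actual, predicted):
    """ASE: mean of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def compare_models(actual, predictions):
    """Return (model, ASE, RMSE) rows sorted from lowest to highest validation ASE."""
    table = []
    for name, preds in predictions.items():
        err = average_squared_error(actual, preds)
        table.append((name, err, sqrt(err)))
    return sorted(table, key=lambda row: row[1])


# Made-up validation prices and predictions, purely for illustration.
VALID_PRICES = [800_000, 650_000, 1_200_000]
PREDICTIONS = {
    "Tree (2-way)": [850_000, 600_000, 1_100_000],
    "Tree (3-way)": [820_000, 640_000, 1_120_000],
    "Regression": [780_000, 700_000, 1_150_000],
}

if __name__ == "__main__":
    print(f"{'Model':<14}{'Valid ASE':>16}{'Valid RMSE':>12}")
    for name, err, rmse in compare_models(VALID_PRICES, PREDICTIONS):
        print(f"{name:<14}{err:>16,.0f}{rmse:>12,.0f}")
```

The top row of the sorted table is the candidate for scoring. A small gap between models may not be meaningful on a small validation set.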

Q5. Extending current knowledge with additional reading - SEMMA

Relate the predictive analytics life cycle from your lectures, the SAS diagram created in this case study, and the SEMMA analytics methodology proposed by SAS. You can use diagrams with brief explanations.

(This section is based on your understanding of the process flow diagram in this case study. The objective of this question is to get you to think more deeply and 'connect' the generic predictive analytics life cycle discussed in the lectures with the SAS-specific SEMMA methodology. SEMMA is specific to one vendor and tool, but generic across SAS projects. You should then relate both to a specific project, using the SAS diagram for this project.)


Course: BUS5PA Predictive Analytics, La Trobe University