Evaluate menu to determine which is the best tool

Reference no: EM132655837

Classification Using Rattle

A hypothetical Melbourne suburb was surveyed with a view to its redevelopment potential. In particular there was interest in finding adjacent properties for more intensive redevelopment. Only 2887 (2.5%) of more than 45,000 properties were redeveloped between 2004 and 2009, making this a relatively rare event. Our goal is to predict which 2004 properties were redeveloped between 2004 and 2009 based on various 2004 variables and recent changes in the immediate neighbourhood. The data should be partitioned 70:15:15 for training, validation and testing. Only a few of the available variables are considered in this assignment. Please include all outputs for each question.

Names of Variables | Measurement Scale | Example Property (*) | Description
------------------ | ----------------- | -------------------- | -----------
DwellingsConstructed_200m | Interval | 4 | Number of dwellings constructed within 200m between 2000 and 2004
NetDwellingIncrease_200m | Interval | 3 | Increase in number of dwellings within 200m between 2000 and 2004
redevPotIndex_2004 | Interval | 0.025 | 2004 assessment of redevelopment potential based on property dimensions
strata | Binary | 0 | Strata housing (1=yes, 0=no)
BuildingProjects_200m | Interval | 2 | Number of building projects within 200m between 2000 and 2004
Demolitions_200m | Interval | 1 | Number of demolitions within 200m between 2000 and 2004
Road Frontage(m) | Interval | 20 | Length of road frontage
Redeveloped 2004-2009 | Binary | ? | The response/target variable, coded equal to one for properties redeveloped between 2004 and 2009, 0 otherwise

a) The redevelop.csv data contains data for a random sample of the properties that were not redeveloped and all the properties that were redeveloped, resulting in a data set containing a total of only 7409 properties.
i) Why was only a random sample of the properties that were not redeveloped between 2004 and 2009 chosen?
ii) What else could have been done to achieve a similar effect?
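To make the effect of the sampling in (a) concrete, the short Python sketch below works out the class balance of redevelop.csv from the two counts quoted in the brief. It assumes, as stated, that all 2887 redeveloped properties are kept; the variable names are illustrative only, and Python is used purely for the arithmetic, not as a substitute for Rattle.

```python
# Counts taken from the assignment text.
n_redeveloped = 2887      # every redeveloped property is retained
n_sample_total = 7409     # total rows in redevelop.csv

# The remaining rows must be the randomly sampled non-redeveloped properties.
n_not_redeveloped = n_sample_total - n_redeveloped

# Share of the sample that belongs to the (originally rare) target class.
share_redeveloped = n_redeveloped / n_sample_total

print(f"Non-redeveloped properties sampled: {n_not_redeveloped}")
print(f"Share redeveloped in the sample: {share_redeveloped:.1%}")
```

In the sampled file the target class makes up roughly 39% of the rows, so it is no longer a rare event for the models to learn from.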

b) Open R and include the rattle package. What instructions did you use to do this?

c) Read the redevelop.csv data into Rattle and assign appropriate roles to your variables. Note that the partition is 70% for training, 15% for validation and 15% for testing.

What is the target variable?
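As a rough illustration of what a 70:15:15 partition means for a data set of 7409 rows, the Python sketch below shuffles the row indices and cuts them into three groups. This is not Rattle's own implementation; the function name `partition_indices` and the seed value are assumptions made for the example.

```python
import random

def partition_indices(n_rows, seed=42, train=0.70, validate=0.15):
    """Shuffle row indices and split them into train/validate/test lists."""
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = int(n_rows * train)        # rounded down
    n_validate = int(n_rows * validate)  # rounded down
    train_idx = indices[:n_train]
    validate_idx = indices[n_train:n_train + n_validate]
    test_idx = indices[n_train + n_validate:]   # whatever is left, about 15%
    return train_idx, validate_idx, test_idx

train_idx, validate_idx, test_idx = partition_indices(7409)
print(len(train_idx), len(validate_idx), len(test_idx))
```

Every row lands in exactly one of the three groups, so the test rows are never seen while the models are being fitted or tuned.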

d) Produce suitable plots to visualise the differences in the distributions of the input variables for properties that were and were not redeveloped. Try to show at least six different types of plot.

e) Fit a classification tree for redeveloped properties, assuming a loss matrix with losses half as big for a false negative (Redeveloped="No" when it should be Redeveloped="Yes") as for a false positive (Redeveloped="Yes" when it should be Redeveloped="No"). Assume no losses when a correct decision is made. Answer the following questions after drawing your tree for the training data. Be sure to maximise your tree window before drawing your tree (again).

i. Complete the above loss matrix.

ii. What are the rules for the terminal node with the smallest error rate?

iii. How many splits are needed if we want to minimise the cross-validation error? Explain your answer.

iv. Consider node 2 of your drawn tree. How many training observations are in node 2, and what are the rules for node 2?

v. At node 2 in the training data what is the average loss per property if we make a Redevelopment="Yes" decision? What is the average loss per property if we make a Redevelopment = "No" decision? Which is the better decision for this node?

vi. Repeat (v) for some other node where the better decision is unexpected. Explain why the better decision is unexpected.
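The average-loss comparison asked for in (v) and (vi) can be sketched in a few lines of Python. Only the 1:2 ratio between the false negative and false positive losses comes from the brief; the actual loss values of 1 and 2 and the node counts used here are hypothetical.

```python
# Assumed loss scale: only the 1:2 ratio is taken from the assignment.
LOSS_FALSE_NEGATIVE = 1.0   # decide "No" for a property that was redeveloped
LOSS_FALSE_POSITIVE = 2.0   # decide "Yes" for a property that was not redeveloped

def average_losses(n_yes, n_no):
    """Return (average loss if we decide "Yes", average loss if we decide "No")."""
    n = n_yes + n_no
    loss_if_yes = LOSS_FALSE_POSITIVE * n_no / n   # each actual "No" is a false positive
    loss_if_no = LOSS_FALSE_NEGATIVE * n_yes / n   # each actual "Yes" is a false negative
    return loss_if_yes, loss_if_no

# Hypothetical node with 60 redeveloped and 40 non-redeveloped training properties.
loss_yes, loss_no = average_losses(n_yes=60, n_no=40)
better = "Yes" if loss_yes < loss_no else "No"
print(loss_yes, loss_no, better)
```

In this made-up node most properties were redeveloped, yet the "No" decision has the lower average loss because false positives cost twice as much. That is the kind of unexpected decision part (vi) asks about.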

f) Run a random forest with your data with 500 trees, randomly selecting three input variables from which to choose your split variable at each node. Please include all outputs for each question.

i. What is the OOB estimate of the error rate and what does OOB mean?
ii. What is the error rate for the Redevelopment = "Yes" predictions with the test data, and what is the error rate for Redevelopment = "No" predictions with the test data?
iii. Which are the top 3 predictor variables according to the Gini measure of variable importance and how is this measure defined?
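The Gini measure in (iii) builds on the decrease in Gini impurity produced by a single split. The Python sketch below computes that decrease for one hypothetical split; it is a simplified illustration, and the class counts are invented. As far as the randomForest documentation describes it, the reported importance accumulates such decreases for each variable over all the splits that use it and averages them over the trees.

```python
def gini(n_yes, n_no):
    """Gini impurity of a node with the given class counts."""
    n = n_yes + n_no
    p_yes = n_yes / n
    p_no = n_no / n
    return 1.0 - p_yes ** 2 - p_no ** 2

def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`.

    Each argument is a (n_yes, n_no) tuple of class counts.
    """
    n_parent = sum(parent)
    weighted_children = (
        (sum(left) / n_parent) * gini(*left)
        + (sum(right) / n_parent) * gini(*right)
    )
    return gini(*parent) - weighted_children

# Hypothetical split of a node holding 40 "Yes" and 60 "No" properties.
parent = (40, 60)
left = (30, 10)
right = (10, 50)
decrease = gini_decrease(parent, left, right)
print(round(decrease, 4))
```

A split that separates the two classes well gives purer child nodes and therefore a larger decrease.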

g) Now try Boosting. Please include all outputs for each question.

i. Interpret the term Gain and explain why this measure provides a reliable measure of Variable Importance.

ii. What does the Error Plot suggest as the optimum number of trees?

h) Now try a neural network with two and then three hidden nodes. Use the Evaluate menu error matrix to answer the following questions. Please include all outputs for each question.

i. Is it necessary to transform any of the input variables? What transformations have you chosen and why?
ii. What is the error rate for properties that actually were redeveloped? Consider only the test data, assuming first 2 and then 3 hidden nodes.
iii. What is the error rate for properties that were not actually redeveloped? Consider only the test data, assuming first 2 and then 3 hidden nodes.
iv. Which is better, a 2 hidden node or a 3 hidden node solution? Why?
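The per-class error rates in (ii) and (iii) are read off the rows of an error matrix. The Python sketch below shows the arithmetic on an invented matrix; the four counts are hypothetical and do not come from any fitted model.

```python
def class_error_rates(actual_no_pred_no, actual_no_pred_yes,
                      actual_yes_pred_no, actual_yes_pred_yes):
    """Return error rates for actual "No", actual "Yes" and overall."""
    n_actual_no = actual_no_pred_no + actual_no_pred_yes
    n_actual_yes = actual_yes_pred_no + actual_yes_pred_yes
    error_actual_no = actual_no_pred_yes / n_actual_no     # not redeveloped, predicted "Yes"
    error_actual_yes = actual_yes_pred_no / n_actual_yes   # redeveloped, predicted "No"
    overall = (actual_no_pred_yes + actual_yes_pred_no) / (n_actual_no + n_actual_yes)
    return error_actual_no, error_actual_yes, overall

# Hypothetical test-data error matrix (counts of properties).
err_no, err_yes, err_overall = class_error_rates(
    actual_no_pred_no=600, actual_no_pred_yes=80,
    actual_yes_pred_no=150, actual_yes_pred_yes=280,
)
print(round(err_no, 3), round(err_yes, 3), round(err_overall, 3))
```

Comparing these three figures for the 2-node and 3-node networks on the same test rows is one way to support the choice in (iv).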

i) Use the Evaluate menu to determine which is the best tool for modelling your data: a single tree, a random forest, boosting or a neural network. Why have you chosen this one method over the other three methods?

j) For this best tool show the ROC, sensitivity, risk and lift charts for the test data ONLY.

k) Explain the axes for each of the above four charts.
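For the ROC chart in particular, the conventional axes are the false positive rate (horizontal) and the true positive rate, or sensitivity (vertical), with one point per score threshold. The Python sketch below computes a few such points; the scores, labels and thresholds are invented, and the function `roc_points` is a hypothetical helper rather than anything Rattle provides.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    n_pos = sum(labels)                 # actual "Yes" cases (label 1)
    n_neg = len(labels) - n_pos         # actual "No" cases (label 0)
    points = []
    for t in thresholds:
        # A case is predicted "Yes" when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical predicted probabilities and true outcomes for six properties.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels, thresholds=[1.0, 0.75, 0.5, 0.0])
print(points)
```

Lowering the threshold moves the point from the bottom-left corner (nothing predicted "Yes") towards the top-right corner (everything predicted "Yes").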

l) Which is the best method for choosing the most important predictor of Redevelopment = "Yes"; plots, a single tree, a random forest, boosting, a neural network? Why have you chosen this one method over the other four methods?

m) Do any of the above models appear to be worth commercialising? For what purpose?

Attachment:- Exercise.rar
