Perform a k-nearest neighbors prediction

Reference no: EM132397534

Assignment

Written Assignment #2B requires hands-on practice with XLMiner. You are expected to use XLMiner to mine the Boston Housing data in the file BostonHousing.xls, posted in the Written Assignment #2B entry in Blackboard.

Your task is to run the k-nearest neighbors algorithm in XLMiner for both the prediction and classification tasks described below, and to submit your answers with your XLMiner execution result files attached. Because the k-nearest neighbors algorithm can be used for both classification and prediction, it appears under two menus in XLMiner, Classify and Predict.

The file BostonHousing.xls contains information on over 500 census tracts in Boston, where for each tract 14 variable values are recorded. The last column (CAT.MEDV) was derived from MEDV, so that it takes the value 1 if MEDV > 30 and 0 otherwise. Consider the goal of predicting and classifying the median value (MEDV and CAT.MEDV) of a tract, given the information in the first 13 columns (input variables) in the column list. Partition the data into training (60%) and validation (40%) sets. (For a description of the column names in BostonHousing.xls, refer to Table 2.2 on page 33 of the textbook.)
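
For readers who want to check the data preparation outside XLMiner, the step above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the random seed, and the toy MEDV values are hypothetical and are not taken from BostonHousing.xls, and XLMiner's own partitioning routine may select rows differently.

```python
import random

def add_cat_medv(rows, threshold=30.0):
    """Return copies of the rows with CAT.MEDV = 1 if MEDV > threshold, else 0."""
    return [{**row, "CAT.MEDV": 1 if row["MEDV"] > threshold else 0} for row in rows]

def partition(rows, train_fraction=0.6, seed=12345):
    """Shuffle with a fixed seed, then split into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Made-up MEDV values for illustration only (assumption, not real data).
toy_rows = [{"MEDV": v} for v in [12.0, 18.5, 25.0, 30.0, 30.1, 35.0, 42.0, 22.0, 50.0, 9.5]]

labelled = add_cat_medv(toy_rows)
train, valid = partition(labelled)
print(len(train), len(valid))  # 10 rows split 60/40
```

Note that a tract with MEDV exactly equal to 30 gets CAT.MEDV = 0, because the rule is a strict inequality.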

1. Under the Predict menu in XLMiner, perform a k-nearest neighbors prediction of MEDV using all the predictors from column A (CRIM) to column B (LSTAT). Exclude CAT.MEDV, which is the outcome (decision) variable for the classification task. Score both the training data set and the validation data set, trying values of k from 1 to 10. What is the best k chosen? What does it mean? Also attach the execution result file, including the RMSE (root mean squared error), to your submission. (You can try running the prediction both with and without normalizing the data.)

2. Under the Classify menu in XLMiner, perform a k-nearest neighbors classification of CAT.MEDV using all the predictors from column A (CRIM) to column B (LSTAT). Exclude MEDV, which is the outcome (decision) variable for the prediction task. Score both the training data set and the validation data set, trying values of k from 1 to 10, and find the best k for the validation data set (make sure to normalize the data). Also attach the execution result file, including the confusion matrix, lift chart, and ROC chart, to your submission.
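
The two tasks above can be cross-checked with a small plain-Python sketch. It is not XLMiner's implementation. It assumes Euclidean distance, z-score normalization fitted on the training partition only, a 0.5 cutoff for class 1, and "best k" meaning the k with the lowest validation RMSE (task 1) or the lowest validation error rate (task 2). All function names are hypothetical, and the data are synthetic stand-ins rather than the Boston Housing records. Lift and ROC charts are not covered.

```python
import math
import random

def zscore_fit(rows):
    """Column means and standard deviations, estimated from the training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def zscore_apply(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def nearest_outcomes(train_x, train_y, query, k):
    """Outcomes of the k training records closest to the query (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    return [y for _, y in ranked[:k]]

def knn_predict(train_x, train_y, query, k):
    """Task 1: predict a numeric outcome as the mean of the k nearest outcomes."""
    return sum(nearest_outcomes(train_x, train_y, query, k)) / k

def knn_classify(train_x, train_y, query, k, cutoff=0.5):
    """Task 2: assign class 1 when the share of class-1 neighbors exceeds the cutoff."""
    return 1 if sum(nearest_outcomes(train_x, train_y, query, k)) / k > cutoff else 0

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def confusion_matrix(actual, predicted):
    """Counts keyed by (actual class, predicted class)."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def error_rate(matrix):
    return (matrix[(0, 1)] + matrix[(1, 0)]) / sum(matrix.values())

# Synthetic stand-in data (assumption): two made-up predictors on very different scales.
rng = random.Random(0)

def make_rows(n):
    xs = [[rng.uniform(0, 10), rng.uniform(0, 1000)] for _ in range(n)]
    ys = [3 * x[0] + rng.gauss(0, 1) for x in xs]  # numeric outcome, plays the role of MEDV
    return xs, ys

train_x, train_y = make_rows(60)
valid_x, valid_y = make_rows(40)
train_c = [1 if y > 20 else 0 for y in train_y]  # binary outcome, plays the role of CAT.MEDV
valid_c = [1 if y > 20 else 0 for y in valid_y]

means, stds = zscore_fit(train_x)
train_z = zscore_apply(train_x, means, stds)
valid_z = zscore_apply(valid_x, means, stds)

rmse_by_k, error_by_k = {}, {}
for k in range(1, 11):
    rmse_by_k[k] = rmse(valid_y, [knn_predict(train_z, train_y, q, k) for q in valid_z])
    matrix = confusion_matrix(valid_c, [knn_classify(train_z, train_c, q, k) for q in valid_z])
    error_by_k[k] = error_rate(matrix)

best_k_prediction = min(rmse_by_k, key=rmse_by_k.get)
best_k_classification = min(error_by_k, key=error_by_k.get)
print("best k (validation RMSE):", best_k_prediction)
print("best k (validation error rate):", best_k_classification)
```

Under these assumptions, the best k is the neighborhood size that generalizes best to records not used for training. A very small k tends to follow noise in individual training records, while a large k averages over neighbors that are less similar to the record being scored. Passing `train_x` and `valid_x` instead of the z-scored versions gives the "without normalizing" comparison: the predictor with the larger range then dominates the distance.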

Attachment:- Boston Housing.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd