Solution: What happens to RMS error for the training data set

Reference no: EM131564843

Part 1:

Consider the Boston Housing data file. (The schema of the data file is given on page 33, in Table 2.2 of the textbook.)

a. Study the Neural Networks Prediction

b. Use XLMiner's neural network routine under the Predict menu to fit a model with XLMiner's default neural network parameters, using the predictors CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, and LSTAT to predict the outcome variable MEDV.

i. Record the RMS errors for the training data and the validation data, and observe the lift charts. Repeat the process, changing the number of epochs to 300, 3000, 10,000, and 20,000.

ii. What happens to RMS error for the training data set as the number of epochs increases?

iii. What happens to RMS error for the validation data set as the number of epochs increases?

iv. Comment on the appropriate number of epochs for the model.

Note: Use the Prediction option of the neural network routine to obtain the RMS error.

c. Please submit your execution results and answers in an MS Excel file.

Note:

1. The file BostonHousing.xls is posted along with Written Assignment #3B, and descriptions of the columns are given in the file.
2. The cloud-based XLMiner
3. For the Windows-based XLMiner, please check the XLMiner download instructions posted in Discussion in Blackboard.
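
The RMS errors recorded in step b.i can be checked by hand. The following is a minimal Python sketch, not XLMiner's own routine: `rms_error` applies the usual root-mean-square formula, and `pick_epochs` is a hypothetical helper (not part of the assignment) that returns the epoch setting with the lowest validation RMS error among results you have recorded yourself.

```python
import math

def rms_error(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pick_epochs(records):
    """records: iterable of (epochs, train_rmse, valid_rmse) tuples.

    Returns the epochs value with the smallest validation RMS error.
    """
    return min(records, key=lambda r: r[2])[0]
```

Choosing by validation error rather than training error is deliberate: training error typically keeps falling as the number of epochs grows, while validation error tends to level off or rise once the network starts to overfit.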

Part 2:

QUESTION 1

Which of the following expressions is used for the Naive Bayes classifier?

QUESTION 2

For the given classification tree, match each rule with the number of its corresponding branch.

IF age = "<=30" AND student = "no" THEN buys_computer = "no"

IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"

IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"

IF age = "31...40" THEN buys_computer = "yes"

A. 1
B. 5
C. 4
D. 2
E. 3
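
The five IF-THEN rules above can also be read as a small lookup procedure. The sketch below encodes them in Python exactly as listed. The names `RULES` and `buys_computer` are illustrative, and the code says nothing about which numbered branch each rule belongs to, since that depends on the tree figure.

```python
# Each rule is (conditions, outcome), copied from the rules in Question 2.
RULES = [
    ({"age": "<=30", "student": "no"}, "no"),
    ({"age": ">40", "credit_rating": "fair"}, "yes"),
    ({"age": ">40", "credit_rating": "excellent"}, "no"),
    ({"age": "<=30", "student": "yes"}, "yes"),
    ({"age": "31...40"}, "yes"),
]

def buys_computer(record):
    """Return the outcome of the first rule whose conditions all match."""
    for conditions, outcome in RULES:
        if all(record.get(key) == value for key, value in conditions.items()):
            return outcome
    return None  # no rule covers this record

print(buys_computer({"age": "<=30", "student": "yes"}))
```

Because the rules come from the paths of a single tree, at most one rule can match any record, so the order of the list does not affect the result.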

QUESTION 3

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the prior probability P(Prior Legal Trouble = 'No') in decimal format.
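
A probability of this kind is a simple proportion of table rows. As a minimal sketch, assuming the ten rows are transcribed as tuples (the names `ROWS` and `proportion` are illustrative), it can be counted in Python:

```python
# The ten table rows: (prior_legal_trouble, company_size, status)
ROWS = [
    ("Y", "Small", "Truthful"), ("N", "Small", "Truthful"),
    ("N", "Large", "Truthful"), ("N", "Large", "Truthful"),
    ("N", "Small", "Truthful"), ("N", "Small", "Truthful"),
    ("Y", "Small", "Fraud"), ("Y", "Large", "Fraud"),
    ("N", "Large", "Fraud"), ("Y", "Large", "Fraud"),
]

def proportion(rows, column, value):
    """Fraction of all rows whose entry in `column` equals `value`."""
    return sum(1 for row in rows if row[column] == value) / len(rows)

# Column 0 is Prior Legal Trouble; 6 of the 10 rows have "N".
print(proportion(ROWS, 0, "N"))
```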

QUESTION 4

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x2= Large |C1) = P(Size = Large| Fraudulent) in decimal format.
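
A class-conditional probability restricts the count to the rows of one class before taking the proportion. The following sketch assumes the same tuple transcription of the table; `ROWS` and `conditional` are illustrative names.

```python
# The ten table rows: (prior_legal_trouble, company_size, status)
ROWS = [
    ("Y", "Small", "Truthful"), ("N", "Small", "Truthful"),
    ("N", "Large", "Truthful"), ("N", "Large", "Truthful"),
    ("N", "Small", "Truthful"), ("N", "Small", "Truthful"),
    ("Y", "Small", "Fraud"), ("Y", "Large", "Fraud"),
    ("N", "Large", "Fraud"), ("Y", "Large", "Fraud"),
]

def conditional(rows, column, value, status):
    """P(column == value | Status == status), estimated by counting rows."""
    in_class = [row for row in rows if row[2] == status]
    return sum(1 for row in in_class if row[column] == value) / len(in_class)

# Column 1 is Company Size; 3 of the 4 Fraud rows are Large.
print(conditional(ROWS, 1, "Large", "Fraud"))
```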

QUESTION 5

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x2= Small | C2 ) = P(Company Size = Small| Truthful) in decimal format.
(Please keep 3 digits after the decimal point)

QUESTION 6

Which of the following statement(s) is(are) correct?

a. The Naive Bayes method is a supervised learning method.

b. Naive Bayes can be used only for classification, not for prediction.

c. The Naive Bayes method is a data driven method.

d. Naive Bayes applies a cut-off value to the calculated posterior probability to determine the class label of a given testing sample.

QUESTION 7

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

For the instance with prior legal trouble = Yes and company size = Large, determine whether the company is truthful.
(If it is truthful, select True, otherwise, select False.)

True
False
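
Classifying an instance with Naive Bayes combines the class prior with the class-conditional probability of each input and then normalizes. The sketch below is one way to do that for this table, assuming no smoothing and a cut-off of 0.5 (equivalently, picking the class with the larger posterior). The names `ROWS` and `naive_bayes_posteriors` are illustrative.

```python
from collections import Counter

# The ten table rows: (prior_legal_trouble, company_size, status)
ROWS = [
    ("Y", "Small", "Truthful"), ("N", "Small", "Truthful"),
    ("N", "Large", "Truthful"), ("N", "Large", "Truthful"),
    ("N", "Small", "Truthful"), ("N", "Small", "Truthful"),
    ("Y", "Small", "Fraud"), ("Y", "Large", "Fraud"),
    ("N", "Large", "Fraud"), ("Y", "Large", "Fraud"),
]

def naive_bayes_posteriors(rows, trouble, size):
    """Return {status: posterior probability} for one instance."""
    class_counts = Counter(row[2] for row in rows)
    scores = {}
    for status, n in class_counts.items():
        in_class = [row for row in rows if row[2] == status]
        p_trouble = sum(row[0] == trouble for row in in_class) / n
        p_size = sum(row[1] == size for row in in_class) / n
        # prior * P(trouble | class) * P(size | class)
        scores[status] = (n / len(rows)) * p_trouble * p_size
    total = sum(scores.values())
    return {status: score / total for status, score in scores.items()}

posteriors = naive_bayes_posteriors(ROWS, "Y", "Large")
print(max(posteriors, key=posteriors.get))  # prints Fraud
```

The same function can be called with the other input combinations that appear in later questions.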

QUESTION 8

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x1 = No| C2 ) = P(Prior Legal Trouble =No| Truthful) in decimal format.
(Please keep 3 digits after the decimal point)

QUESTION 9

Which of the following statement(s) is(are) correct?

a. Neural network model can be used for classification.

b. Neural network model can be used for prediction.

c. Both a. and b.

d. Neither a. nor b.

QUESTION 10

Which of the following statement(s) is(are) correct?

a. A fully grown classification tree may lead to overfitting.

b. An overly pruned classification tree may lead to underfitting.

c. Both a. and b.

d. Neither a. nor b.

QUESTION 11

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the prior probability P(Company Size ='Small') in decimal format.

QUESTION 12

The difference(s) between the basic K-Nearest Neighbor classifier and the Naive Bayes classifier is(are)

a. The basic K-Nearest Neighbor classifier uses the majority voting (prior probability value) and the posterior probability to determine the class label of a given testing sample; and Naive Bayes classifier uses only prior probability to determine the class label of a given testing sample.

b. The basic K-Nearest Neighbor classifier uses the majority voting (prior probability value) to determine the class label of a given testing sample; and Naive Bayes classifier uses not only the prior probability, but also the posterior probability to determine the class label of a given testing sample.

c. The basic K-Nearest Neighbor classifier uses the majority voting (prior probability value) to determine the class label of a given testing sample; and Naive Bayes classifier uses only the posterior probability to determine the class label of a given testing sample.

d. The basic K-Nearest Neighbor classifier uses the posterior probability to determine the class label of a given testing sample; and Naive Bayes classifier uses only the prior probability to determine the class label of a given testing sample.

QUESTION 13

What is(are) the ingredient(s) by which the neural net evolves to produce a more accurate prediction?

a. weight updates

b. learning rate

c. learning algorithm

d. momentum

QUESTION 14

In general, CART has to impute values or delete observations with missing values in order to handle missing data.
True
False

QUESTION 15

A CART consists of

a. the root node

b. internal nodes and leaf nodes

c. edges connecting the nodes

d. All of a., b., and c.

QUESTION 16

Which of the following defines the confidence of an association rule?

QUESTION 17

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x2= Large | C2 ) = P(Company Size = Large| Truthful) in decimal format.
(Please keep 3 digits after the decimal point)

QUESTION 18

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

For the instance with prior legal trouble = Yes and company size = Small, determine whether the company is truthful.

(If it is truthful, select True, otherwise, select False.)

True
False

QUESTION 19

Which of the following defines the support of an association rule?

QUESTION 20

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x2= Small |C1) = P(Size = Small| Fraudulent) in decimal format.

QUESTION 21

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

For the instance with prior legal trouble = No and company size = Small, determine whether the company is truthful.

(If it is truthful, select True, otherwise, select False.)

True
False

QUESTION 22

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

For the instance with prior legal trouble = No and company size = Large, determine whether the company is truthful.

(If it is truthful, select True, otherwise, select False.)

True
False

QUESTION 23

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x1 = Yes |C1) = P(Prior Legal Trouble =Yes| Fraudulent) in decimal format.

QUESTION 24

In general, the CART is not sensitive to the outliers.
True
False

QUESTION 25

Which of the following statement(s) is(are) correct?

a. There is only one root node in each CART

b. Each node in CART has only one direct parent node.

c. Each leaf node has no child node(s).

d. All of a., b., and c.

QUESTION 26

Which of the following defines the benchmark confidence of an association rule?
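
For reference, the usual textbook definitions of the three association-rule measures asked about in this part can be written out as counts. This is a minimal sketch under those standard definitions: support is the share of transactions containing both the antecedent and the consequent, confidence divides that count by the number of transactions containing the antecedent, and benchmark confidence is the share of transactions containing the consequent. The function name and the toy baskets are invented for illustration only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, benchmark confidence and lift for one rule."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= set(t))
    n_cons = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    confidence = n_both / n_ante
    benchmark = n_cons / n
    return {
        "support": n_both / n,
        "confidence": confidence,
        "benchmark_confidence": benchmark,
        "lift": confidence / benchmark,
    }

# Hypothetical toy transactions, not taken from the assignment.
baskets = [{"red", "white"}, {"red", "white", "green"}, {"white"}, {"red", "blue"}]
print(rule_metrics(baskets, {"red"}, {"white"}))
```

A lift above 1 indicates that the antecedent makes the consequent more likely than it is in the data overall.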

QUESTION 27

Which of the following statement(s) is(are) correct?

a. Each node in a classification tree corresponds to a column in a data table.

b. Each node in a classification tree corresponds to a dimension of the multi-dimensional data space.

c. Each node in a classification tree defines a decision boundary (or split condition) along its corresponding dimension.

d. All of a., b., and c.

QUESTION 28

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the prior probability P(Company Size ='Large') in decimal format.

QUESTION 29

The CART can be used for the purpose(s) of

a. Classification

b. Prediction

c. Either a. or b.

d. Both a. and b.

QUESTION 30

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x1 = Yes | C2) = P(Prior Legal Trouble =Yes| Truthful) in decimal format.
(Please keep 3 digits after the decimal point)

QUESTION 31

Which of the following statement(s) is(are) correct?

a. In XLMiner, the Naive Bayes classifier can take only categorical variables as input to generate the categorical response or class label.

b. In general, the Naive Bayes classifier can take not only categorical variables but also continuous variables as input to generate the categorical response or class label.

c. Both a. and b.

d. Neither a. nor b.

QUESTION 32

The momentum added in weight update during neural network training process

a. can keep weights changing in the same direction as they did in the preceding iteration.

b. will be reluctant to learn from data that want to change the direction of the weights when the momentum values are set high.

c. can help avoid getting stuck in a local optimum.

d. can help the neural network learning process converge to the optimum.
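
The effect of momentum is easiest to see in the update rule itself. The sketch below uses the common form in which the new weight change is the gradient step plus a fraction (the momentum) of the previous change. The function name and default values are assumptions for illustration, not XLMiner's settings.

```python
def momentum_step(weight, gradient, prev_delta, learning_rate=0.1, momentum=0.9):
    """One weight update with momentum.

    delta = -learning_rate * gradient + momentum * prev_delta
    Returns the new weight and the delta to carry into the next step.
    """
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta

# The gradient (-1.0) asks for an increase in the weight, but the previous
# change (-0.5) was a decrease; with high momentum the weight keeps decreasing.
new_weight, delta = momentum_step(0.0, gradient=-1.0, prev_delta=-0.5)
print(new_weight, delta)
```

With `momentum=0.0` the same call would follow the gradient alone, which is why high momentum values make the network slower to respond to data that reverses the direction of a weight.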

QUESTION 33

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the prior probability P(C2) = P(Truthful) in decimal format.

QUESTION 34

What is the meaning of CART in this data mining textbook?

a. Classification, Assertion, Regression, and Translation.

b. Categorization, Assertion, Regression, and Translation.

c. Category and Regression Trees

d. Classification and Regression Trees

QUESTION 35

In CART, it is necessary to normalize the data in the unit range 0 to 1.
True
False

QUESTION 36

Which of the following statement(s) is(are) correct about the CART?

a. For classification, the path from the root node to the leaf node represents a specific decision rule condition, and the majority vote at the leaf node is used to determine the class label assigned by the path.

b. For predicting, the path from the root node to the leaf node represents a specific decision rule condition, and the calculated average value of the variable at the leaf node will be used to predict its value.

c. Both a. and b.

d. Neither a. nor b.
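
The two leaf-node computations described in options a. and b. can be sketched in a few lines of Python. The function names are illustrative, and the inputs stand for the training records that fall into a single leaf.

```python
from collections import Counter
from statistics import mean

def leaf_class(labels):
    """Classification tree: the majority class among records in the leaf."""
    return Counter(labels).most_common(1)[0][0]

def leaf_value(values):
    """Regression tree: the average outcome among records in the leaf."""
    return mean(values)

print(leaf_class(["yes", "no", "yes"]))
print(leaf_value([10.0, 20.0, 30.0]))
```

In this sketch a tie in `leaf_class` is resolved in favour of the label seen first; real implementations may break ties differently.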

QUESTION 37

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the conditional probability P(x1 = No |C1) = P(Prior Legal Trouble = No| Fraudulent) in decimal format.

QUESTION 38

To build a good classifier, the inductive learning algorithm or classification tree construction algorithm requires a large data set.
True
False

QUESTION 39

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the prior probability P(C1) =P(Fraudulent) in decimal format.

QUESTION 40

For the given table below,

Input Variables		Decision Variables
Prior Legal Trouble	Company Size	Status
Y	Small	Truthful
N	Small	Truthful
N	Large	Truthful
N	Large	Truthful
N	Small	Truthful
N	Small	Truthful
Y	Small	Fraud
Y	Large	Fraud
N	Large	Fraud
Y	Large	Fraud

Please give the prior probability P(Prior Legal Trouble = 'Yes') in decimal format.

QUESTION 41

Multi-layer feedforward neural network consists of

a. Input layer

b. Hidden layer(s)

c. Output layer

d. All of a., b., and c.

Attachment: BostonHousing.xls

The solution was prepared in Microsoft Excel 2013. It determines the data mining output for the given data. Obtaining that output, including the error and R values, requires the XLMiner add-in for Excel, which is designed specifically for data mining. The results can then be used for forecasting and further planning.

Course: Master of Management of Information Systems, MMIS 643 Data Mining