Reference no: EM131564843
Part 1:
Consider the Boston Housing data file. (The schema of the data file is given on page 33, in Table 2.2 of the textbook.)
a. Study the Neural Networks Prediction
b. Use XLMiner's neural network routine under the Predict menu to fit a model with XLMiner's default values for the neural network parameters, using the predictors CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, and LSTAT to predict the value of the outcome variable MEDV.
i. Record the RMS errors for the training data and the validation data, and observe the lift charts. Repeat the process, changing the number of epochs to 300, 3,000, 10,000, and 20,000.
ii. What happens to RMS error for the training data set as the number of epochs increases?
iii. What happens to RMS error for the validation data set as the number of epochs increases?
iv. Comment on the appropriate number of epochs for the model.
Note: (Please use the Prediction option of the Neural Network in order to get the RMS error.)
c. Please submit your execution results and answers in an MS Excel file.
Note:
1. The file BostonHousing.xls is posted along with Written Assignment #3B, and descriptions of the columns are given in the file.
2. The cloud-based XLMiner
3. For the Windows-based XLMiner, please check the XLMiner download instructions posted in Discussion in Blackboard.
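For reference, the RMS error requested in step b.i is the square root of the mean squared difference between the actual and predicted MEDV values. The sketch below is a minimal Python illustration of that formula, not XLMiner's implementation; the function name rms_error and the sample numbers are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical MEDV values and model predictions, for illustration only.
actual_medv = [24.0, 21.6, 34.7]
predicted_medv = [25.0, 20.6, 33.7]
print(rms_error(actual_medv, predicted_medv))
```

Computing this value separately on the training and validation partitions for each epoch setting gives the two series that questions ii and iii ask about.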
Part 2:
QUESTION 1
Which of the following expressions is used for the Naive Bayes classifier?
a.
b.
c.
d.
QUESTION 2
For the given classification tree, please match corresponding rules with the number in each branch.
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31...40" THEN buys_computer = "yes"
A. 1
B. 5
C. 4
D. 2
E. 3
QUESTION 3
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the prior probability P(Prior Legal Trouble = 'No') in decimal format.
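A probability of this kind is a relative frequency over all ten rows of the table. The following is a minimal Python sketch, assuming the rows are encoded as tuples and that "N" in the table stands for "No"; the names ROWS and prior are illustrative and do not come from the assignment.

```python
from collections import Counter

# Rows of the table: (prior legal trouble, company size, status).
ROWS = [
    ("Y", "Small", "Truthful"),
    ("N", "Small", "Truthful"),
    ("N", "Large", "Truthful"),
    ("N", "Large", "Truthful"),
    ("N", "Small", "Truthful"),
    ("N", "Small", "Truthful"),
    ("Y", "Small", "Fraud"),
    ("Y", "Large", "Fraud"),
    ("N", "Large", "Fraud"),
    ("Y", "Large", "Fraud"),
]


def prior(rows, column, value):
    """Fraction of all rows whose given column equals the given value."""
    counts = Counter(row[column] for row in rows)
    return counts[value] / len(rows)


print(prior(ROWS, 0, "N"))  # column 0 is Prior Legal Trouble
```

The same function covers the other prior-probability questions by changing the column index and value.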
QUESTION 4
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x2 = Large | C1) = P(Company Size = Large | Fraudulent) in decimal format.
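A class-conditional probability restricts the count to the rows of one class before taking the relative frequency. The sketch below is a minimal Python illustration under the assumption that the table's "Fraud" status is the class the question calls "Fraudulent"; the names ROWS and conditional are illustrative, not part of the assignment.

```python
# Rows of the table: (prior legal trouble, company size, status).
ROWS = [
    ("Y", "Small", "Truthful"),
    ("N", "Small", "Truthful"),
    ("N", "Large", "Truthful"),
    ("N", "Large", "Truthful"),
    ("N", "Small", "Truthful"),
    ("N", "Small", "Truthful"),
    ("Y", "Small", "Fraud"),
    ("Y", "Large", "Fraud"),
    ("N", "Large", "Fraud"),
    ("Y", "Large", "Fraud"),
]


def conditional(rows, column, value, status):
    """P(column == value | Status == status), estimated by counting rows."""
    in_class = [row for row in rows if row[2] == status]
    matches = sum(1 for row in in_class if row[column] == value)
    return matches / len(in_class)


# Column 1 is Company Size; round to three decimals as the quiz requests.
print(round(conditional(ROWS, 1, "Large", "Fraud"), 3))
```

Changing the column, value, and status arguments reproduces the other conditional-probability questions that use this table.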
QUESTION 5
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x2 = Small | C2) = P(Company Size = Small | Truthful) in decimal format.
(Please keep 3 digits after the decimal point)
QUESTION 6
Which of the following statement(s) is(are) correct?
a. The Naive Bayes method is a supervised learning method.
b. The Naive Bayes method can be used only for classification, not for prediction.
c. The Naive Bayes method is a data-driven method.
d. The Naive Bayes method uses a cut-off value for the calculated posterior probability to determine the class label of a given testing sample.
QUESTION 7
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
For the given instance with the inputs prior legal trouble = Yes and company size = Large, please determine whether the company is truthful or not.
(If it is truthful, select True, otherwise, select False.)
True
False
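Under the Naive Bayes assumption, each class is scored by its prior probability multiplied by the class-conditional probability of each input value, and the class with the largest score is chosen. The sketch below is a minimal Python illustration of that calculation on the ten-row table, with no smoothing and no cut-off adjustment; the names ROWS, naive_bayes_scores, and classify are illustrative, not part of the assignment or of XLMiner.

```python
# Rows of the table: (prior legal trouble, company size, status).
ROWS = [
    ("Y", "Small", "Truthful"),
    ("N", "Small", "Truthful"),
    ("N", "Large", "Truthful"),
    ("N", "Large", "Truthful"),
    ("N", "Small", "Truthful"),
    ("N", "Small", "Truthful"),
    ("Y", "Small", "Fraud"),
    ("Y", "Large", "Fraud"),
    ("N", "Large", "Fraud"),
    ("Y", "Large", "Fraud"),
]


def naive_bayes_scores(rows, trouble, size):
    """Unnormalized score P(C) * P(trouble | C) * P(size | C) per class C."""
    scores = {}
    for status in sorted({row[2] for row in rows}):
        in_class = [row for row in rows if row[2] == status]
        class_prior = len(in_class) / len(rows)
        p_trouble = sum(row[0] == trouble for row in in_class) / len(in_class)
        p_size = sum(row[1] == size for row in in_class) / len(in_class)
        scores[status] = class_prior * p_trouble * p_size
    return scores


def classify(rows, trouble, size):
    """Return the class with the largest Naive Bayes score."""
    scores = naive_bayes_scores(rows, trouble, size)
    return max(scores, key=scores.get)


print(naive_bayes_scores(ROWS, "Y", "Large"))
print(classify(ROWS, "Y", "Large"))
```

Dividing each score by the sum of the scores gives the posterior probabilities, which is the form needed when a cut-off value is applied instead of picking the maximum.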
QUESTION 8
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x1 = No | C2) = P(Prior Legal Trouble = No | Truthful) in decimal format.
(Please keep 3 digits after the decimal point)
QUESTION 9
Which of the following statement(s) is(are) correct?
a. A neural network model can be used for classification.
b. A neural network model can be used for prediction.
c. Both a. and b.
d. Neither a. nor b.
QUESTION 10
Which of the following statement(s) is(are) correct?
a. A fully grown classification tree may lead to an overfitting problem.
b. An overly pruned classification tree may lead to an underfitting problem.
c. Both a. and b.
d. Neither a. nor b.
QUESTION 11
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the prior probability P(Company Size = 'Small') in decimal format.
QUESTION 12
The difference(s) between the basic K-Nearest Neighbor classifier and the Naive Bayes classifier is(are)
a. The basic K-Nearest Neighbor classifier uses the majority voting (prior probability value) and the posterior probability to determine the class label of a given testing sample; and Naive Bayes classifier uses only prior probability to determine the class label of a given testing sample.
b. The basic K-Nearest Neighbor classifier uses the majority voting (prior probability value) to determine the class label of a given testing sample; and Naive Bayes classifier uses not only the prior probability, but also the posterior probability to determine the class label of a given testing sample.
c. The basic K-Nearest Neighbor classifier uses the majority voting (prior probability value) to determine the class label of a given testing sample; and Naive Bayes classifier uses only the posterior probability to determine the class label of a given testing sample.
d. The basic K-Nearest Neighbor classifier uses the posterior probability to determine the class label of a given testing sample; and Naive Bayes classifier uses only the prior probability to determine the class label of a given testing sample.
QUESTION 13
What is(are) the ingredient(s) by which the neural net evolves to produce a more accurate prediction?
a. weight updates
b. learning rate
c. learning algorithm
d. momentum
QUESTION 14
In general, the CART does have to impute values or delete observations with missing values in order to handle missing data.
True
False
QUESTION 15
A CART consists of
a. the root node
b. internal nodes and leaf nodes
c. edges connecting the nodes
d. All of a., b., and c.
QUESTION 16
Which of the following defines the confidence of an association rule?
a.
b.
c.
d.
QUESTION 17
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x2 = Large | C2) = P(Company Size = Large | Truthful) in decimal format.
(Please keep 3 digits after the decimal point)
QUESTION 18
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
For the given instance with the inputs prior legal trouble = Yes and company size = Small, please determine whether the company is truthful or not.
(If it is truthful, select True, otherwise, select False.)
True
False
QUESTION 19
Which of the following defines the support of an association rule?
a.
b.
c.
d.
QUESTION 20
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x2 = Small | C1) = P(Company Size = Small | Fraudulent) in decimal format.
QUESTION 21
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
For the given instance with the inputs prior legal trouble = No and company size = Small, please determine whether the company is truthful or not.
(If it is truthful, select True, otherwise, select False.)
True
False
QUESTION 22
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
For the given instance with the inputs prior legal trouble = No and company size = Large, please determine whether the company is truthful or not.
(If it is truthful, select True, otherwise, select False.)
True
False
QUESTION 23
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x1 = Yes | C1) = P(Prior Legal Trouble = Yes | Fraudulent) in decimal format.
QUESTION 24
In general, CART is not sensitive to outliers.
True
False
QUESTION 25
Which of the following statement(s) is(are) correct?
a. There is only one root node in each CART.
b. Each node in CART has only one direct parent node.
c. Each leaf node has no child node(s).
d. All of a., b., and c.
QUESTION 26
Which of the following defines the benchmark confidence of an association rule?
a.
b.
c.
d.
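For orientation, the rule measures asked about in this quiz (support, confidence, and benchmark confidence) can be computed directly from transaction counts. The sketch below is a minimal Python illustration using the commonly taught definitions, with support expressed as a fraction of transactions rather than a raw count; the transactions are hypothetical toy data, the function names are illustrative, and the sketch makes no claim about which of the missing answer options it corresponds to.

```python
# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"red", "white"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"red"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by antecedent support."""
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)


def benchmark_confidence(transactions, consequent):
    """Fraction of all transactions that contain the consequent."""
    return support(transactions, consequent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by benchmark confidence."""
    return confidence(transactions, antecedent, consequent) / benchmark_confidence(
        transactions, consequent
    )


rule = ({"red"}, {"white"})  # IF red THEN white
print(support(TRANSACTIONS, rule[0] | rule[1]))
print(confidence(TRANSACTIONS, *rule))
print(benchmark_confidence(TRANSACTIONS, rule[1]))
print(lift(TRANSACTIONS, *rule))
```

A lift above 1 indicates that the antecedent raises the chance of seeing the consequent relative to the benchmark; a lift below 1 indicates the opposite.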
QUESTION 27
Which of the following statement(s) is(are) correct?
a. Each node in a classification tree corresponds to a column in a data table.
b. Each node in a classification tree corresponds to a dimension of the multi-dimensional data space.
c. Each node in a classification tree defines a decision boundary (or split condition) along its corresponding dimension.
d. All of a., b., and c.
QUESTION 28
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the prior probability P(Company Size = 'Large') in decimal format.
QUESTION 29
The CART can be used for the purpose(s) of
a. Classification
b. Prediction
c. Either a. or b.
d. Both a. and b.
QUESTION 30
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x1 = Yes | C2) = P(Prior Legal Trouble = Yes | Truthful) in decimal format.
(Please keep 3 digits after the decimal point)
QUESTION 31
Which of the following statement(s) is(are) correct?
a. In XLMiner, the Naive Bayes Classifier can take only categorical variables as input to generate the categorical response or class label.
b. In general, the Naive Bayes Classifier can take not only categorical variables but also continuous variables as input to generate the categorical response or class label.
c. Both a. and b.
d. Neither a. nor b.
QUESTION 32
The momentum added to the weight update during the neural network training process
a. can keep weights changing in the same direction as they did in the preceding iteration.
b. will be reluctant to learn from data that would change the direction of the weights when the momentum value is set high.
c. can help avoid getting stuck in a local optimum.
d. can help the neural network learning process converge to an optimum.
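As background for the options above, one common formulation of momentum adds a fraction of the previous weight change to the current gradient step. The sketch below is a minimal Python illustration on a single weight and a hypothetical one-dimensional loss (w - 3)^2; it is not XLMiner's training routine, and the function name train and all parameter values are assumptions chosen for the example.

```python
def train(steps, learning_rate=0.1, momentum=0.9):
    """Minimize the toy loss (w - 3)^2 by gradient descent with momentum."""
    w = 0.0
    previous_update = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        # The new update keeps a share of the previous update's direction.
        update = -learning_rate * gradient + momentum * previous_update
        w += update
        previous_update = update
    return w


print(train(500))                 # with momentum
print(train(500, momentum=0.0))   # plain gradient descent for comparison
```

With a high momentum value the previous update dominates each step, so the weight keeps moving in its earlier direction and can overshoot before settling, which is the behavior the options describe.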
QUESTION 33
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the prior probability P(C2) = P(Truthful) in decimal format.
QUESTION 34
What is the meaning of CART in this data mining textbook?
a. Classification, Assertion, Regression, and Translation.
b. Categorization, Assertion, Regression, and Translation.
c. Category and Regression Trees
d. Classification and Regression Trees
QUESTION 35
In CART, it is necessary to normalize the data in the unit range 0 to 1.
True
False
QUESTION 36
Which of the following statement(s) is(are) correct about the CART?
a. For classification, the path from the root node to the leaf node represents a specific decision rule condition, and the majority voting at the leaf node will be used to determine the class label designed by the path.
b. For predicting, the path from the root node to the leaf node represents a specific decision rule condition, and the calculated average value of the variable at the leaf node will be used to predict its value.
c. Both a. and b.
d. Neither a. nor b.
QUESTION 37
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the conditional probability P(x1 = No | C1) = P(Prior Legal Trouble = No | Fraudulent) in decimal format.
QUESTION 38
To build a good classifier, the inductive learning algorithm or classification tree construction algorithm requires a large data set.
True
False
QUESTION 39
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the prior probability P(C1) = P(Fraudulent) in decimal format.
QUESTION 40
For the given table below,
Prior Legal Trouble (input variable) | Company Size (input variable) | Status (decision variable)
Y | Small | Truthful
N | Small | Truthful
N | Large | Truthful
N | Large | Truthful
N | Small | Truthful
N | Small | Truthful
Y | Small | Fraud
Y | Large | Fraud
N | Large | Fraud
Y | Large | Fraud
Please give the prior probability P(Prior Legal Trouble = 'Yes') in decimal format.
QUESTION 41
Multi-layer feedforward neural network consists of
a. Input layer
b. Hidden layer(s)
c. Output layer
d. All of a., b., and c.
Attachment: BostonHousing.xls