Parametric and a non-parametric statistical learning

Reference no: EM132489263

Question 1. Provide detailed answers to the following questions:

a) Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a nonparametric approach)? What are its disadvantages?

b) A dataset has two features: a predictor X and a quantitative response Y. Two models are fitted to the data: a linear regression model Y = α₀ + α₁X + ε, and a quartic regression model Y = β₀ + β₁X + β₂X² + β₃X³ + β₄X⁴ + ε.

Suppose that the true relationship between X and Y is linear. Would we expect the training RSS for one of the models to be lower than for the other, would we expect them to be the same, or is there not enough information to tell? What about the test RSS? What if the true relationship between X and Y is not linear, but we don't know how far it is from being linear? Does the number of observations matter?

c) Suppose that some statistical learning method is used to make a prediction for the response Y for a particular value of the predictor X. Carefully describe how to estimate the standard deviation of the prediction (use mathematical formalism).
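
One possible approach to part c) is the bootstrap; this is an assumption, since the question does not name a method. Refit the learning method on B training sets resampled with replacement, record the prediction at the value of interest x₀ each time, and take the sample standard deviation of those B predictions: sqrt( (1/(B-1)) Σ_b (ŷ₀⁽ᵇ⁾ - ȳ₀)² ), where ȳ₀ is the average of the B predictions. A minimal sketch in Python, with simple linear regression as a stand-in learner and synthetic data that are purely illustrative (all function and variable names are hypothetical):

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_prediction_sd(xs, ys, x0, n_boot=500, seed=0):
    """Bootstrap estimate of the standard deviation of the prediction at x0."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    while len(predictions) < n_boot:
        # Resample n observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # a line cannot be fitted to a single distinct x value
        intercept, slope = fit_line(bx, by)
        predictions.append(intercept + slope * x0)
    # statistics.stdev divides by B - 1, matching the formula in the text.
    return statistics.stdev(predictions)


# Synthetic, illustrative data: Y = 1 + 2X + Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [1 + 2 * x + data_rng.gauss(0, 1) for x in xs]
sd_at_2 = bootstrap_prediction_sd(xs, ys, x0=2.0)
print(sd_at_2)
```

Any other learner can replace fit_line, provided it is refitted from scratch on every resample.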

Question 2. True or False? (only a short justification is required; 1 or 2 sentences at most)
a) The k-means algorithm for clustering is guaranteed to converge to a local optimum.
b) Increasing the depth of a decision tree cannot increase its training error.
c) With infinite data and infinitely fast computers, kNN is the only algorithm needed for classification tasks.
d) For datasets with high label noise (i.e. many training instances have incorrect labels), boosted decision trees would generally perform better than random forests.
e) Support vector machines provide calibrated posterior classification probabilities P(y = 1|x) and P(y = -1|x).
f) In logistic regression, we model the odds ratio p/(1-p) as a linear function.
g) A 7NN classifier has higher variance than a 12NN classifier.
h) Using cross validation to select hyperparameters will guarantee that the model does not overfit.
i) Hierarchical clustering methods require a predefined number of clusters.
j) A random forest is an ensemble learning method that attempts to lower the bias error of decision trees.
k) The number of parameters in a parametric model is fixed, while the number of parameters in a nonparametric model grows with the amount of training data.
l) The largest eigenvalue of the covariance matrix is associated with the direction of maximum variance in the data.
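
Statement l) can be explored numerically on a small example. The sketch below computes the sample covariance matrix of five made-up 2-D points (illustrative values, not from the assignment) and its largest eigenvalue in closed form. It then compares that eigenvalue with the variance of the data projected onto the matching eigenvector and onto a sweep of other unit directions. All function names are hypothetical.

```python
import math


def covariance_2d(points):
    """Entries (sxx, sxy, syy) of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def largest_eigenpair(sxx, sxy, syy):
    """Largest eigenvalue and unit eigenvector of [[sxx, sxy], [sxy, syy]]."""
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam = (sxx + syy) / 2 + root
    if sxy != 0:
        vx, vy = sxy, lam - sxx
    else:  # diagonal matrix: the eigenvectors are the coordinate axes
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)


def projected_variance(points, direction):
    """Sample variance of the points projected onto a unit direction."""
    scores = [p[0] * direction[0] + p[1] * direction[1] for p in points]
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)


# Illustrative data only.
points = [(2.0, 1.9), (0.5, 0.7), (3.1, 2.8), (1.2, 1.0), (4.0, 4.3)]
lam, v = largest_eigenpair(*covariance_2d(points))
var_along_v = projected_variance(points, v)
# Projected variance along unit directions at 0, 1, ..., 179 degrees.
var_by_angle = [
    projected_variance(points, (math.cos(math.radians(a)), math.sin(math.radians(a))))
    for a in range(180)
]
print(lam, var_along_v, max(var_by_angle))
```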

Question 3. Multiple Choice Questions: select ALL answers that apply (no justification required).
a) Which of the following are true of binary classification/regression trees?
i. Bagging decision trees is likely to increase the model variance.
ii. The deeper the decision tree is, the more likely it is to overfit.
iii. They are robust to small changes in the data.
iv. Random forests are less likely to overfit than decision trees.
b) A regression tree has substantially lower validation MSE than expected. Which of the following is likely to improve validation MSE in most real-world applications?
i. Adding quadratic features (i.e. X_iX_j, i, j = 1, ... , p) to the predictor space.
ii. Selecting a random subset of the features and using those in the regression tree.
iii. Pruning the tree, using cross-validation to decide how to prune.
iv. Normalizing each feature to have variance 1.
c) A decision tree is getting abnormally bad performance on both the training and test sets. What could be causing the problem?
i. The decision tree is too shallow.
ii. The number of features must be decreased.
iii. The model suffers from overfitting.
iv. None of the above.
d) A dataset has 3 pts: A = (0, 2), B = (0, 1), C = (1, 0). The 2-means clustering algorithm is initialized with centers at A and B. Where will the centers converge?
i. A and C.
ii. A and the midpoint of the segment BC.
iii. C and the midpoint of the segment AB.
iv. B and the midpoint of the segment AC.
e) Consider T1, a decision stump (i.e., a tree with one layer below the root), and T2, a decision tree that is grown till a maximum depth of 4 (at most 3 layers below the root). Which of the following is/are correct?
i. Bias(T1) < Bias(T2).
ii. Bias(T1) > Bias(T2).
iii. Variance(T1) < Variance(T2).
iv. Variance(T1) > Variance(T2).
f) Which of the following are true about subset selection?
i. Subset selection is not necessary in general.
ii. Ridge regression frequently eliminates some of the features.
iii. Subset selection can reduce overfitting.
iv. The number of models to train in best subset selection increases exponentially with the number of features.

g) How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?
i. Ridge regression has larger bias, larger variance
ii. Ridge regression has larger bias, smaller variance
iii. Ridge regression has smaller bias, larger variance
iv. Ridge regression has smaller bias, smaller variance
h) Both PCA and Lasso can be used for feature selection and/or dimension reduction. Which of the following statements are true?
i. Lasso selects a subset (potentially the full set) of the original features
ii. PCA produces features that are linear combinations of the original features
iii. PCA and Lasso both allow you to specify how many features are chosen
iv. PCA and Lasso are the same if you use a decision tree
i) Why would we use a random forest instead of a decision tree?
i. To reduce the training error.
ii. To reduce the variance of the model.
iii. To reduce the bias of the model.
iv. To obtain a model that is easier for a human to interpret.
j) The optimal Bayes decision rule with the indicator function:
i. is the best that a classifier can achieve, on average.
ii. can be computed exactly from a large sample
iii. selects the class with the greatest posterior probability
iv. produces the smallest error rate among all classifiers.
k) Consider a probability-based binary classifier. Which of the following statement(s) is/are always true about the ROC curve, and the area under the ROC curve (AUC):
i. An AUC of 0.5 represents a classifier that performs worse than a random classifier, on average.
ii. The ROC curve is generated by varying the discriminative threshold of the classifier.
iii. The ROC curve can be used to visualize the tradeoff between true positive and false positive classifications.
iv. The ROC curve increases monotonically.
l) Which of the following algorithms can learn nonlinear decision boundaries?
i. Quadratic discriminant analysis.
ii. Support vector machine with a Gaussian kernel.
iii. Logistic regression.
iv. Decision stump (a tree with at most 1 layer below the root).

4. [3 marks] Provide a short answer to the following questions (about a paragraph each).
a) When is ridge regression preferable to LASSO regression?
b) What is the naive assumption in the naive Bayes classifier?
c) A classifier is trained on a cancer dataset, and achieves 96% accuracy on new observations. Why might this not be considered a good classifier? How could it be improved?
d) A regression model has low bias and high variance. How can it be improved?
e) How is kNN different from k-means clustering?
f) List 6 feature selection/dimension reduction methods.
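
The 2-means iteration in Question 3 d) can be traced by hand or with a few lines of code. The sketch below is a plain implementation of Lloyd's algorithm (alternating an assignment step and a centroid update until the centers stop moving), run on the three points and the initial centers given in the question. The function name and the tie-breaking rule (the first nearest center wins) are assumptions made for illustration.

```python
import math


def k_means_lloyd(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # no center moved, so the assignments are stable
        centers = new_centers
    return centers


A, B, C = (0, 2), (0, 1), (1, 0)
final_centers = k_means_lloyd([A, B, C], centers=[A, B])
print(final_centers)
```

Comparing the printed centers with the four options in d) identifies the one the algorithm reaches from this initialization.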

Part C: refer to the printout of Workflow: Predicting Algae Blooms, pp. 8-11.

Question 4. For this question, the focus is on your ability to interpret the various outputs of a machine learning workflow.
a) Prediction Models
i. Explain briefly why the linear model is not a great fit for a2.
ii. What variables are retained in the final linear model for a2?
iii. Give the decision rules (in the format IF ... THEN ... ) provided by the pruned regression tree for a2.
iv. What is the relative importance of the variables in that pruned tree?
b) Model Evaluation
i. Why do we use cross-validation in this problem?
ii. Briefly describe replicated k-fold cross-validation.
iii. NMSE is used to evaluate the various model performances. What is the range of good NMSE values? What is the range of bad NMSE values?
iv. According to the Bonferroni-Dunn CD diagram, what are the 5 best predictive algorithms for this task and dataset?
c) Model Prediction
i. For which target variable(s) does the model provide the best predictions? Justify your answer.
ii. For which target variable(s) does the model provide the worst predictions? Justify your answer.
iii. Why are there vertical lines of predictions in some of the scatterplots?
iv. For the variable(s) identified in part i., does the model make good predictions according to the problem description?
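
Two ideas from part b) can be made concrete in a short sketch. The workflow printout is not reproduced here, so both definitions below are assumptions: NMSE is taken as the model's squared error divided by the squared error of a baseline that always predicts the mean of the response, and replicated k-fold cross-validation is taken as ordinary k-fold cross-validation repeated several times with a fresh random shuffle each time. The function names are hypothetical.

```python
import random
import statistics


def nmse(y_true, y_pred):
    """Squared error of the predictions relative to a mean-only baseline."""
    y_bar = statistics.fmean(y_true)
    sse_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sse_baseline = sum((t - y_bar) ** 2 for t in y_true)
    return sse_model / sse_baseline


def replicated_kfold(n, k, repetitions, seed=0):
    """Yield (repetition, fold, train_idx, test_idx) index splits."""
    rng = random.Random(seed)
    for rep in range(repetitions):
        # Each repetition uses its own random permutation of the rows.
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield rep, fold, train_idx, test_idx


# Illustrative usage: 3 repetitions of 5-fold CV on 10 observations.
splits = list(replicated_kfold(n=10, k=5, repetitions=3))
print(len(splits))
```

Under this definition, an NMSE of 0 corresponds to perfect predictions and an NMSE of 1 to a model that does no better than predicting the mean.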

Question 5. How could one attempt to improve on the results of the workflow? Provide 6 suggestions using course concepts, with justification.

Attachment:- Machine learning assignment.rar
