Reference no: EM133018074
Please help me check some of my answers and work through the questions I have not yet answered.
1) After dropping the missing values (if any), what percentage of observations in the sample have Direction = "Up"?
2) Randomly split the dataset, so that the training dataset includes 60% of the original dataset.
3) Fit a logistic regression model with Direction as the response variable and the five lag variables plus Volume as the predictors.
4) Create a predicted label for each week in the test data. Report the distribution of the predicted labels.
5) Compute the confusion matrix and the accuracy score for the test data, i.e., the remaining 40% of the full dataset.
6) Create a decision tree model with the same X and y variables as in the logistic model. Set max_depth to 3 in this tree model.
7) Print out an image of the tree model. Briefly describe the tree according to the image.
8) Print out the variable importance scores. Describe which variable is most important in this prediction.
9) What percentage of all observations in the test data set does the decision tree predict correctly?
10) In the test data set, consider only those observations for which the actual value of the target variable equals 1 (Up = 1). What percentage of these observations does the decision tree predict correctly?
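
For anyone checking answers against these steps, here is a minimal sketch of the whole workflow in Python with pandas and scikit-learn. The actual data file is not shown in the post, so the sketch builds a synthetic stand-in; the column names Lag1 to Lag5, Volume, and Direction, the sample size, and the random seeds are all assumptions, and the printed numbers will not match the real assignment. Question 7 asks for an image; the sketch prints a text rendering with export_text instead, and sklearn.tree.plot_tree (which needs matplotlib) could be used for the picture.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the weekly data (assumption: real file not shown).
rng = np.random.default_rng(0)
n = 500
features = ["Lag1", "Lag2", "Lag3", "Lag4", "Lag5", "Volume"]  # assumed names
df = pd.DataFrame(rng.normal(size=(n, len(features))), columns=features)
df["Direction"] = np.where(df["Lag1"] + rng.normal(size=n) > 0, "Up", "Down")
df.loc[::50, "Lag2"] = np.nan  # inject a few missing values for step 1

# 1) Drop missing values, then compute the share of weeks with Direction = "Up".
df = df.dropna()
pct_up = (df["Direction"] == "Up").mean() * 100
print(f"Percent Up: {pct_up:.2f}%")

# 2) Random 60/40 split; encode Up as 1 and Down as 0.
X = df[features]
y = (df["Direction"] == "Up").astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=42
)

# 3) Logistic regression on the five lags plus Volume.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 4) Predicted labels for the test weeks and their distribution.
pred_logit = logit.predict(X_test)
pred_dist = pd.Series(pred_logit).value_counts(normalize=True)
print(pred_dist)

# 5) Confusion matrix (rows = actual, columns = predicted) and accuracy.
cm_logit = confusion_matrix(y_test, pred_logit)
acc_logit = accuracy_score(y_test, pred_logit)
print(cm_logit, acc_logit)

# 6) Decision tree with the same X and y, limited to depth 3.
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# 7) Text rendering of the tree (plot_tree would draw the image instead).
tree_text = export_text(tree, feature_names=features)
print(tree_text)

# 8) Variable importance scores, largest first.
importances = pd.Series(tree.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
print(importances)

# 9) Share of all test observations the tree predicts correctly (accuracy).
pred_tree = tree.predict(X_test)
acc_tree = accuracy_score(y_test, pred_tree)

# 10) Share of actual Up weeks predicted as Up (recall for class 1).
recall_up = recall_score(y_test, pred_tree, pos_label=1)
print(f"Tree accuracy: {acc_tree:.3f}, recall for Up: {recall_up:.3f}")
```

Questions 9 and 10 ask for different quantities: 9 is overall accuracy, while 10 restricts the denominator to weeks where the actual value is Up, which is the recall (sensitivity) for class 1. With the real data, replace the synthetic DataFrame with the loaded file and keep the rest unchanged.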