Question 1. The following problems will test your understanding of odds.
(a) On average, what fraction of people whose odds of defaulting on their credit card payment are 0.37 will in fact default?
(b) Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?
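Both parts of Question 1 rest on the relationship odds = p / (1 - p), which inverts to p = odds / (1 + odds). A minimal R sketch of the two conversions follows. The helper names `prob_to_odds` and `odds_to_prob` are hypothetical, and the inputs are illustrative values, not the ones asked about above.

```r
# Convert a probability p (0 <= p < 1) to odds: p / (1 - p).
prob_to_odds <- function(p) {
  stopifnot(all(p >= 0 & p < 1))
  p / (1 - p)
}

# Convert odds (>= 0) back to a probability: odds / (1 + odds).
odds_to_prob <- function(odds) {
  stopifnot(all(odds >= 0))
  odds / (1 + odds)
}

# Illustrative values only (not the ones in Question 1).
odds_to_prob(0.25)  # odds of 1 to 4 correspond to a probability of 0.2
prob_to_odds(0.50)  # an even chance corresponds to odds of 1
```

The two functions are inverses of each other, so `odds_to_prob(prob_to_odds(p))` should return `p` up to floating-point error.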
Question 2. Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and obtain the estimated coefficients $\hat{\beta}_0 = -6$, $\hat{\beta}_1 = 0.05$, $\hat{\beta}_2 = 1$.
(a) Estimate the probability that a student who studies for 30 hours and has an undergrad GPA of 3.5 gets an A in the class.
(b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?
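For a fitted logistic regression, the estimated probability is exp(eta) / (1 + exp(eta)), where eta = β0 + β1·X1 + β2·X2 is the linear predictor. A 50% chance corresponds to eta = 0. The sketch below uses the coefficients given in Question 2 but a hypothetical student (20 hours, GPA 3.0) rather than the one in part (a). The function names `prob_A` and `hours_needed` are assumptions made for illustration.

```r
# Estimated coefficients as given in Question 2.
b0 <- -6
b1 <- 0.05
b2 <- 1

# Probability of an A for given hours studied and GPA.
# plogis(eta) computes exp(eta) / (1 + exp(eta)).
prob_A <- function(hours, gpa) {
  eta <- b0 + b1 * hours + b2 * gpa
  plogis(eta)
}

# Hours needed to reach a target probability at a given GPA.
# qlogis(target) is the log-odds, log(target / (1 - target)).
hours_needed <- function(target, gpa) {
  (qlogis(target) - b0 - b2 * gpa) / b1
}

# Hypothetical student: 20 hours of study, GPA of 3.0.
p <- prob_A(hours = 20, gpa = 3.0)          # linear predictor is -2
h <- hours_needed(target = 0.5, gpa = 3.0)  # solves eta = 0 for hours
```

Plugging `h` back into `prob_A` should return the target probability, which is a quick way to check the algebra.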
Question 3. Suppose that we wish to predict whether a given stock will issue a dividend this year ("Yes" or "No") based on X, last year's percent profit. We examine a large number of companies and discover that the mean value of X for companies that issued a dividend was $\bar{X} = 10$, while the mean for those that didn't was $\bar{X} = 0$. In addition, the variance of X for these two sets of companies was $\hat{\sigma}^2 = 36$. Finally, 80% of companies issued dividends. Assuming that X follows a normal distribution, predict the probability that a company will issue a dividend this year given that its percentage profit was X = 4 last year. HINT: Use lecture notes covering Bayes' Theorem and Linear Discriminant Analysis.
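The hint points to Bayes' Theorem with normal class densities that share one variance. Under that setup the posterior is P(Yes | x) = π_yes·f_yes(x) / (π_yes·f_yes(x) + π_no·f_no(x)). A minimal R sketch follows. The function name `posterior_yes` and its argument names are hypothetical, and the call at the end is a sanity check with illustrative inputs, not the values asked about in Question 3.

```r
# Posterior probability of class "Yes" with a single predictor:
# each class density is normal with its own mean and a shared variance.
posterior_yes <- function(x, mean_yes, mean_no, variance, prior_yes) {
  sd_shared <- sqrt(variance)  # dnorm() expects a standard deviation
  num_yes <- prior_yes * dnorm(x, mean = mean_yes, sd = sd_shared)
  num_no  <- (1 - prior_yes) * dnorm(x, mean = mean_no, sd = sd_shared)
  num_yes / (num_yes + num_no)
}

# Sanity check: with equal priors and x halfway between the class means,
# the two densities are equal, so the posterior is 0.5.
posterior_yes(x = 5, mean_yes = 10, mean_no = 0, variance = 36, prior_yes = 0.5)
```

At the midpoint the densities cancel, so the posterior equals the prior. Moving x toward one class mean shifts the posterior toward that class.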
Question 4. This question involves using R. This question should be answered using the dataset called Weekly, which is part of the ISLR2 package. This data is similar in nature to the Smarket data that we covered during lecture, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.
(a) Produce some numerical and graphical summaries of the Weekly data. For instance, you can use the pairs() function. Do there appear to be any patterns?
(b) Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary() function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?
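One possible starting point for parts (a) and (b) is sketched below. It assumes the ISLR2 package is installed and that Weekly has the columns Lag1 through Lag5, Volume, and Direction, as described above. The object name `fit` is arbitrary. Interpreting the output is left to the question.

```r
library(ISLR2)  # provides the Weekly data set

# Part (a): numerical summaries.
summary(Weekly)
# Direction is a factor, so exclude it before computing correlations.
cor(Weekly[, names(Weekly) != "Direction"])

# Part (a): scatterplot matrix of all variables.
pairs(Weekly)

# Part (b): logistic regression on the full data set.
fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
           data = Weekly, family = binomial)
summary(fit)  # coefficient table with z-statistics and p-values
```

Setting `family = binomial` is what makes `glm()` fit a logistic regression rather than an ordinary linear model.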
(c) Compute the confusion matrix using the table() function, along with the overall fraction of correct predictions. Precision and Recall are additional performance metrics for evaluating classification methods. They are defined as follows:
\[
\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}
\]
\[
\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}
\]
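The definitions above can be read directly off a confusion matrix built with table(). The sketch below uses made-up label vectors for illustration only, not the Weekly data, and assumes that "Up" is treated as the positive class. All variable names are hypothetical.

```r
# Made-up labels for illustration only; "Up" is the positive class.
actual    <- factor(c("Up", "Down", "Down", "Up", "Up", "Down", "Up", "Down"),
                    levels = c("Down", "Up"))
predicted <- factor(c("Up", "Up", "Down", "Up", "Down", "Down", "Up", "Up"),
                    levels = c("Down", "Up"))

# Rows are predictions, columns are true labels.
cm <- table(Predicted = predicted, Actual = actual)

tp <- cm["Up", "Up"]      # predicted Up, actually Up
fp <- cm["Up", "Down"]    # predicted Up, actually Down
fn <- cm["Down", "Up"]    # predicted Down, actually Up

accuracy  <- sum(diag(cm)) / sum(cm)  # overall fraction correct
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
```

Fixing the factor levels explicitly keeps the table square even if one class never appears among the predictions. Otherwise the indexing by name could fail.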