Perform logistic regression and assess the error rate

Reference no: EM132371330

Assignment -

Answer every question in each problem, and include a discussion of how your results address the question.

Submit your .rmd file with the knitted PDF (or knitted Word Document saved as a PDF). If you are having trouble with .rmd, let us know and we will help you, but both the .rmd and the PDF are required.

This file can be used as a skeleton document for your code/write up. Please follow the instructions found under Content for Formatting and Guidelines. No code should be in your PDF write-up unless stated otherwise.

For any question asking for plots/graphs, please produce the plot as the question asks and then produce the same plot using the corresponding commands in the ggplot2 library. (So if the question asks for one plot, your results should have two plots: one produced using the given R function and one produced with the ggplot2 equivalent.)

This doesn't apply to questions that don't specifically ask for a plot; however, I would still encourage you to produce both.

You do not need to include the above statements.

Please do the following problems from the textbook R Handbook, as stated.

1. Use the bladder cancer data (bladdercancer) from the HSAUR3 library to answer the following questions.

a) Construct graphical and numerical summaries that show the relationship between tumor size and the number of recurrent tumors. Discuss your findings. (Hint: a mosaic plot may be a good way to assess this.)

b) Build a Poisson regression that estimates the effect of size of tumor on the number of recurrent tumors. Discuss your results.
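
One possible starting point for this problem is sketched below. It assumes the HSAUR3 package is installed and that its bladdercancer data frame has the columns time, tumorsize, and number. The object names tab and fit_size are illustrative, and follow-up time is left out of the model for simplicity, which is a simplification rather than a recommendation.

```r
# Sketch only: assumes the HSAUR3 package is installed and that the
# bladdercancer data frame has columns time, tumorsize, and number.
library(HSAUR3)
data("bladdercancer", package = "HSAUR3")

# Numerical summary: patients cross-classified by tumour count and tumour size
tab <- xtabs(~ number + tumorsize, data = bladdercancer)
print(tab)

# Graphical summary: mosaic plot of the same table
mosaicplot(tab, main = "Recurrent tumours by tumour size")

# Poisson regression of the tumour count on tumour size
fit_size <- glm(number ~ tumorsize, family = poisson, data = bladdercancer)
print(summary(fit_size))

# Coefficients on the count scale (exponentiated): rate ratio for tumour size
print(exp(coef(fit_size)))
```

The exponentiated tumorsize coefficient can be read as the multiplicative change in the expected number of recurrent tumors for the larger size category relative to the smaller one.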

2. The following data are the numbers of new AIDS cases in Belgium for each year from 1981 to 1993. Let t denote time (t = 1 corresponds to 1981).

y = c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)

t = 1:13

Do the following:

a) Plot the number of AIDS cases against time. Comment on the plot.

b) Fit a Poisson regression model $\log(\mu_i) = \beta_0 + \beta_1 t_i$. Comment on the model parameters and on the plot of deviance residuals versus fitted values.

c) Now add a quadratic term in time (i.e., $\log(\mu_i) = \beta_0 + \beta_1 t_i + \beta_2 t_i^2$) and fit the model. Comment on the model parameters and assess the residual plots.

d) Compare the two models using AIC. Which model is better?

e) Use the anova() function to perform a χ² test for model selection. Did adding the quadratic term improve the model?
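
Parts (a) to (e) can be sketched in base R as follows. The data are exactly those given in the problem; the object names fit_lin, fit_quad, and lrt are illustrative, and the plots are the base-graphics versions only (the ggplot2 equivalents are left out).

```r
# Data as given in the problem statement
y <- c(12, 14, 33, 50, 67, 74, 123, 141, 165, 204, 253, 246, 240)
t <- 1:13

# (a) Cases against time (t = 1 corresponds to 1981)
plot(t, y, xlab = "Time (years since 1980)", ylab = "New AIDS cases")

# (b) Log-linear Poisson model: log(mu_i) = b0 + b1 * t_i
fit_lin <- glm(y ~ t, family = poisson)
print(summary(fit_lin))

# (c) Add a quadratic term; I() keeps ^ as arithmetic inside the formula
fit_quad <- glm(y ~ t + I(t^2), family = poisson)
print(summary(fit_quad))

# Deviance residuals against fitted values for each model
plot(fitted(fit_lin), residuals(fit_lin, type = "deviance"),
     xlab = "Fitted values", ylab = "Deviance residuals", main = "Linear")
abline(h = 0, lty = 2)
plot(fitted(fit_quad), residuals(fit_quad, type = "deviance"),
     xlab = "Fitted values", ylab = "Deviance residuals", main = "Quadratic")
abline(h = 0, lty = 2)

# (d) AIC comparison: the smaller value indicates the preferred model
print(AIC(fit_lin, fit_quad))

# (e) Chi-squared (likelihood-ratio) test of the nested models
lrt <- anova(fit_lin, fit_quad, test = "Chisq")
print(lrt)
```

A clear curved pattern in the residual plot of the linear model, together with a lower AIC and a small p-value in the anova() table, would all point toward keeping the quadratic term.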

3. Load the Default dataset from the ISLR library. The dataset contains information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt. The dataset has four variables and 10,000 observations. You have previously developed a logistic regression model. Now consider the following two models:

  • Model 1 → default ~ student + balance
  • Model 2 → default ~ balance

For the two competing models, do the following:

a) Using the whole dataset, compare the two models (use AIC and/or the error rate).

b) Use the validation set approach and choose the best model. Be aware that only a few people in the data defaulted.

c) Use the LOOCV approach and choose the best model.

d) Use the 10-fold cross-validation approach and choose the best model.

Report the validation misclassification (error) rate for both models under each of the three assessment methods. Discuss your results.
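
The comparison might be organized along the following lines. This is a sketch, assuming the ISLR and boot packages are installed; the helper err, the 0.5 cut-off, the 50/50 stratified split, and the seed are all illustrative choices rather than part of the assignment.

```r
# Sketch only: assumes the ISLR and boot packages are installed.
library(ISLR)   # Default data
library(boot)   # cv.glm()

f1 <- default ~ student + balance   # Model 1
f2 <- default ~ balance             # Model 2

# Misclassification rate at a 0.5 cut-off; obs is 0/1, p is a probability
err <- function(obs, p) mean(abs(obs - p) > 0.5)

# (a) Whole data: AIC and apparent (training) error rate
full1 <- glm(f1, family = binomial, data = Default)
full2 <- glm(f2, family = binomial, data = Default)
print(AIC(full1, full2))
print(c(err(full1$y, fitted(full1)), err(full2$y, fitted(full2))))

# (b) Validation set: sample half of each default class separately so the
#     few defaulters are represented in both halves (stratified split)
set.seed(1)  # arbitrary seed, for reproducibility only
train <- unlist(lapply(split(seq_len(nrow(Default)), Default$default),
                       function(i) sample(i, floor(length(i) / 2))),
                use.names = FALSE)
val_err <- sapply(list(f1, f2), function(f) {
  fit <- glm(f, family = binomial, data = Default[train, ])
  p <- predict(fit, newdata = Default[-train, ], type = "response")
  err(as.numeric(Default$default[-train] == "Yes"), p)
})
print(val_err)

# (d) 10-fold cross-validation; delta[1] is the raw CV error estimate
cv10 <- c(cv.glm(Default, full1, cost = err, K = 10)$delta[1],
          cv.glm(Default, full2, cost = err, K = 10)$delta[1])
print(cv10)

# (c) LOOCV is the same call with K left at its default (K = n). With
#     10,000 rows that is 10,000 refits per model, so it is slow:
# loocv1 <- cv.glm(Default, full1, cost = err)$delta[1]
# loocv2 <- cv.glm(Default, full2, cost = err)$delta[1]
```

Because so few customers default, a model that predicts "No" for everyone already has a low error rate, so it is worth comparing each estimate against that baseline when discussing the results.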

4. In the ISLR library, load the Smarket dataset. This contains daily percentage returns for the S&P 500 stock index between 2001 and 2005. There are 1250 observations and 9 variables. The variable of interest is Direction, a factor with levels Down and Up indicating whether the market had a negative or positive return on a given day. Since the goal is to predict the direction of the stock market in the future, it makes sense to use the data from 2001 to 2004 for training and the data from 2005 for validation. Accordingly, create a training set and a testing set. Perform logistic regression and assess the error rate.
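
A minimal sketch of the year-based split and the test error calculation follows, assuming the ISLR package is installed. The choice of predictors (the five lags plus Volume) and the 0.5 cut-off are illustrative assumptions; the problem does not specify them.

```r
# Sketch only: assumes the ISLR package is installed.
library(ISLR)

# Split by year: 2001-2004 for training, 2005 for testing
is_train  <- Smarket$Year < 2005
train_set <- Smarket[is_train, ]
test_set  <- Smarket[!is_train, ]

# Logistic regression; this predictor set is one illustrative choice
fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
           family = binomial, data = train_set)

# Predicted probability of "Up" (the second factor level), cut at 0.5
p_up <- predict(fit, newdata = test_set, type = "response")
pred <- factor(ifelse(p_up > 0.5, "Up", "Down"),
               levels = levels(Smarket$Direction))

# Confusion matrix and test error rate
print(table(Predicted = pred, Actual = test_set$Direction))
test_error <- mean(pred != test_set$Direction)
print(test_error)
```

Splitting by year rather than at random keeps the evaluation honest for a forecasting task, since the model never sees data from the period it is asked to predict.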

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd