Fit a logistic regression model to the data

Reference no: EM132367801

Assignment -

Answer all questions specified on the problem and include a discussion on how your results answered/addressed the question.

Submit your .rmd file with the knitted PDF (or knitted Word Document saved as a PDF). If you are having trouble with .rmd, let us know and we will help you, but both the .rmd and the PDF are required. This file can be used as a skeleton document for your code/write up. Please follow the instructions found under Content for Formatting and Guidelines. No code should be in your PDF write-up unless stated otherwise.

For any question asking for plots/graphs, please do as the question asks and also produce the same plot using the corresponding commands in the ggplot2 library. (So if the question asks for one plot, your results should have two plots: one produced using the given R function and one produced with the ggplot2 equivalent.) This doesn't apply to questions that don't specifically ask for a plot; however, I would still encourage you to produce both.

You do not need to include the above statements.

Please do the following problems from the textbook (the R Handbook), as stated.

1. Collett (2003) argues that two outliers need to be removed from the plasma data. Try to identify those two unusual observations by means of a scatterplot.

2. (Multiple Regression) Continuing from the lecture on the hubble data from the gamair library:

a) Fit a quadratic regression model, i.e., a model of the form

Model 2: velocity = β1 × distance + β2 × distance² + ε

b) Plot the fitted curve from Model 2 on the scatterplot of the data.

c) Add the simple linear regression fit (fitted in class) on this plot - use different color and line type to differentiate the two and add a legend to your plot.

d) Which model do you consider most sensible considering the nature of the data - looking at the plot?

e) Which model is better? Provide a statistic to support your claim.

Note: The quadratic model here is still regarded as a "linear regression" model, since the term "linear" relates to the parameters of the model and not to the powers of the explanatory variables.
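As a rough illustration of parts a) to e), the sketch below fits both models and compares them. It uses simulated data as a stand-in for the hubble data (the variable names `distance` and `velocity` and all numbers are assumptions, not the real data set), and it assumes the in-class linear model was also fitted without an intercept, matching the form of Model 2.

```r
# Synthetic stand-in for the hubble data (assumption: the real data come from gamair)
set.seed(1)
distance <- runif(24, min = 2, max = 22)            # hypothetical distances
velocity <- 75 * distance + rnorm(24, sd = 250)     # hypothetical velocities
dat <- data.frame(distance, velocity)

# Both models omit the intercept, matching the form of Model 2
fit_lin  <- lm(velocity ~ distance - 1, data = dat)
fit_quad <- lm(velocity ~ distance + I(distance^2) - 1, data = dat)

# Scatterplot with both fitted curves, distinguished by colour and line type
grid <- data.frame(distance = seq(min(dat$distance), max(dat$distance),
                                  length.out = 100))
plot(velocity ~ distance, data = dat, xlab = "Distance", ylab = "Velocity")
lines(grid$distance, predict(fit_quad, newdata = grid), col = "red", lty = 1)
lines(grid$distance, predict(fit_lin, newdata = grid), col = "blue", lty = 2)
legend("topleft", legend = c("Quadratic (Model 2)", "Linear"),
       col = c("red", "blue"), lty = c(1, 2))

# The models are nested, so an F-test and AIC can serve as supporting statistics
cmp <- anova(fit_lin, fit_quad)
print(cmp)
print(AIC(fit_lin, fit_quad))
```

Because neither model has an intercept, the R-squared reported by `summary()` is computed about zero rather than about the mean, so the F-test or AIC is probably a safer basis for comparison here.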

3. The leuk data from package MASS shows the survival times from diagnosis of patients suffering from leukemia and the values of two explanatory variables, the white blood cell count (wbc) and the presence or absence of a morphological characteristic of the white blood cells (ag).

a) Define a binary outcome variable according to whether or not patients lived for at least 24 weeks after diagnosis. Call it surv24.

b) Fit a logistic regression model to the data with surv24 as the response. It is advisable to transform the very large white blood cell counts to avoid regression coefficients very close to 0 (and odds ratios very close to 1). You may use a log transformation.

c) Construct some graphics useful in the interpretation of the final model you fit.

d) Fit a model with an interaction term between the two predictors. Which model fits the data better? Justify your answer.
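One possible way to approach parts a) to d) is sketched below. It relies only on the `leuk` data shipped with MASS (columns `wbc`, `ag`, `time`); the object names `leuk2`, `fit_main`, and `fit_int` are illustrative choices, and the likelihood-ratio test and AIC are just one reasonable way to compare the two models.

```r
library(MASS)  # provides the leuk data

leuk2 <- leuk
# a) 1 if the patient survived at least 24 weeks after diagnosis, 0 otherwise
leuk2$surv24 <- as.integer(leuk2$time >= 24)

# b) main-effects logistic regression with log-transformed white blood cell count
fit_main <- glm(surv24 ~ log(wbc) + ag, family = binomial, data = leuk2)

# d) the same model plus the interaction between the two predictors
fit_int <- glm(surv24 ~ log(wbc) * ag, family = binomial, data = leuk2)

# Likelihood-ratio test for the nested models, plus AIC
lrt <- anova(fit_main, fit_int, test = "Chisq")
print(lrt)
print(AIC(fit_main, fit_int))

# c) fitted probability of surviving 24 weeks against log(wbc), one curve per ag group
ag_levels <- levels(leuk2$ag)
wbc_grid <- exp(seq(log(min(leuk2$wbc)), log(max(leuk2$wbc)), length.out = 100))
plot(log(leuk2$wbc), leuk2$surv24, pch = as.integer(leuk2$ag),
     xlab = "log(wbc)", ylab = "P(surv24 = 1)")
for (i in seq_along(ag_levels)) {
  nd <- data.frame(wbc = wbc_grid,
                   ag = factor(ag_levels[i], levels = ag_levels))
  lines(log(wbc_grid), predict(fit_main, newdata = nd, type = "response"), lty = i)
}
legend("topright", legend = paste("ag", ag_levels),
       lty = seq_along(ag_levels), pch = seq_along(ag_levels))
```

With a sample this small, the interaction test has little power, so the written discussion should weigh the test result against the plot rather than rely on the p-value alone.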

4. Load the Default dataset from the ISLR library. The dataset contains information on ten thousand customers, with four variables recorded for each. The aim is to predict which customers will default on their credit card debt and to examine how each predictor variable is related to the response (default). Do the following on this dataset:

a) Perform a descriptive analysis of the dataset to gain insight into it. Use summaries and appropriate exploratory graphics to answer the question of interest.

b) Use R to build a logistic regression model.

c) Discuss your result. Which predictor variables were important? Are there interactions?

d) How good is your model? Assess the performance of the logistic regression classifier. What is the error rate?
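The assessment in part d) might look roughly like the sketch below, which fits a logistic regression on a training split and computes a test error rate from a confusion matrix. The data are simulated as a stand-in for Default (the column names mirror the description above, but every value, the 70/30 split, and the 0.5 cutoff are assumptions made for illustration).

```r
# Synthetic stand-in for the Default data (assumption: the real data come from ISLR)
set.seed(42)
n <- 2000
balance <- rgamma(n, shape = 2, scale = 400)        # hypothetical card balances
income  <- rnorm(n, mean = 35000, sd = 12000)       # hypothetical incomes
student <- factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.7, 0.3)))
p_true  <- plogis(-6 + 0.004 * balance)             # hypothetical true default risk
default <- factor(ifelse(rbinom(n, 1, p_true) == 1, "Yes", "No"),
                  levels = c("No", "Yes"))
dat <- data.frame(default, student, balance, income)

# Hold out 30% of the rows for testing
test_idx <- sample(n, size = 0.3 * n)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Logistic regression for P(default = "Yes")
fit <- glm(default ~ student + balance + income, family = binomial, data = train)
print(summary(fit)$coefficients)

# Classify with a 0.5 cutoff and tabulate predictions against the truth
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))
conf_mat <- table(Predicted = pred, Actual = test$default)
error_rate <- mean(pred != test$default)
print(conf_mat)
print(error_rate)
```

Since defaults are rare, it helps to compare the error rate with the rate obtained by always predicting "No", and to read the false negatives off the confusion matrix rather than quoting the overall error alone.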

5. Go through Section 7.3.1 of the Handbook. Run all the code (additional exploration of the data is allowed) and write your own explanation and interpretation.



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd