Compute and plot the leverage of each point

Reference no: EM132263005

Method of Data Analysis Assignment

Please use Rmarkdown to write your solutions and submit your solutions with relevant R code included as a pdf file.

Question - Census data were collected on the 50 states and Washington, D.C. We are interested in determining whether average lifespan (LIFE) is related to the ratio of males to females in percent (MALE), birth rate per 1,000 people (BIRTH), divorce rate per 1,000 people (DIVO), number of hospital beds per 100,000 people (BEDS), percentage of population 25 years or older having completed 16 years of school (EDUC) and per capita income (INCO). The data are stored in the file Census.txt, which can be found on the course website.

Answer the following questions.

Part 1: In this part, compute by hand using matrix formulas. Do not use the lm() command in this part.

We consider a multiple linear regression model with LIFE (y) as the response variable, and MALE (x1), BIRTH (x2), DIVO (x3), BEDS (x4), EDUC (x5), and INCO (x6) as predictors. Answer the following questions using the least-squares estimates expressed in terms of matrix formulas.

(a) Compute and report the least-squares estimates. Write down the least-squares regression equation.

(b) Explain in context what the coefficients corresponding to MALE and BIRTH mean.

(c) Compute the biased and the unbiased estimates of the error variance σ².

(d) Using the unbiased estimate of the error variance, compute the standard errors of the estimators of the regression coefficients.

(e) Compute the coefficient of determination. Give a practical interpretation of your result.
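
The matrix formulas behind Part 1 can be sketched in R as follows. Census.txt is not reproduced here, so this sketch simulates a data frame with the assignment's column names; the simulated values, the seed, and the object names (census, X, beta_hat, and so on) are assumptions made for illustration, not the course data or an official solution.

```r
# Synthetic stand-in for Census.txt: column names follow the assignment,
# but every value below is simulated (assumption, not the real data).
set.seed(302)
n <- 51
census <- data.frame(
  MALE  = rnorm(n, mean = 96,   sd = 3),
  BIRTH = rnorm(n, mean = 18,   sd = 2),
  DIVO  = rnorm(n, mean = 4.5,  sd = 1.5),
  BEDS  = rnorm(n, mean = 800,  sd = 150),
  EDUC  = rnorm(n, mean = 10.5, sd = 2),
  INCO  = rnorm(n, mean = 3200, sd = 450)
)
census$LIFE <- 70 + 0.1 * census$MALE - 0.4 * census$BIRTH -
  0.2 * census$DIVO + 0.2 * census$EDUC + rnorm(n, sd = 1)

# Design matrix with an intercept column, and the response vector
predictors <- c("MALE", "BIRTH", "DIVO", "BEDS", "EDUC", "INCO")
X <- cbind(Intercept = 1, as.matrix(census[, predictors]))
y <- census$LIFE
p <- ncol(X)  # number of coefficients, intercept included

# (a) Least-squares estimates: beta_hat = (X'X)^{-1} X'y
XtX_inv  <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)

# Residual sum of squares from the fitted values
y_hat <- X %*% beta_hat
RSS   <- sum((y - y_hat)^2)

# (c) Biased (divide by n) and unbiased (divide by n - p) variance estimates
sigma2_biased   <- RSS / n
sigma2_unbiased <- RSS / (n - p)

# (d) Standard errors: square roots of the diagonal of sigma^2 (X'X)^{-1}
se_beta <- sqrt(sigma2_unbiased * diag(XtX_inv))

# (e) Coefficient of determination: R^2 = 1 - RSS / SST
SST <- sum((y - mean(y))^2)
R2  <- 1 - RSS / SST

round(cbind(Estimate = drop(beta_hat), Std.Error = se_beta), 4)
R2
```

With the real data, only the block that builds census would change (for example, reading Census.txt with read.table); the formulas themselves stay the same.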

Part 2: In this part, you may use all R commands you need, including lm() function, to answer the following questions.

(a) Fit the MLR model with LIFE (y) as the response variable, and MALE (x1), BIRTH (x2), DIVO (x3), BEDS (x4), EDUC (x5), and INCO (x6), as predictors.

(b) At level α = 5%, conduct the F-test for the overall fit of the regression. Comment on the results.

(c) At level α = 1%, test each of the individual regression coefficients. Do the results indicate that any of the explanatory variables should be removed from the model?

(d) Determine the regression model with the explanatory variable(s) identified in part (c) removed. Write down the estimated regression equation.

(e) Perform a partial F-test at level α = 1% to determine whether the variables associated with MALE and INCO can be removed from the model.

(f) Compute and report the F test statistic for comparing the two models

$E(Y_i \mid x_i) = \beta_0 + \beta_1 x_{i1}$,

$E(Y_i \mid x_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6}$.

(g) Perform a partial F-test at level α = 1% for comparing the two models

$E(Y_i \mid x_i) = \beta_0$,

$E(Y_i \mid x_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$.

(h) Compute and report the terms in the decomposition

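$SS_{reg}(\beta_1, \beta_2, \beta_3 \mid \beta_0) = SS_{reg}(\beta_3 \mid \beta_0) + SS_{reg}(\beta_2 \mid \beta_0, \beta_3) + SS_{reg}(\beta_1 \mid \beta_0, \beta_3, \beta_2)$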

(i) Suppose we are interested in fitting a regression model using LIFE as the response variable and some subset of the variables (MALE, BIRTH, DIVO, and INCO) as predictors.

(i.1) Perform variable selection by finding the subset model that minimizes the AIC criterion. State the 'best model'.

(i.2) Perform variable selection using forward selection. State the 'best model'.

(i.3) Perform variable selection using backward selection. State the 'best model'.
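
The nested-model comparisons and the selection procedures in Part 2 might be set up along these lines. This is a sketch on simulated data (Census.txt is not reproduced here); the data values, the seed, and object names such as full, reduced, and seq_fit are assumptions for illustration. The sequential sums of squares assume the predictors enter in the order DIVO, BIRTH, MALE, which is one reading of the decomposition in part (h).

```r
# Synthetic stand-in for Census.txt (simulated values, assumption only)
set.seed(302)
n <- 51
census <- data.frame(
  MALE  = rnorm(n, mean = 96,   sd = 3),
  BIRTH = rnorm(n, mean = 18,   sd = 2),
  DIVO  = rnorm(n, mean = 4.5,  sd = 1.5),
  BEDS  = rnorm(n, mean = 800,  sd = 150),
  EDUC  = rnorm(n, mean = 10.5, sd = 2),
  INCO  = rnorm(n, mean = 3200, sd = 450)
)
census$LIFE <- 70 + 0.1 * census$MALE - 0.4 * census$BIRTH -
  0.2 * census$DIVO + 0.2 * census$EDUC + rnorm(n, sd = 1)

# Full model: summary() reports the overall F-test and the individual t-tests
full <- lm(LIFE ~ MALE + BIRTH + DIVO + BEDS + EDUC + INCO, data = census)
summary(full)

# Partial F-test: can MALE and INCO be dropped from the full model?
reduced <- lm(LIFE ~ BIRTH + DIVO + BEDS + EDUC, data = census)
anova(reduced, full)

# The same F statistic computed from the two residual sums of squares
rss_r  <- sum(resid(reduced)^2)
rss_f  <- sum(resid(full)^2)
df_num <- df.residual(reduced) - df.residual(full)
F_stat <- ((rss_r - rss_f) / df_num) / (rss_f / df.residual(full))
p_val  <- pf(F_stat, df_num, df.residual(full), lower.tail = FALSE)

# Sequential sums of squares: anova() on a single fit adds terms in formula
# order, giving SSreg(b3 | b0), SSreg(b2 | b0, b3), SSreg(b1 | b0, b3, b2)
seq_fit <- lm(LIFE ~ DIVO + BIRTH + MALE, data = census)
seq_ss  <- anova(seq_fit)[c("DIVO", "BIRTH", "MALE"), "Sum Sq"]

# All-subsets search over the four candidate predictors, ranked by AIC
vars <- c("MALE", "BIRTH", "DIVO", "INCO")
subsets <- c(
  list(character(0)),  # intercept-only model
  unlist(lapply(1:4, function(k) combn(vars, k, simplify = FALSE)),
         recursive = FALSE)
)
aics <- sapply(subsets, function(v) {
  rhs <- if (length(v) > 0) v else "1"
  AIC(lm(reformulate(rhs, response = "LIFE"), data = census))
})
best <- subsets[[which.min(aics)]]

# Forward and backward selection with step(), which also ranks models by AIC
null  <- lm(LIFE ~ 1, data = census)
upper <- lm(LIFE ~ MALE + BIRTH + DIVO + INCO, data = census)
fwd <- step(null, scope = list(lower = ~ 1, upper = ~ MALE + BIRTH + DIVO + INCO),
            direction = "forward", trace = 0)
bwd <- step(upper, direction = "backward", trace = 0)
formula(fwd)
formula(bwd)
```

step() uses extractAIC(), which differs from AIC() by an additive constant when all models are fitted to the same observations, so the two rank subsets identically here.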

Part 3: In this part, you may use all R commands you need.

We consider the multiple linear regression with LIFE (y) as the response variable, and MALE, BIRTH, DIVO, BEDS, EDUC, and INCO, as predictors.

(a) Plot the standardized residuals against the fitted values. Are there any notable points? In particular, look for points with large residuals or points that may be influential.

(b) Compute and plot the leverage of each point. Identify any points that have a leverage larger than 0.5.

(c) Compute the Cook's distance for each point. Identify any points that have a Cook's distance larger than 1. Are these the same observations as those seen in part (b)?

(d) Plot the standardized residuals against the variable BEDS. Specifically mark the point corresponding to Washington, D.C. What can you say about this observation?

(e) Remove the observation corresponding to Washington, D.C. and refit the model. Are there any notable differences with the model fit in part (a)?

(f) Plot the standardized residuals against each of the 6 explanatory variables. Specifically mark the observation corresponding to UT. What is notable about this state?

(g) Remove the observation corresponding to UT and refit the model. Are there any notable differences with the model fit in part (a)? In particular, how does UT's exclusion impact the R² value?
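
The diagnostics requested in Part 3 (standardized residuals, leverage, Cook's distance, and refitting without one observation) could be computed roughly as below. The data are simulated because Census.txt is not reproduced here; the row label "DC" and its inflated BEDS value are planted by hand purely so there is a point to mark, and they say nothing about the real observation for Washington, D.C.

```r
# Synthetic stand-in for Census.txt (simulated values, assumption only)
set.seed(302)
n <- 51
census <- data.frame(
  MALE  = rnorm(n, mean = 96,   sd = 3),
  BIRTH = rnorm(n, mean = 18,   sd = 2),
  DIVO  = rnorm(n, mean = 4.5,  sd = 1.5),
  BEDS  = rnorm(n, mean = 800,  sd = 150),
  EDUC  = rnorm(n, mean = 10.5, sd = 2),
  INCO  = rnorm(n, mean = 3200, sd = 450)
)
# Hypothetical labels; the last row is given an extreme BEDS value on purpose
rownames(census) <- c(paste0("obs", 1:50), "DC")
census["DC", "BEDS"] <- 2500
census$LIFE <- 70 + 0.1 * census$MALE - 0.4 * census$BIRTH -
  0.2 * census$DIVO + 0.2 * census$EDUC + rnorm(n, sd = 1)

fit <- lm(LIFE ~ MALE + BIRTH + DIVO + BEDS + EDUC + INCO, data = census)

# Standardized residuals, leverages (hat values), and Cook's distances
std_res <- rstandard(fit)
lev     <- hatvalues(fit)
cook    <- cooks.distance(fit)

# (a) Standardized residuals against fitted values
plot(fitted(fit), std_res,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = c(-2, 0, 2), lty = 2)

# (b) Leverage of each point, with the 0.5 cutoff from the question
plot(lev, type = "h", xlab = "Observation", ylab = "Leverage")
abline(h = 0.5, lty = 2)
high_lev <- names(lev)[lev > 0.5]

# (c) Observations with Cook's distance above 1
high_cook <- names(cook)[cook > 1]

# (d) Standardized residuals against BEDS, marking the labelled row
plot(census$BEDS, std_res, xlab = "BEDS", ylab = "Standardized residuals")
points(census["DC", "BEDS"], std_res["DC"], col = "red", pch = 19)
text(census["DC", "BEDS"], std_res["DC"], labels = "DC", pos = 2)

# (e) Refit without the labelled row and compare coefficients and R^2
fit_no_dc <- lm(formula(fit), data = census[rownames(census) != "DC", ])
round(cbind(all_obs = coef(fit), without_DC = coef(fit_no_dc)), 4)
c(all_obs = summary(fit)$r.squared, without_DC = summary(fit_no_dc)$r.squared)
```

The same pattern (plot against a predictor, mark one row, refit without it) carries over to parts (f) and (g) by swapping the row label and looping over the six predictors.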

Textbook - Springer Texts in Statistics - A Modern Approach to Regression with R. Author: Simon J. Sheather. ISBN: 978-0-387-09607-0.

Course - STA302/1001H1S Method of Data Analysis Assignment, University of Toronto, Canada.

3/21/2019 11:48:20 PM

Instructions - Please save all of Part 1 as one pdf, all of Part 2 as a separate pdf, and all of Part 3 as a third pdf. This is an individual assignment worth 100 points. Please use Rmarkdown to write your solutions and submit them, with the relevant R code included, as a pdf file via Crowdmark.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd