Fit a simple linear regression model

Reference no: EM13921153

1. (45 points) Use the "Female Bears Data." Data from n = 19 female bears of varying ages are used to develop an equation for estimating Y = female bear's weight from X = female bear's neck circumference.

a. Fit a simple linear regression model with Y = female bear's weight and X = female bear's neck circumference. Click the "Storage" button in the Minitab Regression Dialog and select each of the items in the left-hand list (i.e., Fits, Residuals, Standardized residuals, Deleted residuals, Leverages, Cook's distance, DFITS). Write down the estimated regression equation and the MSE for this model.
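As a cross-check on the Minitab output for part (a), the least-squares intercept, slope, fits, residuals, and MSE can be computed by hand. The following is a minimal sketch in plain Python; the `neck` and `weight` values are made-up numbers, not the Female Bears Data, and `fit_slr` is a hypothetical helper name.

```python
def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x; returns b0, b1, fits, residuals, MSE."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fits = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fits)]
    mse = sum(e ** 2 for e in resid) / (n - 2)  # two estimated parameters
    return b0, b1, fits, resid, mse

# Made-up values for illustration only (NOT the Female Bears Data).
neck = [10.0, 12.0, 14.0, 16.0, 18.0]
weight = [30.0, 38.0, 55.0, 61.0, 76.0]

b0, b1, fits, resid, mse = fit_slr(neck, weight)
print(f"weight = {b0:.3f} + {b1:.3f} * neck, MSE = {mse:.4f}")
```

With the real data, n = 19, so the MSE divides the sum of squared residuals by n - 2 = 17.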

b. Which bear number has the highest leverage and what is that leverage? [Leverages are in the column labeled "HI1"]

c. Is the leverage in the previous part higher than the threshold 3(p/n)?
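For parts (b) and (c), the leverages in a simple linear regression depend only on the X values, so the comparison against 3(p/n) can be reproduced without Minitab. This is a small sketch assuming p = 2 (intercept and slope); the `x` values are made up and `slr_leverages` is a hypothetical helper name.

```python
def slr_leverages(x):
    """Leverage h_ii = 1/n + (x_i - xbar)^2 / Sxx for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Made-up predictor values (NOT the bear data); the last one sits far from the rest.
x = [20, 21, 22, 23, 24, 25, 26, 27, 28, 45]
h = slr_leverages(x)

p = 2                        # number of parameters: intercept and slope
threshold = 3 * p / len(x)   # the 3(p/n) rule of thumb
flagged = [i + 1 for i, hi in enumerate(h) if hi > threshold]  # 1-based case numbers
print(max(h), threshold, flagged)
```

For the bear data, n = 19 and p = 2, so the threshold is 3(2/19), roughly 0.316.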

d. Use the estimated regression equation from part (a) to calculate the fitted value for bear #6. [You can check your answer with the one Minitab provides in the column labeled "FITS1".]

e. Use your answer from the previous part together with the actual weight of bear #6 to calculate the residual for this bear. [You can check your answer with the one Minitab provides in the column labeled "RESI1".]

f. What is the leverage for bear #6?

g. Use the residual from part (e), the MSE from part (a), and the leverage from part (f) to calculate the internally studentized residual for bear #6. [You can check your answer with the one Minitab provides in the column labeled "SRES1" - remember Minitab calls these "Standardized residuals."]
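The calculation in part (g) divides the residual by its estimated standard deviation under the full fit. A minimal sketch of that formula follows; the inputs are made-up numbers rather than values from the bear data, and the function name is hypothetical.

```python
import math

def internally_studentized(resid, mse, leverage):
    """r_i = e_i / sqrt(MSE * (1 - h_ii)), using the MSE from the full fit."""
    return resid / math.sqrt(mse * (1 - leverage))

# Made-up inputs for illustration only.
r = internally_studentized(resid=3.0, mse=4.0, leverage=0.75)
print(r)  # 3 / sqrt(4 * 0.25) = 3.0
```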

h. Delete bear #6 from the dataset as follows: select Data > Subset Worksheet, click "Specify which rows to exclude," click "Row numbers," and type "6" into the adjoining box. Then refit the simple linear regression model with Y = female bear's weight and X = female bear's neck circumference. Write down the estimated regression equation and the MSE for this model.

i. Use the residual from part (e), the MSE from part (h), and the leverage from part (f) to calculate the externally studentized residual for bear #6. [You can check your answer with the one Minitab provides in the column labeled "TRES1" in the original worksheet - remember Minitab calls these simply "Deleted residuals."]
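Part (i) uses the same form as the internally studentized residual, but with the MSE from the fit that excludes the case. The sketch below also includes a standard identity that gives the deleted MSE without refitting, which can serve as a check on the MSE from part (h). All numbers are made up, and the function names are hypothetical.

```python
import math

def externally_studentized(resid, mse_deleted, leverage):
    """t_i = e_i / sqrt(MSE_(i) * (1 - h_ii)); MSE_(i) is from the fit without case i."""
    return resid / math.sqrt(mse_deleted * (1 - leverage))

def deleted_mse(sse, resid, leverage, n, p):
    """MSE_(i) = (SSE - e_i^2 / (1 - h_ii)) / (n - p - 1), computed from the full fit."""
    return (sse - resid ** 2 / (1 - leverage)) / (n - p - 1)

# Made-up full-model quantities (NOT the bear data).
mse_del = deleted_mse(sse=23.5, resid=3.0, leverage=0.2, n=5, p=2)
t = externally_studentized(resid=3.0, mse_deleted=mse_del, leverage=0.2)
print(mse_del, t)
```

If the value from `deleted_mse` disagrees with the MSE obtained by actually refitting without the case, one of the inputs has probably been copied incorrectly.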

j. Use the estimated regression equation from part (h) to calculate the predicted value for bear #6 (i.e., based on the model fit to the subset worksheet excluding bear #6). [Note: the answer won't make a whole lot of sense, but don't worry about this since we're simply going to use this predicted value for part (k).]

k. Use the fitted value from part (d), the predicted value from part (j), the MSE from part (h), and the leverage from part (f) to calculate the DFFITS for bear #6. [You can check your answer with the one Minitab provides in the column labeled "DFIT1" in the original worksheet.]
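The DFFITS calculation in part (k) scales the change in the fitted value by a standard error built from the deleted MSE and the leverage. A minimal sketch, with made-up inputs and a hypothetical function name:

```python
import math

def dffits(fit_full, pred_deleted, mse_deleted, leverage):
    """DFFITS_i = (yhat_i - yhat_i(i)) / sqrt(MSE_(i) * h_ii)."""
    return (fit_full - pred_deleted) / math.sqrt(mse_deleted * leverage)

# Made-up inputs (NOT the bear data): fitted value from the full model,
# predicted value from the model without the case, deleted MSE, leverage.
d = dffits(fit_full=52.0, pred_deleted=51.25, mse_deleted=6.125, leverage=0.2)
print(d)
```

An equivalent route is to multiply the externally studentized residual by sqrt(h / (1 - h)), which gives a second way to check the answer.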

l. Is the absolute value of DFFITS in the previous part higher than the threshold given in the online notes?

m. Use the residual from part (e), the MSE from part (a), and the leverage from part (f) to calculate the Cook's distance for bear #6. [You can check your answer with the one Minitab provides in the column labeled "COOK1" in the original worksheet.]
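Cook's distance in part (m) combines the size of the residual with the leverage, using the MSE from the full fit. The following sketch shows one common form of the formula; the inputs are made up and the function name is hypothetical.

```python
def cooks_distance(resid, mse, leverage, p):
    """D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2, with MSE from the full fit."""
    return resid ** 2 / (p * mse) * leverage / (1 - leverage) ** 2

# Made-up inputs for illustration only; p = 2 for a simple linear regression.
D = cooks_distance(resid=2.0, mse=1.0, leverage=0.5, p=2)
print(D)  # (4 / 2) * (0.5 / 0.25) = 4.0
```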

n. Is the Cook's distance from the previous part higher than the upper threshold given in the notes, 1?

o. Briefly summarize your findings with respect to bear #6. You might want to consider graphical evidence too!

2. (27 points) Use the "College GPA Data." Data from n = 40 college students are used to develop an equation for estimating Y = grade point average (GPA) from X1 = verbal score on a college entrance exam (percentile) and X2 = math score on a college entrance exam (percentile).

a. Fit a "full quadratic" multiple linear regression model with Y, X1, X2, X1^2, X2^2, and X1*X2. [In Minitab: Select Y as the Response, X1 and X2 as the Continuous predictors, click "Model," select both X1 and X2 together in the Predictors box and click the Add buttons next to "Interactions through order 2" and "Terms through order 2."] Also click the "Storage" button in the Minitab Regression Dialog and select Deleted residuals, Leverages, and Cook's distance. Write down the estimated regression equation.
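To make the structure of the "full quadratic" model concrete, the sketch below builds the design-matrix row for one student: an intercept, the two linear terms, the two squared terms, and the interaction. The scores are made up (not the College GPA Data), and `full_quadratic_row` is a hypothetical helper name.

```python
def full_quadratic_row(x1, x2):
    """Return one design-matrix row: intercept, X1, X2, X1^2, X2^2, X1*X2."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

# Made-up percentile scores for two students (NOT the College GPA Data).
scores = [(80.0, 90.0), (55.0, 70.0)]
design = [full_quadratic_row(x1, x2) for x1, x2 in scores]

p = len(design[0])  # 6 regression parameters, counting the intercept
print(design[0], p)
```

Under the convention that p counts the intercept, p = 6 and n = 40 here, so the 3(p/n) rule works out to 0.45.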

b. Which student has the largest absolute externally studentized residual and what is that externally studentized residual?

c. Is the externally studentized residual from the previous part greater in absolute value than 3? What do we call such points?

d. Which student has the highest leverage and what is that leverage?

e. Is the leverage from the previous part higher than the threshold 3(p/n)?

f. What is it about the student identified in part (d) that gives him/her such a high leverage? (Hint: compare this student's exam scores with other students' scores.)

g. Which student has the highest Cook's distance and what is that Cook's distance?

h. Is the Cook's distance from the previous part higher than the upper threshold given in the notes, 1?

i. Investigate whether removing any of the observations identified in the previous parts dramatically alters the model results.

3. (4+4+8+6+6=28 points) Use the "Brand Preference Data." Here, n = 16 observations are used to develop an equation for estimating Y = Degree of brand liking from X1 = Moisture content of the product and X2 = Sweetness of the product. The results were obtained from an experiment based on a completely randomized design (the data is coded).

a. Obtain the studentized deleted residuals and identify any outlying Y observations using the Bonferroni outlier test procedure with α = 0.10. State the decision rule and your conclusion. (In Minitab: Use "Storage" and check "Deleted residuals" under "Stat > Regression > Regression > Fit Regression Model ..." to get studentized deleted residuals).
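The Bonferroni decision rule can be written out as follows. This is a sketch that assumes a first-order model in X1 and X2, so that p = 3 parameters including the intercept; here t_i denotes the studentized deleted residual for case i.

```latex
\text{Conclude that case } i \text{ is an outlier if }
|t_i| > t\!\left(1 - \frac{\alpha}{2n};\; n - p - 1\right),
\qquad
1 - \frac{0.10}{2(16)} = 0.996875,
\quad
n - p - 1 = 16 - 3 - 1 = 12 .
```

The critical value is therefore the 0.996875 quantile of a t distribution with 12 degrees of freedom, which can be obtained from Minitab or a t table.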

b. Use the leverage values to determine whether any of the observations are outlying with regard to their X-values according to the rule of thumb 3(p/n). (In Minitab, use "Storage" and check "Leverages" under "Stat > Regression > Regression > Fit Regression Model ..." to get leverage values.)

c. Management wishes to estimate the mean degree of brand liking for moisture content X1 = 10 and sweetness X2 = 3. Construct a scatter plot of X2 against X1 and determine visually whether this prediction involves an extrapolation beyond the range of the data. Also, use equation (10.29) of the textbook to determine whether an extrapolation is involved. Do your conclusions from the two methods agree?
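Assuming the referenced equation is the usual leverage-type measure for a new point (the textbook itself should be checked), the hidden-extrapolation check can be sketched as follows. Here X is the design matrix of the fitted model, and the first-order form of X_new is an assumption.

```latex
h_{\text{new,new}}
  = \mathbf{X}_{\text{new}}^{\top} \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1} \mathbf{X}_{\text{new}},
\qquad
\mathbf{X}_{\text{new}} = \begin{pmatrix} 1 \\ 10 \\ 3 \end{pmatrix}
```

If h_new,new falls well outside the range of the leverages h_ii of the 16 observed cases, the estimate involves an extrapolation; if it lies within that range, it does not.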

d. The largest absolute studentized deleted residual is for case 14 (see part (a)). Obtain the DFFITS and Cook's distance values for this case to assess its influence. What do you conclude from each of these values?

e. Calculate the average absolute percent difference in the fitted values with and without case 14. What does this measure indicate about the influence of case 14?
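One common way to compute the measure in part (e) is to average, over all n cases, the absolute relative change in the fitted value when the model is refit without the case, expressed in percent. The sketch below assumes that definition, with the full-model fitted value in the denominator; the fitted values are made up (not from the Brand Preference Data) and the function name is hypothetical.

```python
def avg_abs_pct_diff(fits_with, fits_without):
    """Mean of |(yhat_without - yhat_with) / yhat_with| * 100 over all cases."""
    pairs = list(zip(fits_with, fits_without))
    return sum(abs((fw - f) / f) for f, fw in pairs) / len(pairs) * 100

# Made-up fitted values for three cases (illustration only).
with_case = [10.0, 20.0, 40.0]      # fits from the model using every case
without_case = [11.0, 19.0, 40.0]   # fits for the same cases, model refit without one case
result = avg_abs_pct_diff(with_case, without_case)
print(result)  # about 5.0 percent
```

Note that `fits_without` should contain predictions for all n cases, including the deleted one, from the model fitted without that case.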






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd