Assignment - Regression: diagnostics
Lecture revision
1. What is a linear regression model of Y vs X?
2. What are the model's assumptions? How can you assess these assumptions with diagnostic plots?
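As a revision aid, here is a minimal sketch of fitting a simple linear regression, Y = b0 + b1*X + error, and drawing R's default diagnostic plots. The data are simulated purely for illustration (an assumption, not part of the assignment), so the plots show roughly what a well-behaved model looks like.

```r
# Simulated data (an assumption for illustration; not from the assignment)
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Fit the simple linear regression of y on x
fit <- lm(y ~ x)

# Draw the four default diagnostic plots in a 2 x 2 grid:
#   Residuals vs Fitted   : linearity and constant variance
#   Normal Q-Q            : normality of the residuals
#   Scale-Location        : constant variance
#   Residuals vs Leverage : influential observations
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout
```

A curved trend in the first plot suggests non-linearity, a funnel shape suggests non-constant variance, and points far from the line in the Q-Q plot suggest non-normal residuals.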
Diagnostic plots for various models
A statistician fitted several regression models to different variable pairs. For each model, she ran diagnostic plots for the residuals. For each of the six residual plots in this homework, say which, if any, of the assumptions are violated. Based on your answer, which models are good fits?
Simple linear regression
For each of the following regression problems:
1. Fit a linear regression model. Produce the diagnostic plots, and comment on which model assumptions are violated, if any.
2. Print the model summary, write down the equation that R fitted through the data, and interpret the coefficients in the context of the dataset.
3. Interpret the p-values for the coefficients of your model in the problem's context.
List of problems -
Note: we always write Y vs X, so science vs math means science is the y-variable and math is the x-variable.
- science vs math from hsb2 in library(openintro)
- socst vs math from hsb2 in library(openintro)
- socst² vs math from hsb2 in library(openintro)
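The three steps above could be wrapped in one small helper, sketched below. The function name `fit_and_report` is hypothetical, and the third call reads the last problem as socst squared against math, which is an assumption about the intended response. The hsb2 calls run only if the openintro package is installed.

```r
# Hypothetical helper: fit a model, draw diagnostics, print the summary
fit_and_report <- function(formula, data) {
  fit <- lm(formula, data = data)
  op <- par(mfrow = c(2, 2))   # 2 x 2 grid for the four default plots
  on.exit(par(op))             # restore the layout when the function exits
  plot(fit)
  print(summary(fit))          # coefficients, standard errors, p-values
  invisible(fit)               # return the fit without printing it again
}

# Runs only if the openintro package is available
if (requireNamespace("openintro", quietly = TRUE)) {
  hsb2 <- openintro::hsb2
  fit_science <- fit_and_report(science ~ math, hsb2)
  fit_socst   <- fit_and_report(socst ~ math, hsb2)
  # I() makes the squaring explicit as arithmetic on the response
  fit_socst2  <- fit_and_report(I(socst^2) ~ math, hsb2)
}
```

The fitted equation can then be read off `coef(fit_science)`: the first value is the intercept and the second is the slope on math.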
Regression in practice
Setup. Suppose that you work for a fancy restaurant that often buys abalone in huge quantities. You want to decide if it's better to buy abalone as a whole ("in shell"), shucked (meat only), or dried. Let's say that the dried weight equals the cooked weight, for simplicity. The market price for abalone is as follows:
- Whole: 70 USD per kilo
- Shucked: 700 USD per kilo
- Dried: 1200 USD per kilo
At these prices, your task is to decide which option is most economical for your restaurant. The data file, abalone.csv, is on Canvas. For variable descriptions, see abalone-descrip.txt.
Tasks -
1. Build a model with reasonable fit to predict dried weight based on whole weight.
2. Build a model with reasonable fit to predict dried weight based on shucked weight.
3. From your models, make a recommendation for your restaurant on the most economical type of abalone to buy (the one that yields the most dried weight for the lowest cost).
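One way to frame the comparison in task 3 is cost per kilo of dried weight: the market price divided by the kilos of dried weight obtained per kilo bought. The sketch below uses the prices given above, but the yield values are PLACEHOLDERS chosen only to make the code run. They are not estimates from abalone.csv and should be replaced by predictions from your own fitted models. The function name is hypothetical.

```r
# Market prices from the assignment, in USD per kilo
price <- c(whole = 70, shucked = 700, dried = 1200)

# PLACEHOLDER yields: kilos of dried weight per kilo bought.
# These numbers are hypothetical; replace them with values derived from
# your fitted models (for example via predict()). Dried is 1 by definition.
yield <- c(whole = 0.05, shucked = 0.5, dried = 1)

# Hypothetical helper: USD spent per kilo of dried weight obtained
cost_per_dried_kg <- function(price, yield) {
  stopifnot(all(yield > 0), identical(names(price), names(yield)))
  price / yield
}

costs <- cost_per_dried_kg(price, yield)
names(which.min(costs))  # the cheapest option under the placeholder yields
```

If a model has a non-zero intercept or a transformed response, the yield is not a single ratio but depends on the size of the abalone, so it may be worth comparing costs at several typical weights.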
Notes and hints.
For each of your chosen models, you MUST include diagnostic plots and R's model summary, and write down the model's equation. Point out any assumptions that could be violated by your model.
For better fits, consider simple transformations such as √y, log(y), log(x), or include terms such as x² in your model.
This is a moderately large and complex dataset, so don't expect textbook-like goodness of fit. However, if some assumptions are clearly violated (e.g., independence), you should try transformations to address them.
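The transformations suggested in these hints can be written directly inside an R model formula. The sketch below uses simulated data (an assumption for illustration, not the abalone data) and compares the residuals-vs-fitted plot for each candidate model side by side.

```r
# Simulated data with multiplicative noise (illustrative assumption)
set.seed(2)
x <- runif(200, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(200, sd = 0.2))

# Candidate models: untransformed, sqrt(y), log(y), log(x), and quadratic in x
fits <- list(
  raw    = lm(y ~ x),
  sqrt_y = lm(sqrt(y) ~ x),
  log_y  = lm(log(y) ~ x),
  log_x  = lm(y ~ log(x)),
  quad   = lm(y ~ x + I(x^2))  # I() protects ^ inside the formula
)

# Residuals vs fitted (which = 1) for each candidate, in a 2 x 3 grid
op <- par(mfrow = c(2, 3))
for (name in names(fits)) plot(fits[[name]], which = 1, main = name)
par(op)
```

When comparing candidates, rely on the diagnostic plots rather than R-squared alone, since R-squared values are not directly comparable between models whose responses are on different scales (for example y versus log(y)).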