Question 1
If a test of hypothesis indicates that the correlation between two random variables is not significantly different from 0, does this necessarily imply that the variables are independent? Explain.
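A standard counterexample may help here; it is offered as a hint under stated assumptions, not as a full answer. Take X to be standard normal and let Y = X². Then Y is completely determined by X, yet the covariance (and hence the correlation) is zero:

```latex
\operatorname{Cov}(X, Y) = E[X \cdot X^{2}] - E[X]\,E[X^{2}] = E[X^{3}] - 0 \cdot 1 = 0 .
```

Correlation measures only linear association, so ρ = 0 rules out a linear relationship but not dependence in general. The exception is a bivariate normal pair, for which zero correlation does imply independence. Note also that failing to reject ρ = 0 does not establish that ρ is exactly 0.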
Question 2
The dataset lowbwt contains information for a sample of 100 low birth weight infants born in two teaching hospitals in Boston. Measurements of systolic blood pressure are in the variable sbp, and values of the Apgar score recorded five minutes after birth (an index of neonatal asphyxia or oxygen deprivation) are in the variable apgar5.
Construct a two-way scatter plot for these data.
Does Apgar score tend to increase or decrease as systolic blood pressure increases?
Estimate the correlation between systolic blood pressure and five-minute Apgar score for this population of low birth weight infants. What can you say about the strength of the relationship?
Test the hypothesis that the population correlation coefficient is zero (H₀: ρ = 0) against a two-sided alternative. Under the null hypothesis, the test statistic
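One way to work through the parts of Question 2 in Stata is sketched below. It assumes the data are saved as lowbwt.dta in the working directory (the file name and location are assumptions; the variable names sbp and apgar5 come from the question). The scalar names rhohat, nobs, and tstat are made up for illustration. The last lines compute the usual statistic t = r·sqrt((n − 2)/(1 − r²)), which is compared with a t distribution on n − 2 degrees of freedom.

```stata
* Sketch only: assumes lowbwt.dta is in the current working directory
use lowbwt, clear

* Two-way scatter plot of Apgar score against systolic blood pressure
twoway scatter apgar5 sbp

* Sample Pearson correlation; pwcorr with sig also reports a two-sided p-value
correlate sbp apgar5
pwcorr sbp apgar5, sig obs

* Test statistic for H0: rho = 0, built from the results correlate leaves in r()
quietly correlate sbp apgar5
scalar rhohat = r(rho)
scalar nobs   = r(N)
scalar tstat  = rhohat*sqrt((nobs - 2)/(1 - rhohat^2))
display "t = " tstat "   two-sided p = " 2*ttail(nobs - 2, abs(tstat))
```

The p-value printed by the last line should agree with the one pwcorr reports, which gives a quick check on the hand calculation.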
Question 3
When asked to state the simple linear regression model, a student wrote it as follows:
Explain why this is incorrect, and write down the correct model. Clearly state all assumptions of this model.
The student's model is incorrect. The expected value of y given x is
µ(y|x) = α + βx, which contains no error term, because by assumption the expected value of ε is zero. The simple linear regression model is y = α + βx + ε, where α is the intercept and β is the slope of the linear relationship between the two variables. The assumptions of the model are: i. The error terms ε are uncorrelated normal random variables with mean zero and constant variance σ². ii. The x_i are known constants, the values of the predictor variable.
Question 4
Measurements of length and weight for a sample of 20 low birth weight infants are contained in the data set twenty. The length measurements are in the variable length, and the corresponding birth weights in weight.
Construct a two-way scatter plot of birth weight versus length for the 20 infants in the sample. Without doing any calculations, sketch your best guess for the least-squares regression line directly on the scatter plot (don't cheat and skip ahead to part b!).
Now, compute the true least-squares regression line. Draw this line on the scatter plot. Does the actual least-squares line concur with your guess?
The outlying point on the scatter plot corresponds to the 9th infant in the sample. Remove this point from the data set. Compute the new least-squares regression line based on the sample of size 19. Look at this fitted regression line. How does the least-squares line change? In particular, comment on the values of the slope and intercept.
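The computational parts of Question 4 could be sketched in Stata roughly as follows. The file name twenty.dta is an assumption, as is the idea that the rows are still in their original order, so that row 9 is the 9th infant.

```stata
* Sketch only: assumes twenty.dta is in the current working directory
use twenty, clear

* Scatter plot of birth weight versus length, with the fitted line overlaid
twoway (scatter weight length) (lfit weight length)

* Least-squares line from all 20 infants
regress weight length

* Refit without the 9th infant; this relies on the data being in their
* original row order, so that _n == 9 picks out that infant
regress weight length if _n != 9
twoway (scatter weight length if _n != 9) (lfit weight length if _n != 9)
```

Comparing the two regress tables side by side shows how much the slope and intercept move when the outlying point is dropped.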
Question 5
The dataset fev consists of 12 observations on forced expiratory volume (FEV) in liters, and the height in cm of boys aged 10-14 years. The response variable is fev and the covariate is height. Your goal is to use simple linear regression to understand the relationship between FEV and height.
Obtain and clearly state the estimated regression function. Plot the estimated regression function and the data. Does the linear regression function appear to give a good fit here? Discuss.
Obtain point estimates of the following: (1) the difference in mean FEV for two boys aged 10-14 whose height differs by one cm, (2) the mean FEV for boys with height 161 cm, (3) ε₆, and (4) σ².
Test whether or not there is a linear association between FEV and height. State the alternatives, decision rule, and conclusion. What is the p-value of the test?
Estimate β with a 99% confidence interval. Interpret your interval estimate. What does this result tell you about whether or not you would conclude there is a linear relationship between FEV and height at the 0.01 significance level?
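A possible Stata workflow for Question 5 is sketched below. It assumes the data are saved as fev.dta in the working directory; the variable name ehat is made up for illustration, while fev and height come from the question.

```stata
* Sketch only: assumes fev.dta is in the current working directory
use fev, clear

* Estimated regression function, with the data and the fitted line
regress fev height
twoway (scatter fev height) (lfit fev height)

* (1) The slope estimates the difference in mean FEV per 1 cm of height
display _b[height]

* (2) Estimated mean FEV at a height of 161 cm
lincom _cons + 161*height

* (3) Residual for the 6th observation, the point estimate of its error term
predict ehat, residuals
list fev height ehat in 6

* (4) Estimate of sigma^2: the square of the root mean squared error
display e(rmse)^2

* The t test on height in the regress table tests H0: beta = 0;
* refitting with level(99) displays 99% confidence intervals instead of 95%
regress fev height, level(99)
```

If the 99% interval for the slope excludes zero, the two-sided test of β = 0 rejects at the 0.01 level, and vice versa, so the interval and the test always agree.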
Question 6
A person's muscle mass is expected to decrease with age. To explore this relationship in women, a nutritionist randomly selected 15 women from each 10-year age group, beginning with age 40 and ending with age 79. The results are included in the dataset MuscleMass. The X variable is age, and the Y variable is mmass, a measure of muscle mass.
Obtain the estimated regression function. Plot the estimated regression function and the data. Does a linear regression function appear to give a good fit here? Does your plot support the anticipation that muscle mass deteriorates with age?
Obtain a 95% confidence interval for the mean muscle mass for women of age 65. Interpret your confidence interval.
Obtain a 95% prediction interval for the muscle mass of a woman whose age is 65. Is the prediction relatively precise?
Get the residuals for this fitted model. You can store the residuals in a variable named res after running the regression using:
. predict res, r
Plot a histogram of the residuals. What would you say about the normality assumption for this model fit?
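The parts of Question 6 could be approached in Stata along the following lines. This is a sketch: the file name MuscleMass.dta is an assumption, and yhat, sef, lower, and upper are illustrative variable names. The confidence interval for the mean uses lincom, while the prediction interval uses the forecast standard error (stdf), which adds the residual variance to the uncertainty in the fitted mean.

```stata
* Sketch only: assumes MuscleMass.dta is in the current working directory
use MuscleMass, clear

* Estimated regression function, with the data and the fitted line
regress mmass age
twoway (scatter mmass age) (lfit mmass age)

* Residuals and their histogram, with a normal density overlaid
predict res, residuals
histogram res, normal

* 95% confidence interval for the mean muscle mass at age 65
lincom _cons + 65*age

* 95% prediction interval for one new woman aged 65:
* add a row with age 65, then use the forecast standard error (stdf)
set obs `=_N + 1'
replace age = 65 in L
predict yhat, xb
predict sef, stdf
generate lower = yhat - invttail(e(df_r), 0.025)*sef
generate upper = yhat + invttail(e(df_r), 0.025)*sef
list age yhat lower upper in L
```

The prediction interval will always be wider than the confidence interval at the same age, because it covers a single woman's value rather than the mean.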
Question 7
Statistics that summarize personal health care expenditures by state for the years 1966 through 1982 have been examined in an attempt to understand issues related to rising health care costs. Suppose you are interested in focusing on the relationship between expense per admission into a community hospital and average length of stay in the facility. The data set hospital.dta contains information for each state (and includes D.C.) for the year 1982. The measures of mean expense per admission are in the variable expadm; the corresponding average lengths of stay are in los.
Generate numerical summary statistics for the variables expense per admission and length of stay in the hospital. What are the means and medians of each variable? What are their minimum and maximum values?
Construct a two-way scatter plot of expense per admission versus the length of stay. What does the scatter plot suggest about the nature of the relationship between these variables?
Using expense per admission as the response and length of stay as the explanatory variable, compute the least-squares regression line. Interpret the estimated slope and intercept in words.
Construct a 95% confidence interval for the true slope of the population regression line. What does this interval tell you about the linear relationship between expense per admission and length of stay in the hospital?
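A minimal Stata sketch for Question 7 follows. The file hospital.dta and the variables expadm and los are named in the question; that the file sits in the working directory is an assumption.

```stata
* Sketch only: assumes hospital.dta is in the current working directory
use hospital, clear

* Means, medians (the 50th percentile), minima, and maxima
summarize expadm los, detail
tabstat expadm los, statistics(mean median min max)

* Expense per admission versus length of stay, with the fitted line
twoway (scatter expadm los) (lfit expadm los)

* Response expadm, explanatory variable los; the output table
* includes a 95% confidence interval for the slope
regress expadm los
```

The intercept here is the fitted expense at a length of stay of zero days, which lies outside the observed data, so it is worth saying so when interpreting it in words.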
PART B
Use linear regression to assess the relationship between BMI and age for adults aged 18-79, and between BMI and smoking (_RFSMOK3), in the 2021 BRFSS. Use valid BMI information only (hint: look at the codebook to see how BMI was coded before using the variable _BMI5). Please use svy: commands, and recode "don't know" values as missing for the smoking variable.
1. What is the predicted average BMI when age is 40?
2. For every one-year increase in age what is the average increase in BMI?
3. Carrying this model to the hypothetical extreme, what is the predicted average BMI for a newborn baby (age=0 years)?
4. What is the average BMI among non-smokers?
5. What is the average BMI among smokers?
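A hedged sketch of how Part B might be set up in Stata is given below. Several things here are assumptions rather than facts stated in the assignment and should be checked against the 2021 BRFSS codebook: the design variables _PSU, _STSTR, and _LLCPWT; the age variable _AGE80; the two implied decimal places in _BMI5; and the coding of _RFSMOK3. The names bmi, smoker, and adult are illustrative. Applying the 18-79 restriction to the smoking model as well is also an assumption.

```stata
* Sketch only. Assumed: the 2021 BRFSS file is already loaded with upper-case
* variable names, and the design variables are _PSU, _STSTR, and _LLCPWT.
svyset _PSU [pweight = _LLCPWT], strata(_STSTR)

* _BMI5 is assumed to carry two implied decimal places (e.g. 2534 = 25.34)
generate bmi = _BMI5/100

* _RFSMOK3 is assumed coded 1 = not a current smoker, 2 = current smoker,
* 9 = don't know/refused/missing; anything other than 1 or 2 becomes missing
generate smoker = _RFSMOK3 == 2 if inlist(_RFSMOK3, 1, 2)

* Adults aged 18-79; _AGE80 is assumed to be age in years, top-coded at 80
generate adult = inrange(_AGE80, 18, 79)

* BMI on age (items 1 to 3)
svy, subpop(adult): regress bmi _AGE80
* Item 1: predicted mean BMI at age 40
lincom _cons + 40*_AGE80
* Item 2: change in mean BMI per one-year increase in age
display _b[_AGE80]
* Item 3: extrapolated mean BMI at age 0 (the intercept)
display _b[_cons]

* BMI on smoking (items 4 and 5)
svy, subpop(adult): regress bmi i.smoker
* Item 4: mean BMI among non-smokers
lincom _cons
* Item 5: mean BMI among smokers
lincom _cons + 1.smoker
```

Using subpop() rather than an if qualifier keeps the full survey design in the variance calculation while restricting the estimates to adults aged 18-79.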