Compute the estimated probability of death


Reference no: EM132218730

Biostatistics Assignment -

Problem 1 - The data set radiation contains the following variables:

dose: the neutron dose (201, 220, 243, 260)
treatment: the treatment (1 = Streptomycin, 0 = saline control)
dead: whether the mouse died within 3-10 days (1 = died, 0 = lived)

Question 1.1 - Compute the estimated probability of death for the 8 different combinations of neutron dose and treatment. Plot the estimated probabilities as a function of the neutron dose, in blue for mice treated with Streptomycin and in red for mice treated with the saline control (one single plot with all the probabilities).
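A minimal R sketch of the computation in Question 1.1. The real radiation data set is not reproduced here, so the code builds a small stand-in from made-up counts (the data frame toy and all its numbers are hypothetical); only the column names dose, treatment and dead come from the problem statement.

```r
# Hypothetical toy counts standing in for the real `radiation` data (NOT the assignment data).
toy <- data.frame(
  dose      = rep(c(201, 220, 243, 260), each = 2),
  treatment = rep(c(0, 1), times = 4),
  n         = 40,
  deaths    = c(12, 5, 20, 11, 30, 19, 36, 27)
)
# Expand the counts to one row per mouse, with columns dose, treatment, dead.
idx <- rep(seq_len(nrow(toy)), times = toy$n)
radiation <- data.frame(
  dose      = toy$dose[idx],
  treatment = toy$treatment[idx],
  dead      = unlist(Map(function(d, n) rep(c(1, 0), c(d, n - d)), toy$deaths, toy$n))
)

# Estimated probability of death = proportion of deaths in each dose x treatment cell.
p_hat <- tapply(radiation$dead,
                list(dose = radiation$dose, treatment = radiation$treatment),
                mean)
print(p_hat)

# Single plot: red = saline control (column "0"), blue = Streptomycin (column "1").
matplot(as.numeric(rownames(p_hat)), p_hat, type = "b", pch = 19, lty = 1,
        col = c("red", "blue"), ylim = c(0, 1),
        xlab = "Neutron dose", ylab = "Estimated probability of death")
legend("topleft", legend = c("Saline control", "Streptomycin"),
       col = c("red", "blue"), pch = 19, lty = 1)
```

With the real data, only the construction of radiation changes; the tapply call and the plot stay the same.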

Question 1.2 - Perform the chi-squared test of independence on the contingency tables of treatment and dead, separately for each neutron dose. In R you can use the function table to build contingency tables. For example, the treatment-dead contingency table for dose = 201 is obtained with the following code: table(radiation[radiation$dose == 201, c(2, 3)]).

In particular, for each neutron dose (201, 220, 243, 260), perform the chi-squared test of the null hypothesis that dead is independent of treatment, and explain the results. Do we reject at α = 0.05?
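One way the per-dose tests could be organised in R is sketched here. The data are the same kind of made-up stand-in (toy is hypothetical, not the assignment data). Note that chisq.test applies Yates' continuity correction to 2 x 2 tables by default; whether to keep it (correct = TRUE) or not is a choice the assignment does not specify.

```r
# Hypothetical toy counts standing in for the real `radiation` data (NOT the assignment data).
toy <- data.frame(
  dose      = rep(c(201, 220, 243, 260), each = 2),
  treatment = rep(c(0, 1), times = 4),
  n         = 40,
  deaths    = c(12, 5, 20, 11, 30, 19, 36, 27)
)
idx <- rep(seq_len(nrow(toy)), times = toy$n)
radiation <- data.frame(
  dose      = toy$dose[idx],
  treatment = toy$treatment[idx],
  dead      = unlist(Map(function(d, n) rep(c(1, 0), c(d, n - d)), toy$deaths, toy$n))
)

doses <- sort(unique(radiation$dose))

# One chi-squared test of independence (treatment vs dead) per dose.
p_values <- sapply(doses, function(d) {
  tab <- table(radiation[radiation$dose == d, c("treatment", "dead")])
  chisq.test(tab)$p.value   # Yates' correction is on by default for 2 x 2 tables
})

results <- data.frame(dose = doses, p_value = p_values, reject = p_values < 0.05)
print(results)
```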

Question 1.3 - To compare the probability of death between treatments at each of the four neutron doses, we now perform one-sided Wald tests using the (asymptotically normal) statistic $\delta = \hat{p}_1 - \hat{p}_0$, where $\hat{p}_1$ is the estimated probability of death for mice treated with Streptomycin and $\hat{p}_0$ that for mice treated with the saline solution, under the same neutron dose. In particular, test the following null hypothesis, report the p-values, and state whether we reject the null hypothesis at α = 0.05.

For neutron dose equal to 201, 220, 243, 260: mice treated with Streptomycin are no more likely to survive than mice treated with the saline control. In terms of the death probabilities, this null hypothesis is p1 ≥ p0.

When can we say that the Streptomycin treatment is effective in counteracting the effects of radiation?
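A sketch of the one-sided Wald test in R, under two assumptions that the problem statement does not fix: the standard error uses the unpooled form sqrt(p1(1 - p1)/n1 + p0(1 - p0)/n0), and the p-value is the lower-tail normal probability (small values of the statistic speak against the null p1 ≥ p0). The helper wald_one_sided and the toy data are hypothetical.

```r
# Hypothetical toy counts standing in for the real `radiation` data (NOT the assignment data).
toy <- data.frame(
  dose      = rep(c(201, 220, 243, 260), each = 2),
  treatment = rep(c(0, 1), times = 4),
  n         = 40,
  deaths    = c(12, 5, 20, 11, 30, 19, 36, 27)
)
idx <- rep(seq_len(nrow(toy)), times = toy$n)
radiation <- data.frame(
  dose      = toy$dose[idx],
  treatment = toy$treatment[idx],
  dead      = unlist(Map(function(d, n) rep(c(1, 0), c(d, n - d)), toy$deaths, toy$n))
)

# One-sided Wald test of H0: p1 >= p0 against H1: p1 < p0 (hypothetical helper).
wald_one_sided <- function(dead, treatment) {
  p1 <- mean(dead[treatment == 1]); n1 <- sum(treatment == 1)
  p0 <- mean(dead[treatment == 0]); n0 <- sum(treatment == 0)
  delta <- p1 - p0
  se <- sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)  # unpooled standard error (assumption)
  z <- delta / se
  c(delta = delta, se = se, z = z, p_value = pnorm(z))  # lower tail: reject for very negative z
}

doses <- sort(unique(radiation$dose))
results <- t(sapply(doses, function(d) {
  sub <- radiation[radiation$dose == d, ]
  wald_one_sided(sub$dead, sub$treatment)
}))
rownames(results) <- doses
print(results)
print(results[, "p_value"] < 0.05)   # TRUE where H0 is rejected at alpha = 0.05
```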

Question 1.4 - Fit, using only the observations from mice treated with Streptomycin, the model

$\operatorname{logit}(E(\text{dead} \mid \text{dose})) = \beta_0 + \beta_1 \text{dose}$ (log-regr 1)

where dead|dose is a Bernoulli random variable. As in Question 1.1, plot the probability of death predicted by the model (log-regr 1) as a function of the neutron dose. Add, in the same plot, the probability estimated from the data (for mice treated with Streptomycin). Then fit, again using the data from mice treated with Streptomycin, two further logistic regression models, now with polynomial terms:

$\operatorname{logit}(E(\text{dead} \mid \text{dose})) = \beta_0 + \beta_1 \text{dose} + \beta_2 \text{dose}^2$ (log-regr 2)

$\operatorname{logit}(E(\text{dead} \mid \text{dose})) = \beta_0 + \beta_1 \text{dose} + \beta_2 \text{dose}^2 + \beta_3 \text{dose}^3$ (log-regr 3)

For each of the three models above show the value of the estimated coefficients.

Perform model selection using AIC to choose among the three logistic regression models. Then perform the likelihood-ratio test to choose between log-regr 1 and log-regr 2.
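The three fits, the AIC comparison and the likelihood-ratio test could be sketched in R as follows. The data are a made-up stand-in (toy is hypothetical). As a labelled deviation from the statement, the sketch uses the dose in hundreds (d = dose / 100) to keep the polynomial terms numerically well behaved; this changes the reported coefficients but not the fitted probabilities, the AIC values or the likelihood-ratio test.

```r
# Hypothetical toy counts standing in for the real `radiation` data (NOT the assignment data).
toy <- data.frame(
  dose      = rep(c(201, 220, 243, 260), each = 2),
  treatment = rep(c(0, 1), times = 4),
  n         = 40,
  deaths    = c(12, 5, 20, 11, 30, 19, 36, 27)
)
idx <- rep(seq_len(nrow(toy)), times = toy$n)
radiation <- data.frame(
  dose      = toy$dose[idx],
  treatment = toy$treatment[idx],
  dead      = unlist(Map(function(d, n) rep(c(1, 0), c(d, n - d)), toy$deaths, toy$n))
)

# Keep only the Streptomycin mice; rescale dose for numerical stability (assumption).
strep <- radiation[radiation$treatment == 1, ]
strep$d <- strep$dose / 100

fit1 <- glm(dead ~ d,                   family = binomial, data = strep)  # log-regr 1
fit2 <- glm(dead ~ d + I(d^2),          family = binomial, data = strep)  # log-regr 2
fit3 <- glm(dead ~ d + I(d^2) + I(d^3), family = binomial, data = strep)  # log-regr 3

print(coef(fit1)); print(coef(fit2)); print(coef(fit3))

# Model selection by AIC (smaller is better).
print(AIC(fit1, fit2, fit3))

# Likelihood-ratio test between the nested models log-regr 1 and log-regr 2.
lrt <- anova(fit1, fit2, test = "Chisq")
print(lrt)

# Predicted curve from log-regr 1 plus the observed proportions.
grid <- data.frame(d = seq(2.01, 2.60, length.out = 100))
plot(grid$d * 100, predict(fit1, newdata = grid, type = "response"),
     type = "l", col = "blue", ylim = c(0, 1),
     xlab = "Neutron dose", ylab = "Probability of death (Streptomycin)")
obs <- tapply(strep$dead, strep$dose, mean)
points(as.numeric(names(obs)), obs, col = "blue", pch = 19)
```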

Problem 2 - The data set body fat contains estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. The goal is to find a regression model to estimate the percentage of body fat from the body measurements only. The variables are:

fat: the percentage of body fat estimated from underwater body measurement
age: age in years
weight: weight in kg
height: height in m
neck: neck circumference in cm
chest: chest circumference in cm
abdomen: abdomen (2) circumference in cm
hip: hip circumference in cm
thigh: thigh circumference in cm
knee: knee circumference in cm
ankle: ankle circumference in cm
biceps: biceps (extended) circumference in cm
forearm: forearm circumference in cm
wrist: wrist circumference in cm

Question 2.1 - Fit a linear regression model for the variable fat (percentage of body fat), using all the other body measurements in the data set as predictor variables. Use the function summary to extract information on the coefficients of the linear regression. Which features seem to be the most relevant for predicting the percentage of body fat? Justify your answer. Can we reject at α = 0.05 the hypothesis that the coefficient for knee (knee circumference) is equal to 0? Explain.

Question 2.2 - We now want to obtain simpler regression models for the percentage of fat (fat). Perform model selection using both forward and backward stepwise regression, with BIC as the score. You can use the built-in function step.
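A sketch of how step can be made to use BIC: its penalty argument k defaults to 2 (AIC), and setting k = log(n) gives BIC. The data frame bodyfat below is simulated with only a handful of the predictors (a hypothetical stand-in, not the real data set).

```r
# Simulated stand-in for the body fat data (hypothetical values, NOT the real data).
set.seed(42)
n <- 252
bodyfat <- data.frame(
  age     = sample(22:81, n, replace = TRUE),
  weight  = rnorm(n, 81, 13),
  height  = rnorm(n, 1.78, 0.07),
  abdomen = rnorm(n, 92, 10),
  wrist   = rnorm(n, 18, 1)
)
bodyfat$fat <- -30 + 0.6 * bodyfat$abdomen - 0.1 * bodyfat$weight + rnorm(n, sd = 4)

full <- lm(fat ~ ., data = bodyfat)   # all predictors
null <- lm(fat ~ 1, data = bodyfat)   # intercept only

# Backward elimination from the full model, scored by BIC (k = log(n)).
backward <- step(full, direction = "backward", k = log(n), trace = 0)

# Forward selection from the null model, allowed to grow up to the full model.
forward <- step(null, scope = list(lower = ~ 1, upper = formula(full)),
                direction = "forward", k = log(n), trace = 0)

print(formula(backward))
print(formula(forward))
```

Setting trace = 1 instead prints the score at every step, which is useful when the selection path has to be reported.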

Question 2.3 - A common way to estimate percentage of body fat by body measurements is through the body mass index (BMI).

The BMI is defined as the weight (in kg) of an individual divided by the square of the body height (in m). Compute the body mass index for all the individuals in the data set.

Fit the following linear model to estimate the percentage of body fat (fat): $E(\text{fat} \mid \text{bmi}, \text{age}) = \beta_0 + \beta_1 \text{bmi} + \beta_2 \text{age}$ (BMI model), where bmi is the body mass index. Report the fitted coefficients.

Question 2.4 - Compute 95% percentile confidence intervals for the coefficients $\beta_0$, $\beta_1$ and $\beta_2$ in the model of Question 2.3 using the non-parametric bootstrap. Compare the obtained confidence intervals with the intervals obtained with the built-in R function confint.
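A minimal sketch of the BMI model and a case-resampling (non-parametric) bootstrap in R. The data are simulated (hypothetical stand-in), and the number of bootstrap replicates B = 1000 is an arbitrary choice.

```r
# Simulated stand-in for the body fat data (hypothetical values, NOT the real data).
set.seed(123)
n <- 252
bodyfat <- data.frame(
  age    = sample(22:81, n, replace = TRUE),
  weight = rnorm(n, 81, 13),     # kg
  height = rnorm(n, 1.78, 0.07)  # m
)
bodyfat$bmi <- bodyfat$weight / bodyfat$height^2   # BMI = weight / height^2
bodyfat$fat <- -25 + 1.6 * bodyfat$bmi + 0.13 * bodyfat$age + rnorm(n, sd = 5)

fit <- lm(fat ~ bmi + age, data = bodyfat)
print(coef(fit))

# Non-parametric bootstrap: resample rows with replacement and refit the model.
B <- 1000
boot_coefs <- t(replicate(B, {
  rows <- sample(n, replace = TRUE)
  coef(lm(fat ~ bmi + age, data = bodyfat[rows, ]))
}))

# 95% percentile intervals: 2.5% and 97.5% quantiles of the bootstrap coefficients.
ci_boot <- t(apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975)))
print(ci_boot)

# Normal-theory intervals for comparison.
print(confint(fit, level = 0.95))
```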

Problem 3 - In this problem we will study the Gumbel distribution. The Gumbel distribution is a continuous distribution with density function (PDF) given by the following expression, (in attached file)

Question 3.1 - Implement in R the functions dgumbel (PDF), pgumbel (CDF), qgumbel (quantile function) and rgumbel (sampling). These functions should behave like, and have signatures similar to, dnorm, pnorm, qnorm, rnorm and the other built-in R functions for PDFs, CDFs, quantile functions and random number generation. In particular:

dgumbel(x, mu, b): x can be a single numerical value or a vector of numerical values. The function should return a vector with the values of the Gumbel density at the points x. Optionally, you can also implement the additional parameter log, as in dgumbel(x, mu, b, log), where log can be TRUE or FALSE and behaves as in dnorm.
pgumbel(q, mu, b): q is the vector of quantiles and mu, b are the parameters of the Gumbel distribution. Optionally, you can implement the additional parameters lower.tail and log.p with the associated behaviour (check how pnorm behaves).
qgumbel(p, mu, b) should return the vector of quantiles corresponding to the vector of probabilities p. Optionally, add the argument lower.tail.
rgumbel(n, mu, b) should return a sample of size n of independent Gumbel random variables with parameters µ = mu and β = b. Remember inverse transform sampling.

In all functions, the parameters mu (the location parameter µ) and b (the scale parameter β) should have default values mu = 0, b = 1.

To test the above functions, generate a sample of size 10000 from a Gumbel distribution using the function rgumbel implemented above. Plot the histogram and, on top of it, the true density dgumbel (you can use the curve function for this). The histogram should approximate the true density well. You can use the default values µ = 0 and β = 1.

How can you check in R that the functions pgumbel and qgumbel are correct? Which sanity checks should they pass? Think also of some numerical checks to test whether the implemented function pgumbel is the cumulative distribution function of dgumbel (hint: in R we can perform numerical integration). Comment on and explain all the tests and checks you perform.
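A sketch of the four functions and some of the checks in R. The density itself is only given in the attached file, so this sketch assumes the usual Gumbel (maximum) form f(x) = (1/β) exp(-(z + exp(-z))) with z = (x - µ)/β, whose CDF is exp(-exp(-z)); if the attachment defines the distribution differently, the formulas need adjusting.

```r
# Assumed density: f(x) = (1/b) * exp(-(z + exp(-z))), z = (x - mu) / b  (standard Gumbel form).
dgumbel <- function(x, mu = 0, b = 1, log = FALSE) {
  z <- (x - mu) / b
  log_dens <- -log(b) - z - exp(-z)
  if (log) log_dens else exp(log_dens)
}

pgumbel <- function(q, mu = 0, b = 1, lower.tail = TRUE, log.p = FALSE) {
  log_cdf <- -exp(-(q - mu) / b)                       # log F(q)
  p <- if (lower.tail) exp(log_cdf) else -expm1(log_cdf)  # F(q) or 1 - F(q)
  if (log.p) log(p) else p
}

qgumbel <- function(p, mu = 0, b = 1, lower.tail = TRUE) {
  if (!lower.tail) p <- 1 - p
  mu - b * log(-log(p))                                # inverse of the CDF
}

# Inverse transform sampling: apply the quantile function to uniform draws.
rgumbel <- function(n, mu = 0, b = 1) qgumbel(runif(n), mu, b)

set.seed(1)
x <- rgumbel(10000)
hist(x, breaks = 50, freq = FALSE, main = "Gumbel(0, 1) sample")
curve(dgumbel(x), add = TRUE, col = "red", lwd = 2)

# Sanity checks.
total_mass <- integrate(dgumbel, -Inf, Inf)$value      # should be close to 1
cdf_num    <- integrate(dgumbel, -Inf, 1.3)$value      # should match pgumbel(1.3)
probs      <- c(0.01, 0.25, 0.5, 0.9, 0.99)
round_trip <- pgumbel(qgumbel(probs))                  # should return probs
print(c(total_mass = total_mass, cdf_num = cdf_num, pgumbel = pgumbel(1.3)))
print(round_trip)
```

Further checks in the same spirit: pgumbel should be non-decreasing with limits 0 and 1, and the empirical quantiles of the simulated sample should be close to qgumbel at the same probabilities.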

Question 3.2 - Using the formulas for the expected value and the variance of the Gumbel distribution given above, write the method of moments estimators for the parameters µ and β (you have to write down mathematical expressions not R code). Then apply the method of moments estimators to fit a Gumbel distribution to the observations in the data set wind containing the measurements of the maximum of wind speed in different days in the city of St Martin-En-Haut (France). Plot the histogram of the data in wind and the estimated Gumbel density corresponding to the method of moments estimators. Judge the estimation with a Q-Q plot.
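A sketch of the method of moments in R. The moment formulas are in the attached file; this sketch assumes the standard ones, E(X) = µ + βγ (γ is the Euler-Mascheroni constant) and Var(X) = π²β²/6, which give β = s·sqrt(6)/π and µ = x̄ - γβ. The vector wind below is simulated (hypothetical stand-in for the real measurements), and gumbel_mom is a hypothetical helper name.

```r
# Method of moments under the assumed formulas E = mu + b * gamma, Var = pi^2 * b^2 / 6.
gumbel_mom <- function(x) {
  euler_gamma <- -digamma(1)               # Euler-Mascheroni constant, about 0.5772
  b_hat  <- sd(x) * sqrt(6) / pi           # sd() uses the n - 1 denominator
  mu_hat <- mean(x) - euler_gamma * b_hat
  c(mu = mu_hat, b = b_hat)
}

# Simulated stand-in for the wind data (hypothetical, NOT the real measurements).
set.seed(7)
wind <- 25 - 6 * log(-log(runif(300)))     # Gumbel(mu = 25, b = 6) by inverse transform

est <- gumbel_mom(wind)
print(est)

# Histogram with the fitted density on top.
dens <- function(x, mu, b) { z <- (x - mu) / b; exp(-z - exp(-z)) / b }
hist(wind, breaks = 30, freq = FALSE, main = "Wind maxima with fitted Gumbel density")
curve(dens(x, est[["mu"]], est[["b"]]), add = TRUE, col = "red", lwd = 2)

# Q-Q plot: theoretical Gumbel quantiles against the ordered observations.
p <- ppoints(length(wind))
theo <- est[["mu"]] - est[["b"]] * log(-log(p))
plot(theo, sort(wind), xlab = "Fitted Gumbel quantiles", ylab = "Observed quantiles")
abline(0, 1, col = "red")
```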

Question 3.3 - Implement in R the minus log-likelihood for the Gumbel model and obtain numerically the maximum-likelihood estimates of the parameters µ and β for the wind data set. Use the method of moments estimates as initial parameters for the optimization algorithm (if you did not solve Question 3.2, try different initial conditions, e.g. µ = 0, β = 1). Plot the estimated density on top of the histogram and compare it with the method of moments estimates.
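A sketch of the numerical maximum-likelihood fit in R, again assuming the standard Gumbel density f(x) = (1/β) exp(-(z + exp(-z))) and using simulated data in place of the real wind measurements. Optimising over log(β) instead of β is a design choice of this sketch (it keeps the scale positive without constraints), not something the assignment asks for.

```r
# Minus log-likelihood of the (assumed) Gumbel model; par = c(mu, log(b)).
neg_loglik <- function(par, x) {
  mu <- par[1]
  b  <- exp(par[2])            # log-parametrisation keeps b > 0
  z  <- (x - mu) / b
  sum(log(b) + z + exp(-z))
}

# Simulated stand-in for the wind data (hypothetical, NOT the real measurements).
set.seed(7)
wind <- 25 - 6 * log(-log(runif(300)))

# Method of moments estimates as starting values (digamma(1) = -Euler's gamma).
b0  <- sd(wind) * sqrt(6) / pi
mu0 <- mean(wind) + digamma(1) * b0
start <- c(mu0, log(b0))

fit <- optim(start, neg_loglik, x = wind)     # Nelder-Mead by default
mle <- c(mu = fit$par[1], b = exp(fit$par[2]))
print(rbind(moments = c(mu0, b0), mle = mle))

# Both fitted densities on top of the histogram.
dens <- function(x, mu, b) { z <- (x - mu) / b; exp(-z - exp(-z)) / b }
hist(wind, breaks = 30, freq = FALSE, main = "Gumbel fits to wind maxima")
curve(dens(x, mle[["mu"]], mle[["b"]]), add = TRUE, col = "red", lwd = 2)
curve(dens(x, mu0, b0), add = TRUE, col = "blue", lty = 2, lwd = 2)
legend("topright", legend = c("MLE", "Method of moments"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```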

Question 3.4 - Fit also a Gaussian model to the data in wind using maximum likelihood. Check how well the Gaussian model fits the data using the histogram and a Q-Q plot. Do you think the Gaussian model is appropriate? Compare the Gaussian and the Gumbel models for the wind data using both AIC and BIC.

Can we use a likelihood-ratio test to compare the Gaussian and Gumbel models? If yes, compute the p-value; otherwise, explain why we cannot perform the test.
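The AIC and BIC comparison could be computed by hand as in the following R sketch, using AIC = 2k - 2·logLik and BIC = k·log(n) - 2·logLik with k = 2 parameters for each model. The Gumbel density is the assumed standard form and wind is simulated (hypothetical stand-in for the real data).

```r
# Simulated stand-in for the wind data (hypothetical, NOT the real measurements).
set.seed(7)
wind <- 25 - 6 * log(-log(runif(300)))
n <- length(wind)

# Gaussian MLE in closed form (note the 1/n variance, not 1/(n - 1)).
mu_norm    <- mean(wind)
sigma_norm <- sqrt(mean((wind - mu_norm)^2))
ll_norm    <- sum(dnorm(wind, mu_norm, sigma_norm, log = TRUE))

# Gumbel MLE by numerical optimisation; par = c(mu, log(b)).
neg_loglik <- function(par, x) {
  b <- exp(par[2]); z <- (x - par[1]) / b
  sum(log(b) + z + exp(-z))
}
b0 <- sd(wind) * sqrt(6) / pi
fit <- optim(c(mean(wind) + digamma(1) * b0, log(b0)), neg_loglik, x = wind)
ll_gumbel <- -fit$value

# Information criteria; both models have k = 2 free parameters.
k <- 2
ll <- c(ll_norm, ll_gumbel)
ic <- data.frame(model  = c("Gaussian", "Gumbel"),
                 logLik = ll,
                 AIC    = 2 * k - 2 * ll,
                 BIC    = k * log(n) - 2 * ll)
print(ic)

# Visual check of the Gaussian fit.
qqnorm(wind); qqline(wind, col = "red")
```

Because both models have the same number of parameters, ranking them by AIC or BIC is the same as ranking them by maximised log-likelihood.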

Question 3.5 - Use the non-parametric bootstrap to estimate the standard error of $\hat{\mu}$ and $\hat{\beta}$ (the maximum-likelihood estimators for the Gumbel distribution) over the wind data set. Compute 95% confidence intervals for the parameters using normal quantiles. Compute also the percentile confidence intervals from the bootstrap samples.
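A sketch of the bootstrap for the Gumbel MLE in R, with the same assumptions as before: standard Gumbel density, simulated wind data (hypothetical stand-in), and an arbitrary B = 500 replicates. The helper fit_gumbel is a hypothetical name.

```r
# Minus log-likelihood of the (assumed) Gumbel model; par = c(mu, log(b)).
neg_loglik <- function(par, x) {
  b <- exp(par[2]); z <- (x - par[1]) / b
  sum(log(b) + z + exp(-z))
}

# MLE started from the method of moments estimates (hypothetical helper).
fit_gumbel <- function(x) {
  b0  <- sd(x) * sqrt(6) / pi
  mu0 <- mean(x) + digamma(1) * b0
  opt <- optim(c(mu0, log(b0)), neg_loglik, x = x)
  c(mu = opt$par[1], b = exp(opt$par[2]))
}

# Simulated stand-in for the wind data (hypothetical, NOT the real measurements).
set.seed(7)
wind <- 25 - 6 * log(-log(runif(300)))
mle <- fit_gumbel(wind)

# Non-parametric bootstrap: resample the observations with replacement and refit.
B <- 500
boot <- t(replicate(B, fit_gumbel(sample(wind, replace = TRUE))))

# Bootstrap standard errors = standard deviation of the bootstrap estimates.
se <- apply(boot, 2, sd)

# 95% intervals: normal quantiles around the MLE, and bootstrap percentiles.
ci_normal <- cbind(lower = mle - qnorm(0.975) * se,
                   upper = mle + qnorm(0.975) * se)
ci_percentile <- t(apply(boot, 2, quantile, probs = c(0.025, 0.975)))

print(rbind(estimate = mle, std_error = se))
print(ci_normal)
print(ci_percentile)
```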

Attachment:- Assignment File.rar
