Reference no: EM132277705
Biostat Problems - Start every problem on a new page.
Problem 1 - Both the Binomial and Poisson distributions have been used to model the quantal nature of synaptic transmission. Briefly, the quantal hypothesis says that a nerve terminal contains a very large number of "quanta" and that each quantum has a small probability of releasing acetylcholine (ACh) in response to a nerve stimulus.
Suppose it is known that, for a given stimulus, the probability of ACh release is 0.01 for each quantum and is the same for all quanta.
You may assume the quantal responses are independent.
a. Using the Binomial distribution, what is the probability that no ACh is released in response to a stimulus in a nerve terminal containing 200 quanta?
b. Using the Poisson distribution, what is the probability that no ACh is released in response to a stimulus in a nerve terminal containing 200 quanta?
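As a rough sketch of how the two calculations in Problem 1 could be set up, the snippet below evaluates P(X = 0) under both models. It assumes the Poisson mean is taken as n × p, which is the usual approximation; the variable names are illustrative only.

```python
import math

n_quanta = 200     # number of quanta in the terminal (from the problem)
p_release = 0.01   # release probability per quantum (from the problem)

# Binomial: P(X = 0) = C(n, 0) * p^0 * (1 - p)^n = (1 - p)^n
p_zero_binomial = (1 - p_release) ** n_quanta

# Poisson approximation, assuming mean lambda = n * p
lam = n_quanta * p_release
p_zero_poisson = math.exp(-lam)

print(f"Binomial P(X = 0) = {p_zero_binomial:.4f}")
print(f"Poisson  P(X = 0) = {p_zero_poisson:.4f}")
```

The two values should agree closely, since n is large and p is small.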
Problem 2 - Twelve ants and eighteen flies were placed in a container with insecticide and observed. After 16 of the 30 insects had died, nine ants and five flies were still alive. Apply Fisher's exact test to test the hypothesis that ants and flies are equally susceptible to the insecticide. In developing your answer:
a. State the null and alternative hypotheses.
b. What is the achieved significance level (the p-value)? (Tip - I encourage you to show your work)
c. In 1-2 sentences at most, interpret your findings.
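One way to organize the Fisher's exact calculation for Problem 2 is to enumerate the hypergeometric distribution directly. The sketch below assumes the usual conditioning on all table margins and uses the "sum of tables no more likely than the observed one" convention for the two-sided p-value; other two-sided conventions exist, and the helper name `hypergeom_pmf` is hypothetical.

```python
from math import comb

# 2x2 table implied by the problem statement:
#            dead  alive
#   ants       3     9     (12 ants, 9 still alive)
#   flies     13     5     (18 flies, 5 still alive)
ants, flies, dead = 12, 18, 16
ants_dead_obs = ants - 9

def hypergeom_pmf(x: int) -> float:
    """P(exactly x of the dead insects are ants), with all margins fixed."""
    return comb(ants, x) * comb(flies, dead - x) / comb(ants + flies, dead)

# Possible values for the number of dead ants given the margins
support = range(max(0, dead - flies), min(ants, dead) + 1)
p_obs = hypergeom_pmf(ants_dead_obs)

# One-sided (lower tail): as few or fewer dead ants than observed
p_lower = sum(hypergeom_pmf(x) for x in support if x <= ants_dead_obs)

# Two-sided: all tables whose probability does not exceed the observed one.
# The small relative tolerance guards against floating-point ties.
p_two_sided = sum(
    hypergeom_pmf(x) for x in support if hypergeom_pmf(x) <= p_obs * (1 + 1e-9)
)

print(f"one-sided p = {p_lower:.4f}, two-sided p = {p_two_sided:.4f}")
```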
Problem 3 - Sodium polystyrene sulfonate (brand name Kayexalate) in sorbitol is a drug used to reduce serum potassium levels in hyperkalemic patients. Unfortunately, it is suspected of causing an adverse reaction leading to colonic necrosis. A study compared the incidence of colonic necrosis in 117 Kayexalate-exposed and 862 non-exposed post-surgical patients. Two cases of colonic necrosis occurred in the Kayexalate-exposed group. None occurred in the non-exposed group. Carry out the appropriate statistical hypothesis test to determine if there is statistically significant evidence of increased risk of colonic necrosis associated with Kayexalate exposure. In developing your answer:
a. State the null and alternative hypotheses.
b. What is the name of the appropriate test statistic? Note - I am not asking for its value here.
c. Calculate the value of the test statistic.
d. What is the achieved significance level (the p-value)?
e. In 1-2 sentences at most, what is your opinion regarding the risk of colonic necrosis in this population?
f. What is your favorite comfort food?
Problem 4 - Please note. The data for this problem are fictitious.
Consider the following case-control study investigation of the relationship of asbestos and lung cancer. An important covariate is smoking.
You are given the 2x2 table distribution of asbestos exposure (yes/no) and lung cancer (yes/no), overall and separately for strata defined by smoking (smokers and non-smokers).

Overall

| Asbestos Exposure | Lung Cancer: Yes | Lung Cancer: No |
| --- | --- | --- |
| Yes | 80 | 38 |
| No | 15 | 152 |

Stratum = 1 (Smokers)

| Asbestos Exposure | Lung Cancer: Yes | Lung Cancer: No |
| --- | --- | --- |
| Yes | 75 | 20 |
| No | 5 | 80 |

Stratum = 2 (Non-Smokers)

| Asbestos Exposure | Lung Cancer: Yes | Lung Cancer: No |
| --- | --- | --- |
| Yes | 5 | 18 |
| No | 10 | 72 |

a. What are the values of
(i) The "overall" odds ratio?
(ii) The Mantel-Haenszel estimate of the "overall" odds ratio?
(iii) The stratum-specific odds ratio for stratum = 1 (Smokers)?
(iv) The stratum-specific odds ratio for stratum = 2 (Non-smokers)?
b. Perform the appropriate statistical test of the null hypothesis of homogeneity of association. In developing your answer, please take care to indicate:
i) null and alternative hypotheses;
ii) name of test statistic and the value of the degrees of freedom;
iii) value of the test statistic; and
iv) p-value. Please note - I ask you for the interpretation of your findings in the next question.
c. Using your answer to question b, in 1-2 sentences at most, in your opinion, is there statistically significant evidence that the relationship between asbestos exposure and lung cancer differs by smoking status?
d. Perform the Mantel-Haenszel test of the null hypothesis of no association. As before, in developing your answer, please take care to indicate:
i) null and alternative hypotheses;
ii) name of test statistic and the value of the degrees of freedom;
iii) value of the test statistic; and
iv) p-value. Here too - I ask you for the interpretation of your findings in the next question.
e. Based upon your answer to question 4, in 2-3 sentences at most, what do you conclude?
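The odds ratio calculations in part a and the Mantel-Haenszel test in part d of Problem 4 can be sketched as follows. This is an illustration under stated assumptions, not a prescribed solution: the Mantel-Haenszel chi-square is computed without a continuity correction (some texts subtract 0.5), the test of homogeneity from part b is not covered, and the function names are hypothetical.

```python
import math

# Each stratum as (a, b, c, d):
#   a = exposed with cancer,    b = exposed without cancer,
#   c = unexposed with cancer,  d = unexposed without cancer
strata = {
    "smokers": (75, 20, 5, 80),
    "non-smokers": (5, 18, 10, 72),
}

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of a 2x2 table."""
    return (a * d) / (b * c)

# Crude ("overall") odds ratio from the table collapsed over smoking
collapsed = tuple(sum(t[i] for t in strata.values()) for i in range(4))
or_crude = odds_ratio(*collapsed)

# Stratum-specific odds ratios
or_by_stratum = {name: odds_ratio(*t) for name, t in strata.items()}

# Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)
mh_num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
or_mh = mh_num / mh_den

def mh_chi_square(tables):
    """Mantel-Haenszel chi-square (1 df), assuming no continuity correction."""
    diff, var = 0.0, 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n                           # observed - expected
        var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))  # hypergeometric variance
    return diff ** 2 / var

chi2_mh = mh_chi_square(strata.values())
# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
p_mh = math.erfc(math.sqrt(chi2_mh / 2))

print(f"crude OR = {or_crude:.2f}, MH OR = {or_mh:.2f}, by stratum = {or_by_stratum}")
print(f"MH chi-square = {chi2_mh:.2f}, p = {p_mh:.3g}")
```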
Problem 5 - In a logistic regression analysis of likelihood (π) of mortality that considered several variables, a one predictor model was fit to malnutrition (MALNUT) coded 1 = malnutrition, 0 = NO malnutrition. The following was obtained:
$$\operatorname{logit}[\hat{\pi}] = -1.8563 + 1.210\,[\text{MALNUT}]$$
The 2x2 table associated with these data is the following:

| MALNUT | Mortality: 1 = Dead | Mortality: 0 = Alive | Total |
| --- | --- | --- | --- |
| 1 = Malnourished | 11 | 21 | 32 |
| 0 = NOT malnourished | 10 | 64 | 74 |
| Total | 21 | 85 | 106 |

a. Verify that the regression coefficient (beta) for MALNUT in the fitted logistic regression model is equal to the natural logarithm of the odds ratio for MALNUT obtained from the numbers shown in the 2x2 table. Show all work.
b. Using the fitted logistic regression model, what is the formula for the estimated probability of death for a person who is malnourished?
c. Again using the fitted logistic regression model, what is the numerical value for the estimated probability of death for a person who is malnourished?
d. Now using the 2x2 table, what is the formula for the empirical estimate of the probability of death for a person who is malnourished?
e. Again using the 2x2 table, what is the numeric value for the empirical estimate of the probability of death for a person who is malnourished?
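A minimal sketch of the checks that Problem 5 asks for is given below. It assumes the standard inverse-logit transform for converting the fitted logit into a probability; the helper name `expit` is illustrative.

```python
import math

# Counts from the 2x2 table
dead_mal, alive_mal = 11, 21   # malnourished
dead_not, alive_not = 10, 64   # not malnourished

# Fitted coefficients as reported in the problem
b0, b1 = -1.8563, 1.210

# Part a: log odds ratio from the table, to compare with b1
log_or_table = math.log((dead_mal * alive_not) / (alive_mal * dead_not))

def expit(x: float) -> float:
    """Inverse logit: probability corresponding to log-odds x."""
    return 1.0 / (1.0 + math.exp(-x))

# Parts b-c: model-based probability of death when MALNUT = 1
p_model = expit(b0 + b1 * 1)

# Parts d-e: empirical proportion of deaths among the malnourished
p_empirical = dead_mal / (dead_mal + alive_mal)

print(f"ln(OR) from table = {log_or_table:.4f} (fitted beta = {b1})")
print(f"model p = {p_model:.4f}, empirical p = {p_empirical:.4f}")
```

Because the model has a single binary predictor, the fitted and empirical probabilities should match up to rounding of the reported coefficients.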
Problem 6 - A logistic regression model analysis was performed to investigate the relationship of sex, age, and income on event of clinical depression (1 = yes). The following results were obtained.
| Predictor | $\hat{\beta}$ | $\widehat{SE}(\hat{\beta})$ | p-value |
| --- | --- | --- | --- |
| Sex (1 = Female) | 0.925 | 0.393 | 0.02 |
| Age (per year) | -0.024 | 0.009 | 0.01 |
| Income (per $1,000) | -0.040 | 0.014 | 0.01 |
| Constant (intercept) | -0.477 | 0.867 | 0.19 |

Calculate the 95% confidence interval estimate of the relative odds (OR) of clinical depression for females versus males, adjusting for age and income.
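The interval in Problem 6 can be sketched as a Wald interval computed on the log-odds scale and then exponentiated. This assumes the normal approximation with z = 1.96, which is the common textbook choice but is an assumption here.

```python
import math

beta_sex = 0.925   # coefficient for Sex (1 = Female), from the table
se_sex = 0.393     # its standard error, from the table
z = 1.96           # approximate 97.5th percentile of the standard normal

# Point estimate of the adjusted odds ratio
or_sex = math.exp(beta_sex)

# Build the interval for beta first, then exponentiate the limits
ci_low = math.exp(beta_sex - z * se_sex)
ci_high = math.exp(beta_sex + z * se_sex)

print(f"OR = {or_sex:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```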
Problem 7 - Consider again the fitted logistic regression model provided in Problem 6. Using this model, what is the estimated relative odds (odds ratio, OR) of clinical depression for a female aged 60 with income $50,000 compared to a reference person who is male aged 45 with income $75,000?
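For a comparison of two covariate profiles such as the one in Problem 7, the intercept cancels and the log odds ratio is the sum of each coefficient times the difference in that covariate. The sketch below assumes income is entered in units of $1,000, as the coefficient table indicates; the dictionary names are illustrative.

```python
import math

# Coefficients from the fitted model (intercept not needed: it cancels)
coef = {"sex": 0.925, "age": -0.024, "income": -0.040}

# Covariate profiles; income in thousands of dollars
target = {"sex": 1, "age": 60, "income": 50}      # female, 60, $50,000
reference = {"sex": 0, "age": 45, "income": 75}   # male, 45, $75,000

# log(OR) = sum over predictors of beta * (target value - reference value)
log_or = sum(coef[k] * (target[k] - reference[k]) for k in coef)
odds_ratio = math.exp(log_or)

print(f"log(OR) = {log_or:.3f}, OR = {odds_ratio:.2f}")
```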
Problem 8 - The Scottish Heart Health Study (Smith et al., 1987) examined risk factors for coronary heart disease (CHD). This question pertains to a logistic regression analysis that explored the influences of six risk factors: age, total cholesterol, body mass index, systolic blood pressure, smoking, and physical activity. The following table details the variable definitions and their coding.
| Variable | Label | Type/Code Definitions |
| --- | --- | --- |
| CHD | Coronary Heart Disease | 1 = yes, 0 = no |
| AGE | Age | Continuous, years |
| TOTALCHOL | Total Cholesterol | Continuous, mg/dL |
| BMI | Body mass index | Continuous, weight/height² |
| SBP | Systolic blood pressure | Continuous, mm Hg |
| SMOKING | Smoking status | 1 = never, 2 = ex, 3 = current |
| ACTIVITY | Self-reported activity | 1 = active, 2 = average, 3 = inactive |

For purposes of modeling, two design variables were created to represent the three responses of SMOKING and two design variables were created to represent the three responses of ACTIVITY. The total number of models that could be fit without interactions is, thus, 64. The investigators opted not to utilize one of the automatic variable selection procedures for selecting the "best model". Instead, they utilized a model building plan similar to one described in class, as this has the advantage of yielding more insights.
a. In one set of analyses, the investigators fit six separate one-predictor models. The following table summarizes the values of the deviance statistic obtained. It also summarizes the assessment of the statistical significance of each crude association. In particular, there is a row for the "intercept only" model. In assessing the crude significance of each predictor, the one-predictor model is compared to the "intercept only" model.
| Model | Details of Model: Deviance | Details of Model: df | Likelihood Ratio Test: Δ Deviance | Likelihood Ratio Test: p-value |
| --- | --- | --- | --- | --- |
| "intercept only" | 1569.37 | 4048 | -- | -- |
| "intercept only" + AGE | 1563.46 | 4047 | 5.91 | 0.015 |
| "intercept only" + TOTALCHOL | 1534.56 | 4047 | 34.81 | < .001 |
| "intercept only" + BMI | 1560.43 | 4047 | 8.94 | 0.003 |
| "intercept only" + SBP | 1528.01 | 4047 | 41.36 | < .0001 |
| "intercept only" + SMOKING | 1556.22 | 4046 | 13.15 | 0.0014 |
| "intercept only" + ACTIVITY | 1569.06 | 4046 | - | 0.86 |

Reproduce the calculations that yielded the p-value of 0.86 for the significance of the crude association of CHD with ACTIVITY. Show all work.
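The likelihood ratio test behind the ACTIVITY row can be sketched using only the deviances and degrees of freedom in the table. The snippet relies on the fact that the upper tail of a chi-square distribution with exactly 2 degrees of freedom has the closed form exp(-x/2), so no statistics library is assumed.

```python
import math

# Values from the table of one-predictor models
dev_null, df_null = 1569.37, 4048           # "intercept only"
dev_activity, df_activity = 1569.06, 4046   # "intercept only" + ACTIVITY

# LR statistic = drop in deviance; its df = number of added parameters
delta_dev = dev_null - dev_activity
delta_df = df_null - df_activity            # two design variables for ACTIVITY

# The closed form below is valid only for 2 degrees of freedom
if delta_df != 2:
    raise ValueError("closed-form tail probability used here requires df = 2")

p_value = math.exp(-delta_dev / 2)

print(f"delta deviance = {delta_dev:.2f} on {delta_df} df, p = {p_value:.2f}")
```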
b. Based on the results of fitting several one predictor models, in the next set of analyses, ACTIVITY was dropped from consideration and the investigators considered an initial five predictor model containing AGE, TOTALCHOL, BMI, SBP, and SMOKING. In a second set of model fits, the investigators deleted predictors one at a time. The following summary was obtained for this second set of model fits:
| Predictors in model in addition to the intercept | Details of Model: Deviance | Details of Model: df | Likelihood Ratio Test: Δ Deviance | Likelihood Ratio Test: p-value |
| --- | --- | --- | --- | --- |
| AGE, TOTALCHOL, BMI, SBP, SMOKING | 1482.47 | 4040 | --- | --- |
| ---, TOTALCHOL, BMI, SBP, SMOKING | 1484.00 | 4043 | 1.53 | 0.22 |
| AGE, ---, BMI, SBP, SMOKING | 1507.72 | 4043 | 25.25 | < .0001 |
| AGE, TOTALCHOL, ---, SBP, SMOKING | 1486.09 | 4043 | --- | 0.057 |
| AGE, TOTALCHOL, BMI, ---, SMOKING | 1509.03 | 4043 | 26.56 | < .0001 |
| AGE, TOTALCHOL, BMI, SBP, --- | 1496.34 | 4044 | 13.87 | 0.001 |

Using this summary, carry out a test that produced the p-value of 0.057 for the row that reads AGE, TOTALCHOL, ---, SBP, SMOKING. In developing your answer, please provide:
(i) Null and alternative hypotheses.
(ii) Name of test statistic and the value of the degrees of freedom.
(iii) Value of the test statistic.
(iv) Value of the p-value.
(v) In 1-2 sentences at most, your interpretation of your findings.
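The test for the BMI row can be sketched in the same way as a drop-in-deviance comparison. This sketch assumes 1 degree of freedom, because BMI enters the model as a single continuous term; that is an assumption, since the df column of the table differs by more than 1 between the two rows. For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x/2)).

```python
import math

# Deviances from the table of five-predictor and four-predictor fits
dev_full = 1482.47     # AGE, TOTALCHOL, BMI, SBP, SMOKING
dev_no_bmi = 1486.09   # AGE, TOTALCHOL, ---, SBP, SMOKING

# LR statistic = increase in deviance when BMI is removed
delta_dev = dev_no_bmi - dev_full

# Assumed 1 df (one coefficient removed): P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(delta_dev / 2))

print(f"delta deviance = {delta_dev:.2f} on 1 df (assumed), p = {p_value:.3f}")
```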
c. Continuing, the investigators next considered a four-predictor model in which AGE was dropped. BMI was retained because it was of borderline significance in the second set of analyses and highly significant in the crude analysis. Then, a third set of model fits was done. In this set of model fits also, the investigators deleted predictors one at a time. The following summary was obtained for this third set of model fits:
| Predictors in model in addition to the intercept | Details of Model: Deviance | Details of Model: df | Likelihood Ratio Test: Δ Deviance | Likelihood Ratio Test: p-value |
| --- | --- | --- | --- | --- |
| TOTALCHOL, BMI, SBP, SMOKING | 1484.00 | 4043 | --- | --- |
| ---, BMI, SBP, SMOKING | 1509.00 | 4044 | 25.00 | < .0001 |
| TOTALCHOL, ---, SBP, SMOKING | 1484.48 | 4044 | 3.48 | 0.062 |
| TOTALCHOL, BMI, ---, SMOKING | 1515.37 | 4044 | 31.37 | < .0001 |
| TOTALCHOL, BMI, SBP, --- | 1497.40 | 4045 | 13.40 | 0.001 |

In 1-3 sentences at most, based on the information provided in this table, which model would you choose to report and why?