Reference no: EM132367541
Assignment - STATA Questions
Please provide a full record of your software code and software output in an appendix.
The attached dataset contains survey responses from 2,500 women aged over 70.
This dataset was created to assess selected risk factors for depression. A summary of the dataset is provided in Table 1.
Table 1: Depression dataset
| Variable name | Description | Key |
|---|---|---|
| studyno | Unique identifier | |
| Age | Age in years at the time of survey completion | |
| social_support_tertiles | Tertiles of the social support scale | 1 = In the lowest tertile of social support; 2 = In the middle tertile of social support; 3 = In the highest tertile of social support (e.g. those with social_support_tertiles = 3 are in the third with the highest level of social support) |
| depression | In the last 3 years, have you been told by a doctor that you have depression? | 0 = No; 1 = Yes |
Q1. Using a t-test (for age) and a chi-squared test (for social support), assess the association between age and depression and between social support and depression. Present your results in a table suitable for inclusion in a scientific paper. Below the table, describe and interpret these results.
[Note: For this question, do not fit a statistical model; examine each exposure variable separately.]
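As a rough illustration of the two unadjusted tests in Q1, the sketch below computes a Pearson chi-squared statistic for a 3 x 2 table (social support tertile by depression) and a two-sample t statistic assuming equal variances (the default of STATA's `ttest Age, by(depression)`; the chi-squared test corresponds to `tabulate social_support_tertiles depression, chi2`). The counts and ages are hypothetical, not taken from the dataset, and the helper names are illustrative only. For a 3 x 2 table the test has 2 degrees of freedom, whose upper tail has the closed form exp(-x/2).

```python
import math
import statistics

def pearson_chi2(table):
    """Pearson chi-squared statistic and df for an r x c table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

def chi2_sf_df2(x):
    """Upper-tail p-value of a chi-squared(2) variable: exp(-x/2)."""
    return math.exp(-x / 2)

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances, with its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, na + nb - 2

# Hypothetical table: rows = social_support_tertiles 1..3, columns = depression 0 / 1.
example = [[30, 10], [20, 20], [10, 30]]
chi2, df = pearson_chi2(example)
print(chi2, df, chi2_sf_df2(chi2))
```

With around 2,500 observations, the t distribution for the age comparison is very close to the standard normal, so the t-test p-value is essentially a normal tail probability.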
Q2a. Create a 'collapsed' dataset that records the number of women with depression, and the total number of women, in each category of social support. [For this task you can temporarily ignore the age variable. I am not assessing the procedure you use to create the data here, as long as the numbers are correct.]
Using this grouped version of the data, use software to run a logistic regression model that assesses the association between social support [as a categorical variable] and depression. Carefully interpret your results.
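A minimal sketch of the collapse step and of what a grouped logistic regression with only categorical social support should reproduce: because that model has one parameter per group, its odds ratios equal the crude odds ratios from the 3 x 2 table, and the Wald standard error of each log odds ratio is sqrt(1/a + 1/b + 1/c + 1/d). The records, counts and function names below are hypothetical, and the lowest tertile is assumed to be the reference category.

```python
import math
from collections import Counter

def collapse(records):
    """Count depression cases and totals per tertile from (tertile, depression) pairs."""
    cases, totals = Counter(), Counter()
    for tertile, depressed in records:
        totals[tertile] += 1
        cases[tertile] += depressed
    return {t: (cases[t], totals[t]) for t in sorted(totals)}

def odds_ratios(grouped, reference=1, z=1.96):
    """Odds ratio and Wald 95% CI for each tertile versus the reference tertile."""
    ref_cases, ref_total = grouped[reference]
    ref_noncases = ref_total - ref_cases
    results = {}
    for t, (c, n) in grouped.items():
        if t == reference:
            continue
        log_or = math.log((c / (n - c)) / (ref_cases / ref_noncases))
        se = math.sqrt(1 / c + 1 / (n - c) + 1 / ref_cases + 1 / ref_noncases)
        results[t] = (math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se))
    return results

# Hypothetical individual records: (social_support_tertiles, depression).
records = ([(1, 1)] * 30 + [(1, 0)] * 10 + [(2, 1)] * 20 + [(2, 0)] * 20
           + [(3, 1)] * 10 + [(3, 0)] * 30)
grouped = collapse(records)
ors = odds_ratios(grouped)
print(grouped, ors)
```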
Q2b. Use software to run 'the same model' on the individual (non-collapsed) data. Present the software output and show that this model gives the same estimates of the association between social support and depression as the model in 2a.
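To see why the individual-level model reproduces the grouped estimates, the sketch below fits a logistic regression by Newton-Raphson directly on one row per woman, with dummy variables for tertiles 2 and 3. The data are hypothetical (30/40, 20/40 and 10/40 cases per tertile), and the functions are illustrative rather than how STATA implements estimation. The exponentiated coefficients come out as the odds in the reference tertile (3) and the crude odds ratios (1/3 and 1/9), which is exactly what a grouped binomial fit on the same counts gives, because both maximise the same likelihood.

```python
import math

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iterations=25):
    """Logistic regression MLE by Newton-Raphson; each row of X starts with a 1."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        gradient = [0.0] * k
        information = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for a in range(k):
                gradient[a] += (yi - p) * xi[a]
                for b in range(k):
                    information[a][b] += p * (1 - p) * xi[a] * xi[b]
        step = solve(information, gradient)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical (cases, total) per tertile, expanded to one row per woman.
counts = {1: (30, 40), 2: (20, 40), 3: (10, 40)}
X, y = [], []
for tertile, (cases, total) in counts.items():
    for i in range(total):
        X.append([1.0, float(tertile == 2), float(tertile == 3)])
        y.append(1 if i < cases else 0)
beta = fit_logistic(X, y)
print([math.exp(b) for b in beta])
```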
Q3a. Using software, fit a logistic regression model [including only age and depression] to assess whether there is an association between age and depression in this sample. Interpret the estimated age coefficient, together with its confidence interval and p-value.
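The age coefficient from a logistic model is a log odds ratio per 1-year increase in age, so it is usually interpreted after exponentiating (STATA's `logistic` reports odds ratios directly, while `logit` reports the log-odds coefficients). The sketch below converts a hypothetical coefficient and standard error, not estimates from this dataset, into an odds ratio, a Wald 95% confidence interval and a two-sided p-value, and also shows the odds ratio for a 5-year age difference.

```python
import math

def normal_sf(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def summarise_coefficient(beta, se, z_crit=1.959964):
    """Odds ratio, Wald 95% CI and two-sided p-value for a logistic coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    p_value = 2 * normal_sf(abs(beta / se))
    return odds_ratio, lower, upper, p_value

# Hypothetical age coefficient (log OR per year) and its standard error.
beta_age, se_age = 0.05, 0.01
or_1yr, lo_1yr, hi_1yr, p = summarise_coefficient(beta_age, se_age)
or_5yr = math.exp(5 * beta_age)  # odds ratio comparing women 5 years apart in age
print(or_1yr, lo_1yr, hi_1yr, p, or_5yr)
```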
Q3b. Use a Wald test to test whether the log(OR) associated with a 1-unit increase in age is greater than ln(1.1).
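A Wald test against a non-zero value compares the estimate with that value on the standard-error scale: z = (beta - ln(1.1)) / SE, with z squared following a chi-squared distribution on 1 degree of freedom under the null. The sketch below returns both the two-sided p-value and the one-sided p-value for the alternative that the log(OR) exceeds ln(1.1). The estimate and standard error are hypothetical placeholders.

```python
import math

def wald_test_against(beta, se, null_value):
    """Wald z, chi-squared(1) statistic and p-values for H0: beta = null_value."""
    z = (beta - null_value) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    p_one_sided_greater = 0.5 * math.erfc(z / math.sqrt(2))  # H1: beta > null_value
    return z, z ** 2, p_two_sided, p_one_sided_greater

# Hypothetical estimate and standard error for the age log(OR) per year.
z, chi2, p_two, p_greater = wald_test_against(0.05, 0.01, math.log(1.1))
print(z, chi2, p_two, p_greater)
```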
Q3c. Using the model from part 3a, plot an appropriately labelled graph with age on the x-axis and the predicted log odds of depression on the y-axis.
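With a single linear age term, the predicted log odds are intercept + slope x age, so the graph requested in Q3c should be a straight line. In STATA the linear predictor can be obtained after fitting with `predict logodds, xb` (the name `logodds` is hypothetical) and then plotted with `twoway line`. The sketch below only tabulates the line over a grid of ages, using hypothetical coefficients, to show that each 5-year step adds the same amount (5 x slope) to the log odds.

```python
def linear_predictor(intercept, slope, ages):
    """Predicted log odds of depression, intercept + slope * age, at each age."""
    return [(age, intercept + slope * age) for age in ages]

# Hypothetical coefficients; replace with the estimates from the Q3a model.
points = linear_predictor(-6.0, 0.05, range(70, 101, 5))
for age, log_odds in points:
    print(f"age {age}: predicted log odds {log_odds:.3f}")
```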
Q3d. Explain how the log-likelihood value presented in your software output for 3a was calculated.
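For a binary outcome, the log-likelihood of a logistic model is the sum over individuals of y ln(p) + (1 - y) ln(1 - p), where p is each woman's fitted probability of depression. The value reported by the software is this sum evaluated at the maximum likelihood estimates. The sketch below computes it for a handful of hypothetical individuals and hypothetical coefficients.

```python
import math

def fitted_probabilities(intercept, slope, ages):
    """Fitted probability 1 / (1 + exp(-(intercept + slope * age))) for each person."""
    return [1 / (1 + math.exp(-(intercept + slope * a))) for a in ages]

def bernoulli_log_likelihood(y, p):
    """Sum over individuals of y * ln(p) + (1 - y) * ln(1 - p)."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))

# Hypothetical individuals and coefficients, for illustration only.
ages = [72, 75, 80, 85]
y = [0, 1, 0, 1]
p = fitted_probabilities(-6.0, 0.05, ages)
print(bernoulli_log_likelihood(y, p))
```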
Q4a. Using software, fit a single logistic regression model that assesses the association of the exposures social support [as a categorical variable] and age with the outcome depression. Interpret the coefficients produced by this model.
Using software, run a likelihood ratio test to assess the statistical significance of adding social support (as a categorical variable) to a more basic model that includes only the exposure age.
Q4b. What are the null and alternative hypotheses for this likelihood ratio test?
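One way to write the hypotheses, assuming the lowest social support tertile is the reference category and using illustrative coefficient names, is in terms of the two social support dummy coefficients of the fuller model:

```latex
\operatorname{logit}\Pr(\text{depression}=1) = \beta_0 + \beta_{\text{age}}\,\text{age} + \beta_2\,I(\text{tertile}=2) + \beta_3\,I(\text{tertile}=3)

H_0:\ \beta_2 = \beta_3 = 0 \qquad H_1:\ \beta_2 \neq 0 \ \text{or}\ \beta_3 \neq 0
```

Under the null hypothesis the fuller model reduces to the age-only model, which is why the test has 2 degrees of freedom.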
Q4c. How do you interpret the results of the likelihood ratio test?
Q4d. Using the log-likelihood values in the output of the relevant separate models, calculate the chi-squared statistic for this likelihood ratio test by hand.
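The likelihood ratio statistic is 2 x (LL of the fuller model - LL of the reduced model), compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters (2 for a three-level categorical variable). Both models must be fitted to the same observations, which STATA's `lrtest` also expects. The log-likelihood values below are hypothetical placeholders, and the p-value uses the closed-form chi-squared tail available for even degrees of freedom.

```python
import math

def likelihood_ratio_test(ll_reduced, ll_full, df):
    """LR statistic 2 * (LL_full - LL_reduced) and its chi-squared p-value (even df only)."""
    if df % 2:
        raise ValueError("the closed-form tail used here needs an even df")
    lr = 2 * (ll_full - ll_reduced)
    # For even df: P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    half = lr / 2
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return lr, p_value

# Hypothetical log-likelihoods: age only (reduced) versus age + social support (full).
lr, p = likelihood_ratio_test(ll_reduced=-1200.0, ll_full=-1190.0, df=2)
print(lr, p)
```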
Q5. Use statistical software and the Hosmer-Lemeshow method to assess how well your model from Q4a (which includes age and social support) fits the data. Interpret the output produced. Briefly comment on possible limitations of the Hosmer-Lemeshow technique.
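The Hosmer-Lemeshow procedure sorts individuals by fitted probability, splits them into groups (commonly 10, as in STATA's `estat gof, group(10)`), and sums (observed - expected)^2 / (n x pbar x (1 - pbar)) over groups, referring the total to a chi-squared distribution on groups - 2 degrees of freedom. The sketch below uses simulated, hypothetical probabilities and outcomes. It splits the sorted data into equal-sized chunks, so tied fitted probabilities can land in different groups; software packages differ in how they handle ties and choose group boundaries, which is one reason the statistic can change with the grouping.

```python
import math
import random

def hosmer_lemeshow(y, p, groups=10):
    """HL statistic: sum over groups of (O - E)^2 / (n * pbar * (1 - pbar)), df = groups - 2."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        pbar = expected / size
        statistic += (observed - expected) ** 2 / (size * pbar * (1 - pbar))
    return statistic, groups - 2

def chi2_sf_even(x, df):
    """Upper-tail chi-squared probability for even df (closed form)."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Hypothetical fitted probabilities; outcomes are simulated from them.
random.seed(1)
p = [random.uniform(0.05, 0.6) for _ in range(500)]
y = [1 if random.random() < pi else 0 for pi in p]
stat, df = hosmer_lemeshow(y, p)
print(stat, df, chi2_sf_even(stat, df))
```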
Q6a. Fit a logistic regression model with depression as the outcome, which includes age and social support as independent variables. This time include social support as a linear (trend) term as opposed to a categorical variable.
Interpret the results from your model. Explain whether you would prefer to present the results of the model from Q6a or Q4a.
Explain why we would not use the likelihood ratio test to compare the models from Q6a and Q4a.
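Entering social_support_tertiles (coded 1, 2, 3) as a single linear term forces the log odds ratio for tertile 3 versus tertile 1 to be exactly twice that for tertile 2 versus tertile 1, whereas the categorical model in Q4a estimates the two separately. The sketch below shows the odds ratios implied by a hypothetical trend coefficient; the function name is illustrative.

```python
import math

def implied_odds_ratios(trend_coefficient, levels=(1, 2, 3), reference=1):
    """Odds ratios versus the reference tertile implied by a single linear (trend) term."""
    return {level: math.exp(trend_coefficient * (level - reference))
            for level in levels if level != reference}

# Hypothetical trend coefficient: log OR per one-tertile increase in social support.
ors = implied_odds_ratios(-0.5)
print(ors)  # the OR for tertile 3 is the square of the OR for tertile 2
```

Comparing these implied odds ratios with the separately estimated ones from the categorical model is one informal way to judge whether the trend assumption is reasonable.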
Q6b. Using the model from Q6a, what is the predicted probability of depression for someone aged 75.25 years who is in the highest social support tertile?
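The predicted probability is the inverse logit of the linear predictor: p = 1 / (1 + exp(-(b0 + b_age x 75.25 + b_trend x 3))), assuming the trend term uses the 1, 2, 3 coding from Table 1 so that the highest tertile contributes 3 x b_trend. The coefficients below are hypothetical placeholders to be replaced by the Q6a estimates.

```python
import math

def predicted_probability(intercept, beta_age, beta_trend, age, tertile):
    """Inverse logit of intercept + beta_age * age + beta_trend * tertile."""
    log_odds = intercept + beta_age * age + beta_trend * tertile
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients; substitute the estimates from the Q6a model.
prob = predicted_probability(intercept=-5.0, beta_age=0.05, beta_trend=-0.5, age=75.25, tertile=3)
print(round(prob, 4))
```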
Note: The data file to be used to answer the above questions is attached. The questions should be solved using STATA software.
Attachment: Data File.rar