Case Study Instructions -
Background - These data were simulated. The motivation for this assessment comes from Dr Leah Shepard's (USYD) article. You do not need to read it for the assignment or the course; the link is provided purely for your own interest, should you wish to know more about this topic.
Subject matter background - Prostate-specific antigen (PSA) is a blood marker for prostate cancer, and men with elevated PSA levels (relative to their age) may be sent for further cancer testing and monitoring. However, there is evidence that HIV-positive men may have lower PSA levels than their HIV-negative counterparts, which could lead to underdiagnosis in this group.
You are approached by a colleague who is interested in quantifying whether HIV-positive men have lower PSA levels than HIV-negative men. They are concerned because PSA level is known to be higher in older men and the HIV-positive population tends to have a younger age distribution than the HIV-negative population, so an unadjusted comparison of the two groups could be confounded by age.
They provide you with a dataset on 296 men aged 18 and over attending participating health clinics. Serum samples were collected and analysed for total PSA and testosterone levels. Information on several other variables that are possibly associated with PSA level was also collected. The variables in your data are:
- id: participant identification number
- psa: measured serum total PSA (ng/mL). This is the outcome variable for this assessment.
- hiv: HIV-positive status (0: HIV-negative, 1: HIV-positive). This is the exposure variable for this assessment.
- age: age at the date of serum sample collection (years)
- test: measured serum total testosterone (ng/dL)
- ethnicity: (0: Caucasian, 1: Other)
- wc: waist circumference (cm)
- pros_vol: prostate volume (mL)
Exercise -
Your task is to conduct a regression analysis to assess whether HIV-positive men have lower PSA levels after adjusting for differences in age and other possible confounders. This regression model should have psa as the outcome variable and hiv as the exposure variable. It should also adjust for potential or actual confounders as you see fit.
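To illustrate why the adjustment matters, the following is a minimal Python sketch (NumPy only) that compares a crude and an age-adjusted estimate of the hiv coefficient. It does not use the assignment dataset: every number in it (the 10-year age gap, the age slope of 0.04, the HIV effect of -0.30, the noise level) is a hypothetical value chosen only to make the confounding visible, and the helper name ols is likewise an assumption.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 296  # sample size from the brief; all values below are simulated

hiv = rng.binomial(1, 0.4, n)
# Assumption: HIV-positive men are about 10 years younger on average.
age = rng.normal(55 - 10 * hiv, 8)
# Assumption: PSA rises with age and the "true" HIV effect is -0.30.
psa = 0.5 + 0.04 * age - 0.30 * hiv + rng.normal(0, 0.5, n)

crude = ols(psa, hiv)          # psa ~ hiv
adjusted = ols(psa, hiv, age)  # psa ~ hiv + age

print(f"crude hiv coefficient:    {crude[1]:.2f}")
print(f"adjusted hiv coefficient: {adjusted[1]:.2f}")
```

Under these assumptions the crude coefficient mixes the HIV effect with the age difference between the groups, so it overstates how much lower PSA is in HIV-positive men; the adjusted coefficient should sit much closer to the simulated value. With real data the direction and size of the bias are not known in advance.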
To this end, follow the model building steps below (in this order):
1. Investigate the individual associations between each variable and psa: identify which variables should potentially be included in a multivariable model; identify if any transformations are necessary; and identify any possible issues such as non-linearity or collinearity.
2. Create an initial multivariable regression model with psa as the outcome, hiv as the exposure and including all possible confounders identified in step 1.
3. Investigate possible collinearity in this model and deal with it appropriately (if needed).
4. Identify the most suitable multivariable regression model for this research question, excluding any further variables as you see fit (or excluding none at all).
5. Check the assumptions of this model and make any adjustments as necessary.
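Steps 3 and 5 above could be sketched as follows in Python (NumPy only). This is an illustration on simulated data, not the assignment dataset: the strong dependence of pros_vol on age, the log-linear form of psa, and all coefficients are assumptions made only so that the checks have something to detect. The helper names fit, vif and skewness are hypothetical. The variance inflation factor is computed from its definition, 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others.

```python
import numpy as np

def fit(y, X):
    """OLS via least squares; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def vif(predictors):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    values = []
    for j in range(predictors.shape[1]):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(len(target)), others])
        _, resid = fit(target, X)
        r2 = 1 - resid.var() / target.var()
        values.append(1 / (1 - r2))
    return np.array(values)

def skewness(x):
    """Sample skewness: mean cubed z-score (about 0 for symmetric data)."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

rng = np.random.default_rng(1)
n = 296
hiv = rng.binomial(1, 0.4, n).astype(float)
age = rng.normal(55 - 10 * hiv, 8)
# Assumption: prostate volume is almost determined by age (forces collinearity).
pros_vol = 0.6 * age + rng.normal(0, 2, n)
# Assumption: psa is right-skewed and linear in the predictors on the log scale.
psa = np.exp(-1.0 + 0.03 * age - 0.3 * hiv + rng.normal(0, 0.4, n))

# Step 3: collinearity check on the candidate predictors.
names = ["hiv", "age", "pros_vol"]
vifs = vif(np.column_stack([hiv, age, pros_vol]))
for name, value in zip(names, vifs):
    print(f"VIF {name}: {value:.1f}")

# Step 5: compare residual skewness with and without a log transformation,
# here after dropping pros_vol as one possible response to the high VIFs.
X = np.column_stack([np.ones(n), hiv, age])
_, resid_raw = fit(psa, X)
_, resid_log = fit(np.log(psa), X)
skew_raw, skew_log = skewness(resid_raw), skewness(resid_log)
print(f"residual skewness, raw psa: {skew_raw:.2f}")
print(f"residual skewness, log psa: {skew_log:.2f}")
```

Dropping one of two collinear variables is only one option, and the choice should be guided by the research question. Residual skewness is a rough numeric stand-in for the residual plots (residuals versus fitted values, normal quantile plot) that would normally be inspected.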
Written conclusion (no longer than 1 page):
In addition to the oral presentation, you must write a standalone summary of your findings for a clinical collaborator that includes the following:
1. A description and explanation of any issues that arose during the model building process and why you excluded any variables from the analysis (if any).
2. A specific answer to the research question by interpreting your final model including relevant P-values, regression coefficients and/or confidence intervals.
3. A summary of any other findings relevant to the research question.
4. An equation that describes your final model.
This summary should be targeted towards an audience (your hypothetical experimental/clinical collaborator) who is familiar with basic statistics (such as P-values and confidence intervals), but unfamiliar with the technical details of regression analysis. It should not include any Stata output or code.
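As a guide to the form the equation in item 4 might take, here is a generic template in LaTeX (assuming the amsmath package for \text). It is not the answer: which covariates x_k remain, and whether the outcome is transformed, depend on the model selected in the steps above.

```latex
\widehat{\text{psa}} = \hat\beta_0 + \hat\beta_1\,\text{hiv} + \hat\beta_2\,\text{age} + \sum_{k} \hat\beta_k\, x_k
```

Here \hat\beta_1 is the estimated mean difference in PSA (ng/mL) between HIV-positive and HIV-negative men with the other variables held fixed. If the outcome were log-transformed, the left-hand side would become \widehat{\log(\text{psa})} and \exp(\hat\beta_1) would be interpreted as a ratio of geometric mean PSA levels.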