Assignment components
This is a group assignment.
• As a whole group, submit an assignment comprising two documents:
1) Written documentation (saved as Word or PDF) of reproducible notes (data cleaning notes, flowcharts, data dictionary), and answers to assignment questions, AND
2) SAS code (saved as .sas)
• As an individual, submit your assessment of peers' contribution to group work.
Learning outcomes assessed
LO3: Evaluate data quality
LO4: Understand data linkage strategies
LO5: Develop and implement data merging and cleaning rules
LO6: Generate syntax (code) required to produce analysis ready datasets
Part 1: Data cleaning, documentation and data dictionary update (65%)
Explore the data and decide on an approach to cleaning the GP and ED datasets. For this assignment, concentrate on within-dataset cleaning. You are not required to cross-check inconsistencies between the two datasets, so there is no need to merge them.
For Part 1, you are required to:
A. Present your work in cleaning GP data in the following forms:
o A written document explaining the process of GP data exploration, data cleaning, decisions made, and results of your analyses;
o A flowchart to graphically present procedures taken for cleaning GP data; and
o SAS code showing analysis for GP data exploration, data cleaning and annotations.
B. Present your work in cleaning ED data in the following forms:
o A written document explaining the process of ED data exploration, data cleaning, decisions made, and results of your analyses (8 marks);
o A flowchart to graphically present procedures taken for cleaning ED data; and
o SAS code showing analysis for ED data exploration, data cleaning and annotations.
C. Create new variables in the GP data (see definitions in Table 1):
o Create variable smoke_status_GP to indicate a person's smoking status. Describe/justify your decision in the written document (6 marks);
o Create variable risky_alcohol_GP to classify health risk alcohol consumption;
o Calculate BMI score (variable BMI_GP);
o Create variable obese_GP to indicate whether a person is obese; and
o Create variable highBP_GP to indicate whether a person has high blood pressure.
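As an illustration only, the BMI_GP and obese_GP derivations in step C could be sketched as below. The input dataset gp_demo and the variables patient_id, weight_kg and height_cm are hypothetical stand-ins, because the actual GP variable names and units are not given here. The formula BMI = weight (kg) / height (m) squared is the standard definition, while the obesity cut-off of BMI >= 30 is an assumption based on the common WHO adult threshold and should be checked against Table 1.

```sas
/* Hypothetical input data: real GP variable names and units may differ. */
data gp_demo;
    input patient_id weight_kg height_cm;
    datalines;
1 70 175
2 95 170
3 . 160
4 60 0
;
run;

data gp_derived;
    set gp_demo;

    /* BMI = weight (kg) / height (m) squared.                        */
    /* Left missing when either input is missing or not positive,     */
    /* so implausible values do not produce a misleading score.       */
    if weight_kg > 0 and height_cm > 0 then
        BMI_GP = round(weight_kg / (height_cm / 100)**2, 0.1);
    else BMI_GP = .;

    /* Assumed cut-off (BMI >= 30); confirm against Table 1.          */
    /* obese_GP stays missing when BMI_GP cannot be calculated.       */
    if missing(BMI_GP) then obese_GP = .;
    else obese_GP = (BMI_GP >= 30);
run;

proc print data=gp_derived noobs;
run;
```

Keeping obese_GP missing whenever BMI_GP is missing avoids silently classifying patients with unusable height or weight as not obese. Whichever rule the group adopts for such records should be stated in the cleaning notes and the data dictionary.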
D. Update the data dictionaries. If you decide not to update a data dictionary, give your reasons.
o Present an updated data dictionary for GP dataset, based on results of data cleaning step A and C above.
o Present an updated data dictionary for ED dataset, based on results of data cleaning step B above.
Part 2: Research Question
The manager of the Medical Plus GP practice is planning health care coordination within the practice and wants to know the characteristics of its clients, such as socio-demographic characteristics, lifestyle factors and health status. You will help the practice manager analyse the GP dataset and report on patient characteristics. You and the practice manager have agreed that the analysis will be based on a cleaned GP dataset (i.e. the dataset you cleaned in Part 1) and that the report (in Word or PDF) will be reproducible.
For Part 2, you are required to analyse the dataset that you cleaned in Part 1 and:
A. In the report, present results of your analysis in tabular format.
Your table(s) should be presented in an academic format similar to what would be found in the results section of a published journal article. You can present more than one table.
B. In the report, provide written interpretation of the results.
Your written interpretation of the results should be presented in an academic writing style.
C. Include SAS code that generates the results that you report above.
Part 3: Data linkage
No data analysis is required to answer Part 3 questions.
The Medical Plus GP manager wishes to link the practice's GP dataset to the Registry of Births, Deaths and Marriages (RBDM) deaths data and to PBS data to examine medication compliance among its patients. As a data custodian, Medical Plus GP has access to patient identification information (names, addresses, dates of birth and Medicare number). The RBDM data custodian has access to identification information (names, addresses and dates of birth). The PBS data custodian has access to the Medicare number only. A research institute will be contracted to analyse the linked data.
For this assignment, we can assume that patients have given consent for data linkage and ethics approval has been granted. The linkage will be carried out by the Centre for Health Record Linkage (CHeReL).
A. What data linkage strategies will CHeReL use to link the GP dataset to RBDM deaths and PBS data? Justify your decision.
B. Draw a diagram depicting the exchange of variables between the data custodians (GP, PBS and RBDM), CHeReL and the analyst for data linkage and analysis purposes. Justify this data exchange process.