Reference no: EM132221899
Project 1 -
Part I: Longitudinal data, sometimes referred to as panel data, track the same sample at different points in time. There are two common formats for longitudinal data, short and long format.
The short form is intuitive and good for presentation, but it is not suited for analysis, such as regression procedures. PROC REG and PROC GLM only take long format data, in which each variable occupies one and only one column.
The data set epilepsy.txt from Blackboard is recorded in the short form: for each ID and Treatment, the data from each time point have their own column.
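To make the two layouts concrete, here is a minimal sketch of a short-to-long conversion. It is written in Python only to illustrate the logic; the project itself asks for SAS DATA steps and, later, R. The column names (id, trt, age, base, y1 to y4), the number of follow-up visits, and the two rows of numbers are all assumptions made up for illustration and are not taken from epilepsy.txt.

```python
# Hypothetical short-format rows: one row per patient, one column per time point.
# Column names and values are illustrative only, not taken from epilepsy.txt.
short_rows = [
    {"id": 1, "trt": 0, "age": 30, "base": 16, "y1": 4, "y2": 2, "y3": 6, "y4": 2},
    {"id": 2, "trt": 1, "age": 25, "base": 24, "y1": 6, "y2": 4, "y3": 2, "y4": 8},
]

# Assumed mapping from each count column to a time index
# (0 = baseline, 1..4 = follow-up periods).
TIME_COLUMNS = {"base": 0, "y1": 1, "y2": 2, "y3": 3, "y4": 4}


def to_long(rows):
    """Return one record per (patient, time point) with a single 'count' column."""
    long_rows = []
    for row in rows:
        for column, time in TIME_COLUMNS.items():
            long_rows.append({
                "id": row["id"],
                "trt": row["trt"],
                "age": row["age"],
                "time": time,
                "count": row[column],
            })
    return long_rows


long_rows = to_long(short_rows)
for record in long_rows[:3]:
    print(record)
```

In the long result every variable (id, trt, age, time, count) has exactly one column, and each patient contributes one row per time point.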
1) Import the epilepsy data set.
2) Convert it into long format using DATA steps.
3) Since the baseline is an 8-week seizure count and the rest are 2-week counts, convert all seizure counts into weekly rates.
4) Create one table displaying average age and weekly seizure rate at baseline by treatment.
5) Create a scatter plot of age (x axis) vs weekly seizure rate at baseline (y axis) with colored dots based on treatment.
6) Fit a regression model using PROC REG or PROC GLM, with weekly rate as the response and age, treatment, and time as predictors*.
7) Create and display a data set containing the original and the predicted value for each patient.
8) Calculate and display the mean square error (MSE).
*Because of the repeated measurements and the type of response, the proper model would be more complicated than basic linear regression, but we ignore that here since the purpose of this project is practice.
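The rate conversion in step 3 is plain arithmetic: divide the baseline count by 8 and every other count by 2. A minimal Python sketch of that step follows (again only an illustration; the project asks for SAS). It assumes long-format records with hypothetical fields time and count, where time 0 marks the baseline; the numbers are made up.

```python
def weekly_rate(count, time):
    """Convert a seizure count to a weekly rate.

    Assumes time == 0 is the 8-week baseline period and any other
    time index is a 2-week follow-up period.
    """
    weeks = 8 if time == 0 else 2
    return count / weeks


# Illustrative long-format records (values are made up, not from epilepsy.txt).
records = [
    {"id": 1, "time": 0, "count": 16},
    {"id": 1, "time": 1, "count": 4},
    {"id": 1, "time": 2, "count": 3},
]

# Add the weekly rate as a new field on each record.
for record in records:
    record["rate"] = weekly_rate(record["count"], record["time"])

for record in records:
    print(record)
```

After this step the baseline and follow-up values are on the same scale (seizures per week), so they can share one response column in the regression.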
Part II - We need to use cross-validation to test the predictive ability of our model, since it is not appropriate to test the model on the same data it was built on.
For each patient i,
1) Modify the original data by deleting his/her observations, so that the model-building process does not include the ith patient's data.
2) Build the model, output the estimated values, and save the predicted values of seizure count belonging to the ith patient.
3) Create a %macro to do (1) and (2), and use a %do loop to repeat these steps for each patient.
Combine the results,
4) Create and display a data set containing the original and the predicted value for each patient.
5) Merge them with original response values.
6) Calculate and display the mean square error (MSE).
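The leave-one-patient-out procedure above can be sketched as follows. This is a Python illustration of the logic only, not the SAS %macro or the R code the project asks for. To stay short it fits a one-predictor least-squares line (weekly rate on time) as a stand-in for the full model with age, treatment, and time; the field names id, time, and rate and the toy values are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_patient_out(records):
    """Return (id, time, observed, predicted) tuples, where each patient's
    predictions come from a model fitted without that patient's rows."""
    results = []
    for held_out in sorted({r["id"] for r in records}):
        # Steps (1) and (2): drop patient i, refit, predict for patient i.
        train = [r for r in records if r["id"] != held_out]
        test = [r for r in records if r["id"] == held_out]
        a, b = fit_line([r["time"] for r in train], [r["rate"] for r in train])
        for r in test:
            results.append((r["id"], r["time"], r["rate"], a + b * r["time"]))
    return results


def mean_squared_error(results):
    """Average squared difference between observed and predicted values."""
    return sum((obs - pred) ** 2 for _, _, obs, pred in results) / len(results)


# Toy long-format data (made up for illustration, not from epilepsy.txt).
records = [
    {"id": 1, "time": 0, "rate": 2.0}, {"id": 1, "time": 1, "rate": 1.5},
    {"id": 2, "time": 0, "rate": 3.0}, {"id": 2, "time": 1, "rate": 2.0},
    {"id": 3, "time": 0, "rate": 1.0}, {"id": 3, "time": 1, "rate": 1.5},
]

results = leave_one_patient_out(records)
for row in results:
    print(row)
print("cross-validated MSE:", mean_squared_error(results))
```

Because each prediction is made by a model that never saw the held-out patient, this MSE is an out-of-sample estimate and will usually be larger than the in-sample MSE from Part I.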
Please clean your final output by suppressing any output that is not asked for. As usual, comment each statement you use.
Project 2 -
Part I (Redo Project 1):
1) Import the epilepsy data previously used in project 1.
2) Convert from short to long format in R.
3) Redo Part II of project 1 in R.
Part II (Plot):
Using the results obtained and the ggplot2 package, create a scatter plot of predicted values vs. original values with:
1) ID numbers (1 to 59) as the markers.
2) The color of the markers depends on age.
3) Two panels based on treatment using facet_grid().
4) x and y variables labeled properly.