Reference no: EM132352299
Final Project Instructions
Evaluate the set of maternal personal, demographic, medical history, and health care related variables as risk factors for low birthweight among infants born at the Baystate Medical Center in Springfield, MA. Your dependent variable is the continuous response variable bwt. You will use multivariable linear regression to identify the potential factors associated with birthweight in this sample of infants, paying particular attention to factors associated with lower birthweight. You will use the A-L dataset if your last name falls in the [A-L] range, and you will use the M-Z dataset if your last name falls in the [M-Z] range.
We expect that you will complete the project by yourself. In other words, you must work on your own on this project and you are not allowed to share your project, results or electronic files or documentation with others. You are expected to erase the dataset provided to you after the final grade has been posted for this project.
All analyses are to be completed using Stata. The dataset you are assigned is in a comma-delimited text file (.csv) format. You will need to import this file into Stata using the import menu: File > Import > Text data (delimited, *.csv, ...). The dataset may be different from the original data.
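The menu path above corresponds to Stata's `import delimited` command. The lines below are a minimal sketch of that step, not a required procedure: the file name `baystate_AL.csv` is hypothetical (substitute the file you were assigned), and the sketch assumes the first row of the file holds the variable names.

```stata
* Hypothetical file name: replace with the .csv you were assigned.
* varnames(1) assumes the first row contains the variable names.
import delimited using "baystate_AL.csv", varnames(1) clear

* Quick checks that the import produced what you expect.
describe
summarize bwt, detail
tabulate low, missing
```

Running `describe` right after the import is a simple way to confirm that numeric variables such as bwt were not read in as strings.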
Your report will have four sections: Introduction, Methods, Results, and Discussion. Keep the report to a maximum of 8 pages (double spaced).
Introduction: This section is expected to answer the question, "What is the rationale for the scientific question asked?" The rationale needs to be based on a significant public health issue. Describe the relationships of interest and the purpose of the analysis.
Please conduct a small literature search (1-3 references) to understand the scientific question asked in the project and provide a brief summary in this section. (This section should be brief and amount to a few paragraphs. Limit it to about 1 page double spaced.)
Methods: This section should describe the steps you took and the statistical methods you used to analyze the data, and how you applied them to answer the questions asked. For each statistical method, state its rationale or purpose. Describe any statistical methods used to test the assumptions of your analyses, if needed. If you created new variables for your analyses, provide the rationale for creating each variable and describe the method you used to create it. Add a sentence referencing the software you used for all your analyses (in this case, Stata), just as you are expected to do for any peer-reviewed publication.
Results: The results section needs to mimic a peer-reviewed publication, so it needs to include the following elements:
- Identify the variables used in the comparison and create a summary table that describes your sample. These descriptive statistics are based on the original data, rather than any new variables you create for your analyses.
- Use the Table 1 Template in Appendix A to present your descriptive statistics.
- For each set of variables, compute the appropriate test statistic to assess the simple association between each independent variable and low versus normal birthweight. Describe these tests in the Methods section and report the P value in the table.
- Summarize your findings based on the initial descriptive statistics in a brief paragraph.
- You will be using multi-sample tests and/or multivariable models to address the primary question(s), so you need to provide analyses that confirm that the model assumptions are met.
- Present your initial exploratory analyses on the original data that you used to make a preliminary assessment on the presence of potential outliers and distributional characteristics relevant to the statistical model needed to address the primary hypotheses you are asked to evaluate.
- Describe how you dealt with violations of the assumptions (selecting an appropriate transformation if applicable)
- Fit the initial model with all the independent variables.
- Provide a detailed analysis of model fit based on residuals. At a minimum, you need to include a quantile-normal plot to check the distribution of the residuals, a residual-versus-fitted plot, and a partial residual plot (component-plus-residual plot) for each continuous predictor to check linearity.
- Describe the remedial steps you took to address issues identified in your analysis of the residuals:
  - How you dealt with non-linearity.
  - How you dealt with observations that appear as potential outliers.
- Fit the final model based on the remedial steps you took to resolve issues identified in your analysis of model fit.
- Copy and paste the regression results for your final model into your report and label them as Table 2.
- Specify how the independent variables that appear in your final model were selected.
- Summarize the key results from your final multivariable regression model in the text, and include a table with all the regression results in the body of the paper
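The analysis steps listed above can be sketched in Stata roughly as follows. This is an illustrative outline under stated assumptions, not the expected solution. So that it runs on its own, it loads Stata's example dataset `lbw` with `webuse` as a stand-in for the assigned file; the predictor names (age, lwt, race, smoke, ptl, ht, ui, ftv) and the identifier id come from that example dataset and may not match your assigned .csv. The choice of t tests and chi-squared tests, and the cutoff of 3 for studentized residuals, are common conventions rather than requirements of the assignment.

```stata
* Assumption: Stata's example dataset lbw stands in for the assigned file.
webuse lbw, clear

* Table 1: simple associations with low versus normal birthweight.
ttest age, by(low)                 // continuous variable: two-sample t test
ttest lwt, by(low)
tabulate smoke low, chi2 column    // categorical variable: chi-squared test
tabulate race low, chi2 column

* Exploratory look at the distribution of the outcome.
histogram bwt, normal
graph box bwt

* Indicator variables for race, created explicitly so that the
* diagnostic plot commands below work with plain variable names.
tabulate race, generate(race_)

* Initial model with all candidate predictors (low is deliberately excluded).
regress bwt age lwt race_2 race_3 smoke ptl ht ui ftv

* Residual diagnostics.
predict double rstud, rstudent     // studentized residuals
qnorm rstud                        // quantile-normal plot
rvfplot, yline(0)                  // residual-versus-fitted plot
cprplot age, lowess                // component-plus-residual plots
cprplot lwt, lowess

* List observations with large studentized residuals for closer inspection.
list id bwt rstud if abs(rstud) > 3 & !missing(rstud)
```

Each plot command overwrites the previous graph window, so in practice you may want to follow each one with `graph export` or give it a `name()` option before moving on.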
Discussion: In this section you need to describe what the results mean in the context of the scientific question, integrating all the questions asked for the project.
The bulk of your report should be the methods and the results. The discussion, like the introduction, should be kept brief. It is okay to turn in reports shorter than the maximum, as long as everything requested is included and adequately covered.
Direct any questions about the project to your instructor or the TAs assigned to your class.
Baystate Hospital Data Documentation.
The Baystate Hospital Study was designed to identify risk factors associated with giving birth to a low birthweight infant. At the time the study was conducted in 1986, low birthweight was defined as any newborn weighing less than 2500 grams. Your brief review of the literature may suggest other criteria based on current approaches to risk stratification for newborns. Since the original study had only 59 low birthweight infants in its sample, you may have difficulty applying newer risk definitions to the current data, but you are free to explore other definitions in logistic regression models and see how the results compare across the different definitions. Your primary analysis, however, should be based on a multivariable linear regression that uses birth weight in grams as the dependent variable.
Please note that the variable "low" is a dichotomization of the response variable bwt and, therefore, you must not use it as an explanatory variable in the regression model. You will, however, use it to fill out Table 1 in Appendix A.
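As a hedged illustration of the point about "low", the sketch below rebuilds the indicator from bwt using the 2500 gram cutoff stated in the documentation and cross-tabulates it against the supplied variable. It again uses Stata's example dataset `lbw` as a stand-in so that it runs on its own; the variable name `low_check` and the three predictors in the optional logistic model are illustrative choices, not part of the assignment.

```stata
* Assumption: Stata's example dataset lbw stands in for the assigned file.
webuse lbw, clear

* Rebuild the indicator from bwt using the documented cutoff
* (low birthweight = less than 2500 grams).
generate byte low_check = (bwt < 2500) if !missing(bwt)

* If low is a dichotomization of bwt, all counts fall on the diagonal.
tabulate low low_check, missing

* Optional secondary analysis: a logistic model for the dichotomized
* outcome. Predictors here are chosen only for illustration.
logistic low age lwt smoke
```

Because low is computed directly from bwt, including it as a predictor of bwt would put a function of the outcome on the right-hand side of the model, which is why it is reserved for Table 1 and for any secondary logistic models.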
Attachment:- Linear Regression Analysis Assignment Files.rar