Reference no: EM132266691
Assignment:
The purpose of this assignment is to demonstrate that you can analyze large-scale data using analysis methods appropriate to each variable's level of measurement.
Select a Statistics Canada dataset (collected in 2000 or later) that addresses a topic or issue you are interested in. You are welcome to use the GSS or HES but do not examine relationships that we have looked at in the lab assignments or lab demonstrations. Look at the survey documentation to learn about the objectives of the survey, how the data were collected, and who the target population was.
From your dataset, select several variables to work with that you find interesting. One of your variables will be the dependent variable and all other variables will be the independent and control variables (levels of measurement for both have been discussed in class). Use a minimum of four independent and/or control variables. Develop a basic linear regression model. All variables should have at least 200 non-missing (i.e. valid) responses.
To begin, look at the distribution of each of the variables individually, using frequency distributions, descriptive statistics, and graphs. The strategies you use will depend on each variable's level of measurement.
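As a rough illustration of matching the summary to the level of measurement, the sketch below builds a frequency table for a categorical variable and reports central tendency, variation, and skew for an interval-level one. The variable names and values are made up for the example and do not come from any Statistics Canada file.

```python
from collections import Counter
import statistics

# Hypothetical toy data; real values would come from the survey file.
marital_status = ["married", "single", "married", "divorced", "single", "married"]
hours_worked = [40, 35, 50, 20, 45, 38]

def frequency_table(values):
    """Return {category: (count, percent)} for a nominal or ordinal variable."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, round(100 * c / n, 1)) for cat, c in counts.items()}

def skewness(values):
    """Moment-based (population) skewness; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum(((v - m) / sd) ** 3 for v in values) / n

# Nominal variable: counts and percentages.
freq = frequency_table(marital_status)

# Interval-level variable: central tendency, variation, and skew.
summary = {
    "mean": statistics.fmean(hours_worked),
    "median": statistics.median(hours_worked),
    "sd": statistics.stdev(hours_worked),
    "skew": skewness(hours_worked),
}

print(freq)
print(summary)
```

A mean below the median together with a negative skew value, as in this toy data, would suggest a tail of low values worth mentioning in the write-up.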
For all inferential analyses, be sure to apply the appropriate weighting (i.e., a relative weight variable that you create) and designate the appropriate values as missing. Then test the relationships between your variables. If necessary, recode your variables in order to present your results more clearly. We have talked about various methods in class (e.g. interval proximate, recoding, dummy variables). Determine whether your results support or refute your hypotheses and the social science literature. Think about how you might explain any unexpected findings.
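The preparation steps above can be sketched roughly as follows. This assumes the common convention that a relative weight is each respondent's survey weight divided by the mean weight, so the weighted sample size equals the actual one. The missing-value codes (97, 98, 99), the field names, and the sex coding are hypothetical; the real ones must be taken from the survey codebook.

```python
# Hypothetical missing-value codes; check the codebook for the real ones.
MISSING_CODES = {97, 98, 99}

# Hypothetical records: a 1-5 health rating, sex (1 = male, 2 = female),
# and a person-level survey weight.
records = [
    {"health": 1, "sex": 1, "person_weight": 1200.0},
    {"health": 3, "sex": 2, "person_weight": 800.0},
    {"health": 98, "sex": 2, "person_weight": 1000.0},
    {"health": 5, "sex": 1, "person_weight": 1000.0},
]

def clean(value):
    """Replace a designated missing code with None."""
    return None if value in MISSING_CODES else value

mean_weight = sum(r["person_weight"] for r in records) / len(records)

for r in records:
    r["health"] = clean(r["health"])                    # designate missing data
    r["rel_weight"] = r["person_weight"] / mean_weight  # relative weight
    r["female"] = 1 if r["sex"] == 2 else 0             # dummy variable

# The relative weights sum to the number of cases.
print(sum(r["rel_weight"] for r in records), len(records))
```

In practice the mean weight would usually be computed over the cases actually used in the analysis, so it may be worth recalculating after missing cases are dropped.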
Write up your analyses in 4-5 pages. The required components are:
1) Description of the Survey: Describe the survey, its objectives, the population, and when and how data were collected. (about 1/2 page)
2) Description of the Variables: For regression analyses, describe the independent and dependent variables, using descriptive statistics. There is no need to go into details surrounding control variables but provide a brief explanation for your choice of controls.
Identify the exact question that was used to collect the data for all your variables (or explain how Statistics Canada created the variable by combining responses to other questions). Identify which values you treated as 'missing' in your analysis. Specifically, describe the distribution of each of the main variables in words, supported by tables, charts and/or graphs. Your description might include reports of frequencies, measures of central tendency, measures of variation, kurtosis and/or skew depending on the level of measurement of each variable. Use at least two tables or graphs to illustrate the results in this section.
3) Description of Hypotheses: Describe how you think your main variables will be related to the DV. Develop a hypothesis about whether and how the variables will be related. Justify your hypotheses using the social science literature.
4) Hypothesis Testing: For regressions, describe the results of your hypothesis tests. Report the coefficient of determination (adjusted R-squared) and explain what it means. For your independent variables, where statistically significant relationships exist, describe the relationship more fully as learned in class (i.e., its unique effect, by looking at the slope). If there is no statistically significant relationship, state this. Compare your results to the social science literature you've identified and develop an explanation for the results you did or did not find. Use a table to illustrate the results in this section.
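For reference when interpreting the coefficient of determination, the standard adjustment penalizes R-squared for the number of predictors: adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of valid cases and k the number of predictors. The short sketch below applies that formula; the input values are invented for illustration and are not results from any dataset.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Invented example: R-squared of 0.25 with 204 valid cases and 4 predictors.
example = adjusted_r_squared(0.25, n=204, k=4)
print(round(example, 4))
```

The adjusted value is always at or below the unadjusted one, and the gap grows as predictors are added relative to the number of cases. Statistical software reports it directly, so the function is mainly useful for checking one's understanding.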
Note: The dataset is from the Statistics Canada General Social Survey.