Reference no: EM133719142
Assignment: R Programming Worksheet
In an R Markdown document, write the appropriate code in an R chunk (use ```{r} to start a chunk and ``` to end it) and answer the questions in the normal white space of the document. R Markdown automatically prints the plots when you knit the document, so you don't need to save them anywhere else.
Required packages: tidyverse, jtools, kableExtra
Problem 1: Linear regression, model 1
1. Create a new object called employed by filtering (dplyr method) the data to include only people who are employed (in government, private company/organization, or self-employed).
2. Create a linear model that predicts income as a function of age, sex, and origin region for the data frame created in step 1.
3. Use summ() (from the jtools package). What is the reference category for this model?
4. What do the results of summ() tell you?
5. Use plot_summs() to print a plot of the variables in your model.
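The steps in Problem 1 can be sketched as follows. This is only an illustration under stated assumptions: the worksheet does not name the data set, so the data frame `survey`, the columns `income`, `age`, `sex`, and `employment`, and all level labels are hypothetical placeholders filled with simulated values. Only `origin_region` is a name taken from the worksheet. Substitute the real data frame and column names.

```r
library(dplyr)
library(jtools)

# Simulated stand-in data. The data frame name, most column names, and all
# level labels are assumptions, not the worksheet's actual data.
set.seed(1)
n <- 500
survey <- data.frame(
  age = sample(18:65, n, replace = TRUE),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  origin_region = factor(sample(c("Region A", "Region B", "Region C"), n, replace = TRUE)),
  employment = sample(c("Government", "Private", "Self-employed", "Unemployed"),
                      n, replace = TRUE)
)
survey$income <- 20000 + 500 * survey$age + rnorm(n, sd = 8000)

# Step 1: keep only employed people (dplyr method)
employed <- survey %>%
  filter(employment %in% c("Government", "Private", "Self-employed"))

# Step 2: linear model of income on age, sex, and origin region
model_1 <- lm(income ~ age + sex + origin_region, data = employed)

# Step 3: for a factor predictor, R uses the first level as the reference category
levels(employed$sex)[1]
levels(employed$origin_region)[1]
summ(model_1)

# Step 5: coefficient plot
plot_summs(model_1)
```

The reference category is the level that does not appear as its own row in the summ() output; each listed level is compared against it.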
Problem 2: Linear regression, model 2
1. Create a linear model that predicts income as a function of age, sex, and language at home for employed people in the data.
2. Use summ(). What is the reference category for this model?
3. What do the results of summ() tell you?
4. Use plot_summs() to print a plot of the variables in your model.
Problem 3: Linear regression, model 3
1. Create a linear model that predicts income as a function of age, sex, language at home, and education for employed people in the data.
2. Use summ(). What is the reference category for this model?
3. What do the results of summ() tell you?
4. Use plot_summs() to print a plot of the variables in your model.
5. Instead of the education variable, run the same model with the binary "completed_college" variable.
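One possible way to compare the two versions of model 3 is to fit both and plot them side by side. The sketch below uses simulated data. The columns `income`, `age`, and `sex` and every level label are hypothetical, and `completed_college` is derived from the made-up education levels only so that the example runs; in the real data the variable already exists.

```r
library(dplyr)
library(jtools)

# Simulated stand-in for the employed data frame; all values and level labels
# are assumptions.
set.seed(4)
n <- 600
employed <- data.frame(
  age = sample(18:65, n, replace = TRUE),
  sex = sample(c("Female", "Male"), n, replace = TRUE),
  language_at_home = sample(c("English", "Other"), n, replace = TRUE),
  education = sample(c("High school", "Some college", "Bachelor's", "Graduate"),
                     n, replace = TRUE)
) %>%
  mutate(
    # Derived here for illustration only; use the existing variable in the real data
    completed_college = education %in% c("Bachelor's", "Graduate"),
    income = 15000 + 400 * age + 12000 * completed_college + rnorm(n, sd = 9000)
  )

# Model with the multi-level education variable
model_3 <- lm(income ~ age + sex + language_at_home + education, data = employed)

# Same model with the binary completed_college variable instead
model_3b <- lm(income ~ age + sex + language_at_home + completed_college, data = employed)

summ(model_3b)

# Both models in one coefficient plot
plot_summs(model_3, model_3b,
           model.names = c("Education levels", "Completed college"))
```

Plotting both models together makes it easier to see whether the estimates for age, sex, and language at home shift when education is collapsed to a single yes/no indicator.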
Problem 4: Analysis
1. Look back at your results from the linear models. Look back at the expectations you wrote down in Homework. Which of your hypotheses/expectations seemed accurate? Did you find any surprises? Based on these linear models, would you say that differences in income among these groups are mainly based on age differences among the groups, on differences in education levels, or on differences in each group's migration background and/or unique culture? Explain. Use specific examples of what the estimates (Est.) from these different models mean (focus on estimates with p values <= 0.05).
Problem 5: Logistic regression, model 1
1. Use the dplyr method (pipeline) to calculate what percentage of males and females in the data are self-employed.
2. Since males have a much higher percentage of self-employment, create a new data frame by filtering to include only males. This might help isolate whether differences in culture, age, or education are better predictors of self-employment.
3. Set the reference categories for the language_at_home, origin_region, and education variables in your new data frame. I suggest setting them to the groups that you expect to have the lowest levels of self-employment.
4. Create a logistic regression model using glm(family = "binomial") to predict self-employment as a function of age and origin region.
5. Use summ() with exp = TRUE. What is the reference category for your model?
6. What do the results of summ() mean?
7. Use plot_summs() to print a plot of the variables in your model.
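The logistic workflow in Problem 5 might look roughly like this. Again, this is a sketch on simulated data: the data frame `survey`, the 0/1 column `self_employed`, the columns `age` and `sex`, and the region labels are hypothetical, and "Region B" is an arbitrary choice of reference group made only for illustration.

```r
library(dplyr)
library(jtools)

# Simulated stand-in data; names and labels other than origin_region are assumptions.
set.seed(2)
n <- 1000
survey <- data.frame(
  age = sample(18:65, n, replace = TRUE),
  sex = sample(c("Female", "Male"), n, replace = TRUE),
  origin_region = sample(c("Region A", "Region B", "Region C"), n, replace = TRUE),
  self_employed = rbinom(n, size = 1, prob = 0.2)
)

# Step 1: percentage self-employed by sex (the mean of a 0/1 variable is a proportion)
pct_self_employed <- survey %>%
  group_by(sex) %>%
  summarise(pct = 100 * mean(self_employed))
pct_self_employed

# Step 2: keep only males
males <- survey %>%
  filter(sex == "Male")

# Step 3: set the reference category with relevel()
males <- males %>%
  mutate(origin_region = relevel(factor(origin_region), ref = "Region B"))

# Step 4: logistic regression
logit_1 <- glm(self_employed ~ age + origin_region,
               data = males, family = "binomial")

# Steps 5 and 7: exponentiated estimates (odds ratios) and coefficient plot
summ(logit_1, exp = TRUE)
plot_summs(logit_1, exp = TRUE)
```

The same relevel() pattern applies to language_at_home and education once you have decided which group should serve as the baseline.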
Problem 6: Logistic regression, model 2
1. Create a logistic regression model to predict self-employment as a function of age, language at home, and education.
2. Use summ() with exp = TRUE. What is the reference category for your model?
3. What do the results of summ() mean?
4. Use plot_summs() to print a plot of the variables in your model.
Problem 7: Analysis
1. Look back at your results from the logistic/binomial models in Problems 5 and 6. Look back at your expectations about what predictor variables would affect self-employment from Homework. Which of your hypotheses/expectations seemed accurate? Did you find any surprises? Based on these logistic models, would you say that differences in self-employment among males from these groups are mainly based on age differences among the groups, on differences in education levels, or on differences in each group's migration background and/or unique culture? Explain what the models tell you about the relative effects of the predictor variables on self-employment. Pay particular attention to the odds ratios (exp(Est.)) with p values <= 0.05.
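To make the odds ratios concrete, the base R sketch below shows that exp(Est.) for a predictor equals the factor by which the predicted odds change when that predictor increases by one unit and the others are held fixed. The data are simulated, and the column names `age`, `college`, and `self_employed` are hypothetical.

```r
# Simulated data with known log-odds; all names and values are assumptions.
set.seed(3)
n <- 2000
sim <- data.frame(
  age = sample(18:65, n, replace = TRUE),
  college = rbinom(n, size = 1, prob = 0.4)
)
sim$self_employed <- rbinom(n, size = 1,
                            prob = plogis(-3 + 0.03 * sim$age + 0.5 * sim$college))

fit <- glm(self_employed ~ age + college, data = sim, family = "binomial")

# Estimates are on the log-odds scale; exponentiating gives odds ratios
odds_ratios <- exp(coef(fit))
odds_ratios

# Predicted odds of self-employment at a given age, with college held at 0
odds_at <- function(a) {
  p <- predict(fit, newdata = data.frame(age = a, college = 0), type = "response")
  unname(p / (1 - p))
}

# The ratio of odds one year apart matches the odds ratio for age
odds_at(41) / odds_at(40)
unname(odds_ratios["age"])
```

An odds ratio above 1 means the odds of self-employment rise with the predictor, a value below 1 means they fall, and a value near 1 means little change.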