GV903 Advanced Research Methods - University of Essex
Instructions
Please create a new document with your answers. Use LaTeX in conjunction with knitr via RStudio for document creation, and submit both a PDF file (the compiled document) and the source files necessary for compilation (e.g., files with the extensions .Rnw, .bib, etc.). Up to seven points are given for proper use of technology and proper formatting, as part of the maximum of 100 attainable points.
The number of points you can obtain for each question or task is given in square brackets.
The points add up to 100 (including the points for use of technology as described above) and determine your final mark for this assignment. The final mark you earn for this assignment is given by the equation
$m = t + \sum_{i=1}^{n} p_i,$
where $p_i$ denotes the number of points you earn for question or task $i$, $n$ is the number of tasks in this assignment, and $t$ is the number of points you earn for use of technology and formatting. All answers are evaluated on three criteria: how clearly the results are presented; how correct the results are; and how elegant, computationally efficient, or sophisticated the solution is (where applicable). Good luck!
Background: The Guardian League Tables 2013-2022
School leavers face uncertainty over what university to choose. Many prospective students turn to university rankings to make an informed choice. The Guardian League Tables are one of the most popular university rankings in the UK. Despite their popularity in the general population, they are notorious in academic circles because they ignore the academic quality of the faculty in favour of student satisfaction, selectivity in admissions, and career prospects.
The University of Essex overall, for example, is ranked 64th in 2022 (85th in 2021; 66th in 2020; 31st in 2019; 48th in 2018), as expressed by its "Guardian score", an aggregate score of the other variables. The Department of Government is ranked 35th (25th in 2021; 34th in 2020; 12th in 2019; 15th in 2018). One of the variables that make up the Guardian score is the average entry tariff, a score given by the Universities and Colleges Admissions Service (UCAS) reflecting the average selectivity of each university. In the Department of Government, for example, the current average entry tariff is 109 (106 in 2021; 111 in 2020; 111 in 2019; 130 in 2018), which reflects a decline in relative selectivity over the years in line with the overall university strategy of attracting more students and growing its learning community.
In this second assignment, you will examine the composition of the Guardian League Tables using regression analysis. The file guardian.csv contains the full results for all universities for each year from 2013 to 2022. Your goal is to explain the Guardian score (which in turn determines a university's relative rank) using the other variables shown in the ranking. You can ignore the variable "Continuation" in all tasks below because it was provided only for the most recent years. In principle, one should be able to reconstruct the Guardian score almost perfectly by combining those variables in a linear regression function. More specifically, please complete the following tasks.
1 Linear model of the Guardian score in 2022
Estimate a linear model using ordinary least squares in R to explain the Guardian score 2022 using only data from 2022. Find the model that best reconstructs/approximates the Guardian score. Explain how and why you selected the specification you ended up with. Write down the equation that describes the empirical model you estimated, and show the equation as part of your answer.
2 Regression table
Use the texreg package to show a perfectly formatted regression table for the model you estimated. Write a sentence that includes a cross-reference to the table.
3 Interpretation
Choose three model terms and interpret the results for them. Also interpret the goodness of fit of the model.
4 Recompute the coefficient block manually
Redo the calculations for the coefficient block in the regression table (coefficients, standard errors, p values) by showing the necessary equations, discussing what values must be inserted where and why, and then performing the calculations manually in R (i.e., without the lm function or similar, using functions only from the base package).
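As a hedged illustration of the matrix algebra involved, the sketch below computes a coefficient block on simulated data. The variable names (`x1`, `x2`, `y`) and the data are hypothetical and not taken from guardian.csv. It uses the standard OLS results: coefficients (X'X)^{-1}X'y, residual variance e'e/(n - k), and standard errors from the diagonal of the estimated covariance matrix. Note that `pt` and `rnorm` come from the stats package, which is attached by default in R.

```r
# Simulated toy data; names are hypothetical and not taken from guardian.csv.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n)

X <- cbind(Intercept = 1, x1 = x1, x2 = x2)  # design matrix with a column of ones
k <- ncol(X)

XtX_inv  <- solve(t(X) %*% X)                # (X'X)^{-1}
beta_hat <- drop(XtX_inv %*% t(X) %*% y)     # OLS coefficients
res      <- drop(y - X %*% beta_hat)         # residuals
sigma2   <- sum(res^2) / (n - k)             # residual variance, n - k degrees of freedom
se       <- sqrt(diag(sigma2 * XtX_inv))     # standard errors
t_val    <- beta_hat / se                    # t statistics
p_val    <- 2 * pt(abs(t_val), df = n - k, lower.tail = FALSE)  # two-sided p values

coef_block <- cbind(Estimate = beta_hat, SE = se, t = t_val, p = p_val)
print(round(coef_block, 4))
```

Comparing `coef_block` with `summary(lm(y ~ x1 + x2))$coefficients` is a convenient way to check the manual calculation.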
5 Transformation and visualisation of average entry tariffs
UCAS changed the definition and calculation of the average entry tariff before 2018. Up to, and including, 2017, values were often between 200 and 600. Starting in 2018 and using the new definition, the values were closer to around 100 to 200. In some of the tasks that follow, you will estimate panel models. However, some of the models will require that the UCAS tariffs are comparable. You should hence put the values all on the same scale and make them comparable using a z transformation.
Standardise the values and make them comparable using a z transformation. Show the R code and equation(s) for this. Use ggplot2 to plot the UCAS average entry tariffs against the Guardian score, in different colours for before 2018 and after 2017, ideally in a single plot with two facets: one for the original version in the dataset and one for the z-transformed version you created. Demonstrate using your plot that the z transformation had the desired effect.
Briefly discuss two alternative ways of modelling the UCAS average entry tariff variable in a panel model without the use of a transformation. That is, how else could you model this temporal heterogeneity in the tariff variable in a panel model in a sensible way?
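The z transformation subtracts the mean and divides by the standard deviation, z = (x - mean(x)) / sd(x). A minimal sketch on toy data follows, assuming the transformation is applied separately within each tariff definition. Standardising within each year would be another defensible choice. The column names (`year`, `tariff`) and the values are hypothetical placeholders, not taken from guardian.csv.

```r
# Toy data; column names and values are hypothetical placeholders.
toy <- data.frame(
  year   = c(2016, 2016, 2017, 2017, 2018, 2018, 2019, 2019),
  tariff = c(300, 500, 320, 480, 110, 150, 120, 160)
)

# Indicator for the tariff definition in force in a given year
toy$regime <- ifelse(toy$year <= 2017, "old definition", "new definition")

# z = (x - mean) / sd, computed separately within each regime
toy$tariff_z <- ave(toy$tariff, toy$regime,
                    FUN = function(x) (x - mean(x)) / sd(x))

# After the transformation, each regime has mean 0 and standard deviation 1
print(round(tapply(toy$tariff_z, toy$regime, mean), 10))
print(round(tapply(toy$tariff_z, toy$regime, sd), 10))
```

The `regime` column can also serve as the colour aesthetic in a plot that contrasts the two definitions.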
6 Counterfactual prediction for Essex
Use the linear model you estimated above to predict the Guardian score for Essex if Essex increased its average entry tariff from 115 to the average entry tariff it had in 2018. How large is the improvement in the Guardian score resulting from this?
What if Essex suddenly became as selective as Oxford in 2022 and adopted its UCAS average entry tariff? What would be the expected Guardian score?
Show the equation(s) and R code, and briefly explain what you did and why.
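A counterfactual prediction of this kind can be sketched with `predict` and a `newdata` data frame in which only one regressor is changed. The example below uses simulated data; the variable names (`tariff`, `ratio`, `score`), the model and the counterfactual value of 130 are assumptions for illustration only. In a model that is linear in the tariff, the predicted change equals the tariff coefficient times the change in the tariff.

```r
# Simulated toy data; names and values are hypothetical, not from guardian.csv.
set.seed(4)
toy <- data.frame(tariff = runif(40, 100, 200), ratio = runif(40, 10, 20))
toy$score <- 20 + 0.3 * toy$tariff - 0.5 * toy$ratio + rnorm(40)

fit <- lm(score ~ tariff + ratio, data = toy)

# Baseline profile and a counterfactual that changes only the tariff
baseline       <- data.frame(tariff = 115, ratio = 15)
counterfactual <- transform(baseline, tariff = 130)

pred <- predict(fit, newdata = rbind(baseline, counterfactual))
print(pred)        # predicted scores: baseline, then counterfactual
print(diff(pred))  # predicted change in the score
```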
7 Create lagged variables
In preparation for some of the steps below, add lagged variables to the dataset for all variables used in the linear model above, each lagged by one time period. Show the R code, and briefly describe how you solved this. Hint: There are different ways to do this, but the merge function can come in handy depending on how you choose to complete the task.
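One possible way to use `merge` for lagging, sketched on a toy panel: copy the data, shift the year forward by one so that the row for year t - 1 is keyed as year t, and join the copy back onto the original. The column names (`institution`, `year`, `score`) are hypothetical placeholders. This is only one of several workable approaches.

```r
# Toy panel; column names and values are hypothetical placeholders.
panel <- data.frame(
  institution = rep(c("A", "B"), each = 3),
  year        = rep(2020:2022, times = 2),
  score       = c(60, 62, 65, 70, 71, 69)
)

# Copy the data, shift the year forward by one, and rename the value column,
# so that the observation from year t - 1 is keyed as year t.
lagged <- panel
lagged$year <- lagged$year + 1L
names(lagged)[names(lagged) == "score"] <- "score_lag1"

# A left join keeps every original row; the first year of each institution gets NA.
panel_lag <- merge(panel, lagged, by = c("institution", "year"), all.x = TRUE)
panel_lag <- panel_lag[order(panel_lag$institution, panel_lag$year), ]
print(panel_lag)
```

Because the join is keyed on institution and year, a missing year in the panel produces an NA rather than silently pairing non-adjacent years, which a plain row shift would do.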
8 Year fixed effects
In this task, use only functions from the base and stats packages. That is, you can use lm but not plm or similar.
Estimate four fixed effects panel models, spanning all years. The first model should be like the model above but with year dummies. The second model should additionally contain a lagged dependent variable. The third model should be like the first but use a year fixed effect without the inclusion of dummies. The fourth model should do the same but also include a lagged dependent variable, like the second model in this task but using a year fixed effect without dummies.
Explain what each model does and how it can be interpreted. What are the pros and cons of each model? Use terms from the literature and slides to describe the models. Include equations for the models in your explanation. Show the R code and present the results in a single well-formatted regression table using texreg.
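One common reading of "a year fixed effect without dummies" is the within transformation: subtract the year-specific mean from the outcome and from each regressor, then regress the demeaned outcome on the demeaned regressors. The sketch below, on simulated data with hypothetical names (`x`, `y`, `year`), shows that both routes give the same slope estimate. It is an illustration of the equivalence, not the required specification.

```r
# Simulated toy data; names and values are hypothetical, not from guardian.csv.
set.seed(2)
toy <- data.frame(year = rep(2013:2016, each = 25))
toy$x <- rnorm(nrow(toy)) + (toy$year - 2013)   # regressor that drifts over time
toy$y <- 1 + 0.8 * toy$x + 0.5 * (toy$year - 2013) + rnorm(nrow(toy))

# (a) Year fixed effects through dummy variables
fit_dummy <- lm(y ~ x + factor(year), data = toy)

# (b) Year fixed effects through the within transformation:
#     subtract the year mean from the outcome and the regressor
toy$y_dm <- toy$y - ave(toy$y, toy$year)
toy$x_dm <- toy$x - ave(toy$x, toy$year)
fit_within <- lm(y_dm ~ x_dm - 1, data = toy)

print(c(dummy  = unname(coef(fit_dummy)["x"]),
        within = unname(coef(fit_within)["x_dm"])))
```

The slope estimates coincide, but the standard errors reported by `lm` for the demeaned model do not account for the degrees of freedom used up by the year means, so they differ slightly from those of the dummy-variable model unless corrected.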
9 Institution fixed effects
Estimate two models with institution fixed effects, one without and one with a lagged dependent variable. Again, explain the models in a similar way as in the previous task. Include a third model that resembles the institution fixed effects model with lagged DV, but estimate it using the plm package, and include the R code here and explain it briefly. Show the three institution fixed effects models in a single table and comment on how the results compare.
10 First-differences model
Estimate a first-difference model with fixed effects for institutions using only the base and stats packages in R. Show the estimation equation, and describe the equations and your reasoning. What does the model tell us, in comparison to the other models so far? Which model is overall the best choice for the panel dataset, and why?
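First differencing removes anything that is constant within an institution over time, which is how it deals with institution fixed effects. A minimal sketch on a simulated panel follows; the names (`institution`, `year`, `x`, `y`) are hypothetical, the panel is assumed to be balanced with consecutive years, and the choice to omit the intercept is an assumption of this sketch.

```r
# Simulated toy panel; names and values are hypothetical, not from guardian.csv.
set.seed(3)
panel <- data.frame(
  institution = rep(c("A", "B", "C"), each = 4),
  year        = rep(2019:2022, times = 3)
)
alpha   <- rep(c(5, -2, 10), each = 4)           # time-invariant institution effects
panel$x <- rnorm(12) + alpha / 5                 # regressor correlated with the effects
panel$y <- alpha + 2 * panel$x + rnorm(12, sd = 0.1)

# Sort, then difference within each institution; the first year becomes NA
panel <- panel[order(panel$institution, panel$year), ]
d <- function(v) c(NA, diff(v))
panel$dy <- ave(panel$y, panel$institution, FUN = d)
panel$dx <- ave(panel$x, panel$institution, FUN = d)

# The institution effects cancel out in the differences; lm drops the NA rows
fit_fd <- lm(dy ~ dx - 1, data = panel)
print(coef(fit_fd))
```

With gaps in the panel, `diff` would difference non-adjacent years, so a year-keyed lag is safer in that case.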
11 Random effects assumptions
Discuss which assumption(s) must be met for a random effects model to be sensible. Test in R if the assumptions are met. Summarise your conclusions briefly.
12 Interaction effect
In Section 10, you estimated a first-difference model. Extend the model by including a linear time trend and a multiplicative interaction effect between time and the student-to-staff ratio. Choose the most sensible way to do this. Show a marginal effects plot for the interaction effect. Show the R code for the model and the plot. Interpret the interaction effect including the main effects substantively.
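For orientation, the marginal effect in a generic multiplicative interaction model can be written as follows. The notation is an assumption for illustration: $x_{it}$ stands for the student-to-staff ratio, $t$ for time, and $\alpha_i$ for the institution effect. How the specification is carried over to first differences is left to the task.

```latex
\begin{align}
y_{it} &= \alpha_i + \beta_1 x_{it} + \beta_2 t + \beta_3 (x_{it} \cdot t) + \varepsilon_{it} \\
\frac{\partial y_{it}}{\partial x_{it}} &= \beta_1 + \beta_3 t \\
\widehat{\mathrm{Var}}\left(\hat\beta_1 + \hat\beta_3 t\right) &= \widehat{\mathrm{Var}}(\hat\beta_1) + t^2 \, \widehat{\mathrm{Var}}(\hat\beta_3) + 2 t \, \widehat{\mathrm{Cov}}(\hat\beta_1, \hat\beta_3)
\end{align}
```

The second line is the quantity a marginal effects plot shows across values of $t$, and the third line gives the variance needed for its confidence band. The main effect $\beta_1$ on its own is the effect of $x_{it}$ only where $t = 0$.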