Estimate a linear model using ordinary least squares

Reference no: EM133047955

GV903 Advanced Research Methods - University of Essex

Instructions

Please create a new document with your answers. Use LaTeX in conjunction with knitr via RStudio to create the document, and submit both the compiled PDF file and the source files needed to compile it (e.g., files with the extensions .Rnw, .bib, etc.). Up to seven of the maximum of 100 attainable points are given for proper use of technology and proper formatting.

The number of points you can obtain for each question or task is given in square brackets.

The points add up to 100 (including the points for use of technology as described above) and determine your final mark for this assignment. The final mark you earn for this assignment is given by the equation

$$ m = t + \sum_{i=1}^{n} p_i, $$

where $p_i$ denotes the number of points you earn for question or task $i$, $n$ is the number of tasks in this assignment, and $t$ is the number of points you earn for use of technology and formatting. All answers are evaluated on three criteria: how clearly the results are presented; how correct the results are; and how elegant, computationally efficient, or sophisticated the solution is (where applicable). Good luck!

Background: The Guardian League Tables 2013-2022

School leavers face uncertainty over what university to choose. Many prospective students turn to university rankings to make an informed choice. The Guardian League Tables are one of the most popular university rankings in the UK. Despite their popularity in the general population, they are notorious in academic circles because they ignore the academic quality of the faculty in favour of student satisfaction, selectivity in admissions, and career prospects.

The University of Essex as a whole, for example, is ranked 64th in 2022 (85th in 2021; 66th in 2020; 31st in 2019; 48th in 2018), as expressed by its "Guardian score", an aggregate of the other variables. The Department of Government is ranked 35th (25th in 2021; 34th in 2020; 12th in 2019; 15th in 2018). One of the variables that make up the Guardian score is the average entry tariff, a score given by the Universities and Colleges Admissions Service (UCAS) that reflects the average selectivity of each university. In the Department of Government, the current average entry tariff is 109 (106 in 2021; 111 in 2020; 111 in 2019; 130 in 2018), which reflects a decline in relative selectivity over the years, in line with the overall university strategy of attracting more students and growing its learning community.

In this second assignment, you will examine the composition of the Guardian League Tables using regression analysis. The file guardian.csv contains the full results for all universities for each year from 2013 to 2022. Your goal is to explain the Guardian score (which in turn determines a university's relative rank) using the other variables shown in the ranking. You can ignore the variable "Continuation" in all tasks below because it was provided only for the most recent years. In principle, one should be able to reconstruct the Guardian score almost perfectly by combining those variables in a linear regression function. More specifically, please complete the following tasks.

1 Linear model of the Guardian score in 2022

Estimate a linear model using ordinary least squares in R to explain the 2022 Guardian score using only data from 2022. Find the model that best reconstructs/approximates the Guardian score. Explain how and why you selected the specification you ended up with. Write down the equation that describes the empirical model you estimated, and include it in your answer.
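As a template only, a linear specification of this kind can be written as follows, where $GS_i$ is the 2022 Guardian score of institution $i$ and $x_{1i}, \dots, x_{ki}$ stand for whichever ranking components end up in the selected model. The symbols are placeholders, not the column names in guardian.csv.

```latex
GS_i = \beta_0 + \sum_{j=1}^{k} \beta_j \, x_{ji} + \varepsilon_i,
\qquad i = 1, \dots, N,
\qquad \mathrm{E}\left[\varepsilon_i \mid x_{1i}, \dots, x_{ki}\right] = 0
```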

2 Regression table

Use the texreg package to show a perfectly formatted regression table for the model you estimated. Write a sentence that includes a cross-reference to the table.

3 Interpretation

Choose three model terms and interpret the results for them. Also interpret the goodness of fit of the model.
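For reference, the usual goodness-of-fit measures for a linear model with $N$ observations, $k$ slope coefficients plus an intercept, outcome $GS_i$ and residuals $\hat{\varepsilon}_i$ are the coefficient of determination, its adjusted version, and the residual standard error:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^{2}}{\sum_{i=1}^{N} \left(GS_i - \overline{GS}\right)^{2}},
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right) \frac{N - 1}{N - k - 1},
\qquad
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^{2}}{N - k - 1}}
```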

4 Recompute the coefficient block manually

Redo the calculations for the coefficient block in the regression table (coefficients, standard errors, p values) by showing the necessary equations, discussing which values must be inserted where and why, and then performing the calculations manually in R (i.e., without the lm function or similar, using functions only from the base package).
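The coefficient block follows from the standard OLS equations: $\hat\beta = (X^\top X)^{-1} X^\top y$, $\hat\sigma^2 = \hat e^\top \hat e / (N - p)$, $\mathrm{SE}(\hat\beta_j) = \sqrt{\hat\sigma^2 [(X^\top X)^{-1}]_{jj}}$, $t_j = \hat\beta_j / \mathrm{SE}(\hat\beta_j)$ and $p_j = 2\,P(T_{N-p} > |t_j|)$, where $X$ contains a column of ones and $p$ counts all its columns. The brief requires base R; purely as a language-neutral illustration of those equations, here is a minimal Python sketch using only the standard library. It hand-rolls a Gauss-Jordan inverse and computes the Student t tail from the regularised incomplete beta function (a Numerical Recipes style continued fraction). The helper names are hypothetical, and the toy data are invented.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def _betacf(a, b, x, eps=3e-14, max_iter=300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """P(|T| > |t|) for a Student t variable with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))

def ols(X, y):
    """Coefficients, standard errors, t values and two-sided p values."""
    n, p = len(X), len(X[0])            # p counts all columns, incl. the intercept
    Xt = transpose(X)
    XtX_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * xij for b, xij in zip(beta, row)) for yi, row in zip(y, X)]
    sigma2 = sum(e * e for e in resid) / (n - p)   # residual variance, df = n - p
    se = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(p)]
    tval = [b / s for b, s in zip(beta, se)]
    pval = [t_two_sided_p(t, n - p) for t in tval]
    return beta, se, tval, pval

# Invented toy data: intercept column plus one regressor
X = [[1.0, float(x)] for x in range(5)]
y = [1.0, 3.0, 7.0, 7.0, 9.0]
beta, se, tval, pval = ols(X, y)   # beta ≈ [1.4, 2.0], se ≈ [0.8, 0.327]
```

In R, the same steps map onto base functions such as `solve()`, `t()`, `%*%`, `diag()` and `pt()`; the Python version only serves to make each equation visible line by line.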

5 Transformation and visualisation of average entry tariffs

UCAS changed the definition and calculation of the average entry tariff before 2018. Up to and including 2017, values were often between 200 and 600. From 2018 onwards, under the new definition, the values mostly lay between about 100 and 200. In some of the tasks that follow, you will estimate panel models, and some of these models require the UCAS tariffs to be comparable across years. You should therefore put all values on the same scale using a z transformation.

Standardise the values and make them comparable using a z transformation. Show the R code and equation(s) for this. Use ggplot2 to plot the UCAS average entry tariffs against the Guardian score, in different colours for before 2018 and after 2017, ideally in a single plot with two facets: one for the original values in the dataset and one for the z-transformed version you created. Use your plot to demonstrate that the z transformation had the desired effect.
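One reading of the task is to standardise within each tariff regime $g$ (up to 2017 vs. from 2018), i.e. $z_{it} = (x_{it} - \bar{x}_g) / s_g$, where $\bar{x}_g$ and $s_g$ are the mean and sample standard deviation of the tariff within regime $g$; standardising within each year is an equally defensible alternative. The minimal Python sketch below illustrates the per-group z transformation on invented numbers; the function name and regime labels are hypothetical.

```python
from statistics import mean, stdev

def z_by_group(values, groups):
    """z transformation within each group: (x - group mean) / group sample SD."""
    params = {}
    for g in set(groups):
        members = [v for v, gg in zip(values, groups) if gg == g]
        params[g] = (mean(members), stdev(members))   # stdev uses n - 1, like R's sd()
    return [(v - params[g][0]) / params[g][1] for v, g in zip(values, groups)]

# Invented illustrative values, not taken from guardian.csv
years = [2016, 2016, 2017, 2018, 2019, 2019]
tariff = [300.0, 400.0, 500.0, 100.0, 150.0, 200.0]
regime = ["up to 2017" if yr <= 2017 else "from 2018" for yr in years]
z = z_by_group(tariff, regime)
```

After the transformation both regimes are centred on 0 with unit standard deviation, which is what the faceted plot should make visible. In base R, the same per-group operation can be written with `ave(x, g, FUN = function(v) (v - mean(v)) / sd(v))`.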

Briefly discuss two alternative ways of modelling the UCAS average entry tariff variable in a panel model without the use of a transformation. That is, how else could you model this temporal heterogeneity in the tariff variable in a panel model in a sensible way?

6 Counterfactual prediction for Essex

Use the linear model you estimated above to predict the Guardian score for Essex if Essex increased its average entry tariff from 115 to the average entry tariff it had in 2018. How large is the improvement in the Guardian score resulting from this?

What if Essex suddenly became as selective as Oxford in 2022 and adopted its UCAS average entry tariff? What would be the expected Guardian score?

Show the equation(s) and R code, and briefly explain what you did and why.
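Assuming the tariff enters the model linearly, without interactions or transformations, and all other covariates are held at Essex's 2022 values, the counterfactual prediction and the implied change can be written as below. For the first scenario $x^{\text{cf}}_{\text{tariff}}$ is Essex's 2018 tariff; for the second it is Oxford's 2022 tariff. In R, `predict()` with a `newdata` row in which only the tariff is changed gives the same number.

```latex
\widehat{GS}^{\,\text{cf}}_{\text{Essex}}
  = \widehat{GS}_{\text{Essex}}
  + \hat{\beta}_{\text{tariff}} \left( x^{\text{cf}}_{\text{tariff}} - x_{\text{tariff}} \right),
\qquad
\Delta \widehat{GS}
  = \hat{\beta}_{\text{tariff}} \left( x^{\text{cf}}_{\text{tariff}} - x_{\text{tariff}} \right)
```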

7 Create lagged variables

In preparation for some of the steps below, add lagged variables to the dataset for all variables used in the linear model above, lagged by one time period. Show the R code, and briefly describe how you solved this. Hint: There are different ways to do this, but the merge function can come in handy, depending on how you choose to complete the task.
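The merge idea can be sketched language-neutrally: take a copy of the data with the year shifted forward by one and join it back on institution and year, so that each row picks up the previous year's values. The minimal Python sketch below performs the equivalent join with a dictionary keyed on (institution, year); the column names "inst", "year" and "score" and the "_lag" suffix are hypothetical, and the values are invented.

```python
def add_lags(rows, id_key, time_key, variables, suffix="_lag"):
    """Return copies of rows with one-period lags of `variables` attached.

    Mirrors a self-merge: each row is matched to the same unit's row at time - 1.
    Rows without an exact previous period get None (the analogue of NA).
    """
    by_key = {(r[id_key], r[time_key]): r for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r[id_key], r[time_key] - 1))
        new = dict(r)
        for v in variables:
            new[v + suffix] = None if prev is None else prev[v]
        out.append(new)
    return out

# Hypothetical toy panel: column names are placeholders, values invented
panel = [
    {"inst": "A", "year": 2020, "score": 60.0},
    {"inst": "A", "year": 2021, "score": 65.0},
    {"inst": "B", "year": 2021, "score": 70.0},
    {"inst": "A", "year": 2022, "score": 62.0},
]
lagged = add_lags(panel, "inst", "year", ["score"])
# score_lag per row: None, 60.0, None, 65.0
```

Matching on the exact previous year, rather than on the previous row of the sorted data, means that a gap in an institution's series produces a missing lag instead of silently lagging by two years.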

8 Year fixed effects

In this task, use only functions from the base and stats packages. That is, you can use lm but not plm or similar.

Estimate four fixed effects panel models, spanning all years. The first model should be like the model above but with year dummies. The second model should additionally contain a lagged dependent variable. The third model should be like the first but use a year fixed effect without the inclusion of dummies. The fourth model should do the same but also include a lagged dependent variable, like the second model in this task but using a year fixed effect without dummies.

Explain what each model does and how it can be interpreted. What are the pros and cons of each model? Use terms from the literature and slides to describe the models. Include equations for the models in your explanation. Show the R code and present the results in a single well-formatted regression table using texreg.
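One way to include a year fixed effect without dummies is the within transformation: subtract each year's mean from the outcome and from every regressor (including a lagged dependent variable, if present), then run OLS on the demeaned data. By the Frisch-Waugh-Lovell theorem, the slope estimates equal those from the year-dummy model. The minimal Python sketch below shows the transformation for a single regressor on invented numbers; the function names are hypothetical.

```python
from collections import defaultdict

def demean_by_group(values, groups):
    """Within transformation: subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def within_slope(x, y, groups):
    """Slope of y on x after demeaning both within groups (single regressor)."""
    xd = demean_by_group(x, groups)
    yd = demean_by_group(y, groups)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Invented data: slope 2 in both years, but a higher level in 2022
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 18.0, 20.0, 22.0]
years = [2021, 2021, 2021, 2022, 2022, 2022]
slope = within_slope(x, y, years)   # 2.0
```

A pooled regression that ignores the year effect would give a slope of 80/17.5, about 4.57, on the same data. One caveat for the R version: the residual degrees of freedom that `lm()` reports on demeaned data do not account for the absorbed year means, so the standard errors need a degrees-of-freedom correction to match the dummy-variable model.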

9 Institution fixed effects

Estimate two models with institution fixed effects, one without and one with a lagged dependent variable. Again, explain the models in the same way as in the previous task. Include a third model that resembles the institution fixed effects model with lagged DV, but estimate it using the plm package, and include the R code here and explain it briefly. Show the three institution fixed effects models in a single table and comment on how the results compare.

10 First-differences model

Estimate a first-difference model with fixed effects for institutions using only the base and stats packages in R. Show the estimation equation, and describe the equations and your reasoning. What does the model tell us, in comparison to the other models so far? Which model is overall the best choice for the panel dataset, and why?
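Starting from $y_{it} = \alpha_i + x_{it}^\top \beta + \varepsilon_{it}$, differencing within institution gives $\Delta y_{it} = \Delta x_{it}^\top \beta + \Delta \varepsilon_{it}$, so the institution effect $\alpha_i$ drops out. The minimal Python sketch below builds the differenced variables for a toy panel; the column names and function name are hypothetical and the values invented. The differenced columns could then be passed to `lm()` in R.

```python
def first_differences(rows, id_key, time_key, variables):
    """Within-unit first differences: x_it - x_i,t-1 for each variable.

    The unit effect alpha_i cancels in the difference. Rows whose unit has no
    observation at time - 1 (first year, or a gap) are dropped.
    """
    by_key = {(r[id_key], r[time_key]): r for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r[id_key], r[time_key] - 1))
        if prev is None:
            continue
        diff = {id_key: r[id_key], time_key: r[time_key]}
        for v in variables:
            diff["d_" + v] = r[v] - prev[v]
        out.append(diff)
    return out

# Hypothetical toy panel: column names are placeholders, values invented
panel = [
    {"inst": "A", "year": 2020, "score": 60.0},
    {"inst": "A", "year": 2021, "score": 65.0},
    {"inst": "B", "year": 2021, "score": 70.0},
    {"inst": "A", "year": 2022, "score": 62.0},
]
fd = first_differences(panel, "inst", "year", ["score"])
# (inst, year, d_score): ("A", 2021, 5.0), ("A", 2022, -3.0)
```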

11 Random effects assumptions

Discuss which assumption(s) must be met for a random effects model to be sensible. Test in R whether the assumptions are met. Summarise your conclusions briefly.
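The central random effects assumption is that the unit effects are uncorrelated with the regressors, $\mathrm{E}[\alpha_i \mid x_{i1}, \dots, x_{iT}] = 0$. One standard check, though not necessarily the only one expected here, is the Hausman test: under the null both the fixed and random effects estimators are consistent and random effects is efficient, so a large statistic speaks against random effects. $K$ is the number of coefficients compared. The plm package provides this test as `phtest()`.

```latex
H = \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)^{\top}
    \left[ \widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    \left( \hat{\beta}_{FE} - \hat{\beta}_{RE} \right)
    \;\overset{a}{\sim}\; \chi^2_{K}
```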

12 Interaction effect

In Section 10, you estimated a first-difference model. Extend the model by including a linear time trend and a multiplicative interaction effect between time and the student-to-staff ratio. Choose the most sensible way to do this. Show a marginal effects plot for the interaction effect. Show the R code for the model and the plot. Interpret the interaction effect including the main effects substantively.
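For any specification that contains a variable $x$, a time variable $t$ and their product, the marginal effect of $x$ and its variance take the form below; here $x$ is a placeholder for the student-to-staff ratio term in whatever form the chosen model uses, and the $\beta$ labels are illustrative. A marginal effects plot traces $\hat\beta_1 + \hat\beta_3 t$ across $t$, e.g. with a band of $\pm 1.96$ standard errors for an approximate 95% interval.

```latex
y_{it} = \dots + \beta_1 x_{it} + \beta_2 t + \beta_3 \left( x_{it} \times t \right) + \dots
\\[4pt]
\frac{\partial y_{it}}{\partial x_{it}} = \beta_1 + \beta_3 t,
\qquad
\widehat{\operatorname{Var}}\left( \hat{\beta}_1 + \hat{\beta}_3 t \right)
  = \widehat{\operatorname{Var}}(\hat{\beta}_1)
  + t^2 \, \widehat{\operatorname{Var}}(\hat{\beta}_3)
  + 2t \, \widehat{\operatorname{Cov}}(\hat{\beta}_1, \hat{\beta}_3)
```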

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd