Reference no: EM131474614
This assignment is a simulation study of OLS and 2SLS estimation of a causal equation when there is a possible omitted variable and the potential IV can vary in its relevance. The specification for the data generating process is as follows.
- A causal equation for the dependent variable Yi is
Yi = β0 + β1X1,i + β2X2,i + β3X3,i + σUi, (1)
in which the coefficient β1 is of primary interest as an object for statistical inference.
- The explanatory variable X1,i is generated by an equation of the form
X1,i = α0 + α1Z1,i + α2X2,i + α3X3,i + Vi (2)
scaled so that the standard deviation of X1,i is 1.
- The random variables (Z1,i, X2,i, X3,i, Vi)' are i.i.d. and have a multivariate standard normal distribution, i.e. N(0, I4).
- The disturbance Ui is independent of (Z1,i, X2,i, X3,i, Vi)', with two distributions considered:
- Ui is i.i.d. standard normal
- Ui is i.i.d. log-normal (standardised to mean 0 and variance 1).
- Throughout the assignment the population values
β0 = 0, β1 = 1, β2 = 1, α0 = 0
will be set. The other population parameters will be given various values in the questions below.
The explanatory variable X3,i will be treated as an unobservable variable throughout the assignment (e.g. X3,i can be thought of as something like ability in a wage equation). Even though Yi will be generated from equation (1), which involves X3,i, calculations of estimators and confidence intervals will proceed without using X3,i. In this sense X3,i is playing the role of an omitted causal variable: it is (possibly) involved in the generation of values for Yi but is not included in the statistical analysis of the data.
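The data generating process above can be sketched in Python as follows. This is only one possible reading of the specification, not a prescribed implementation: the function name `simulate_sample` and its arguments are hypothetical, "scaled" is assumed to mean division by the population standard deviation of the right-hand side of (2), and the log-normal disturbance is assumed to be exp of a standard normal before standardisation (the assignment does not fix the underlying normal).

```python
import numpy as np


def simulate_sample(n, sigma, beta3, alpha1, alpha2, alpha3,
                    u_dist="normal", rng=None):
    """Draw one sample (y, x1, x2, z1) from equations (1) and (2).

    x3 is generated internally but not returned, because it is treated
    as unobservable in the analysis.
    """
    rng = np.random.default_rng() if rng is None else rng

    # (Z1, X2, X3, V) are i.i.d. N(0, I4): four independent standard normals.
    z1, x2, x3, v = rng.standard_normal((4, n))

    # Equation (2) with alpha0 = 0. ASSUMPTION: scale by the population
    # standard deviation so that sd(X1) = 1 in the population.
    scale = np.sqrt(alpha1**2 + alpha2**2 + alpha3**2 + 1.0)
    x1 = (alpha1 * z1 + alpha2 * x2 + alpha3 * x3 + v) / scale

    if u_dist == "normal":
        u = rng.standard_normal(n)
    elif u_dist == "lognormal":
        # ASSUMPTION: exp(N(0, 1)), which has mean exp(0.5) and
        # variance (e - 1) * e; standardise to mean 0, variance 1.
        raw = rng.lognormal(mean=0.0, sigma=1.0, size=n)
        u = (raw - np.exp(0.5)) / np.sqrt((np.e - 1.0) * np.e)
    else:
        raise ValueError("u_dist must be 'normal' or 'lognormal'")

    # Equation (1) with beta0 = 0, beta1 = 1, beta2 = 1.
    y = x1 + x2 + beta3 * x3 + sigma * u
    return y, x1, x2, z1
```

Passing a seeded generator, e.g. `rng=np.random.default_rng(123)`, makes the draws reproducible across runs.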
1. To provide a baseline set of results, set
β3 = 0, α1 = 1, α2 = 0, α3 = 1.
For each combination of
n = 20, 200, 2000, and σ = 1, 2 and Ui ∼ Normal, Log Normal
(i.e. a total of 12 combinations), generate 1000 replications of samples of X1,i from (2) and Yi from (1). For each sample carry out an OLS regression of Yi on X1,i and X2,i (but omitting X3,i from the estimation) and obtain the OLS estimator of β1 and the corresponding 95% confidence interval for β1 (using OLS standard errors). Calculate and tabulate
- the bias of the OLS estimator for β1
- the standard deviation of the OLS estimator
- the coverage rate of the confidence intervals (i.e. the percentage of simulated confidence intervals that include β1)
- the average length of the confidence intervals (length is the upper limit minus lower limit of confidence interval).
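The four summary statistics listed above could be computed along the following lines. This is a minimal sketch under stated assumptions rather than a required design: the helper names `draw`, `ols_beta1` and `monte_carlo` are hypothetical, X1,i is scaled by its population standard deviation, the log-normal disturbance is exp of a standard normal before standardisation, and the confidence interval uses the t critical value with n − 3 degrees of freedom.

```python
import numpy as np
from scipy import stats


def draw(n, sigma, beta3, alpha, rng, lognormal=False):
    """One sample from (1)-(2); alpha = (alpha1, alpha2, alpha3), alpha0 = 0."""
    a1, a2, a3 = alpha
    z1, x2, x3, v = rng.standard_normal((4, n))
    # ASSUMPTION: population scaling so that sd(X1) = 1.
    x1 = (a1 * z1 + a2 * x2 + a3 * x3 + v) / np.sqrt(a1**2 + a2**2 + a3**2 + 1)
    u = rng.standard_normal(n)
    if lognormal:
        # ASSUMPTION: exp(N(0, 1)) standardised to mean 0, variance 1.
        u = (np.exp(u) - np.exp(0.5)) / np.sqrt((np.e - 1) * np.e)
    y = x1 + x2 + beta3 * x3 + sigma * u   # beta0 = 0, beta1 = beta2 = 1
    return y, x1, x2, z1


def ols_beta1(y, x1, x2):
    """OLS of y on (1, x1, x2); return the beta1 estimate and its 95% CI limits."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)            # usual OLS error variance estimate
    se = np.sqrt(s2 * XtX_inv[1, 1])        # standard error of the x1 coefficient
    crit = stats.t.ppf(0.975, df=n - k)
    return b[1], b[1] - crit * se, b[1] + crit * se


def monte_carlo(n, sigma, lognormal, beta3=0.0, alpha=(1.0, 0.0, 1.0),
                reps=1000, seed=0, beta1=1.0):
    """Bias, standard deviation, CI coverage and mean CI length over reps samples."""
    rng = np.random.default_rng(seed)
    results = np.array([
        ols_beta1(*draw(n, sigma, beta3, alpha, rng, lognormal)[:3])
        for _ in range(reps)
    ])
    est, lo, hi = results.T
    return {
        "bias": est.mean() - beta1,
        "sd": est.std(ddof=1),
        "coverage": np.mean((lo <= beta1) & (beta1 <= hi)),
        "length": np.mean(hi - lo),
    }
```

Calling `monte_carlo` once for each combination of n in (20, 200, 2000), σ in (1, 2) and the two disturbance distributions would fill the 12 rows of the table; `coverage` is returned as a proportion, so multiply by 100 for a percentage.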
Answer the following questions based on your results.
(a) Do your simulation results suggest that the OLS estimator is unbiased for β1? Explain why this should (or should not) be so using the parameter values specified for the data generating process.
(b) Explain how your simulation results suggest that the OLS estimator is consistent or inconsistent for β1.
(c) Discuss the simulated coverage rates for the confidence intervals and how they vary with n, σ and the distribution of Ui.
(d) Discuss the average confidence interval lengths and how they vary with n, σ and the distribution of Ui.
2. Repeat the simulations from question 1, except with the one difference that now β3 = 1, and report the same set of results as in question 1.
(a) Explain how and why there are both similarities and differences between the two sets of results.
(b) How do you think the results from question 2(a) would change if you re-ran the simulation with α3 = 0? It is not necessary to run this extra simulation, but you should be able to describe qualitatively what should happen.
3. Repeat the simulations from question 2 (with α3 = 1), except now replace the OLS estimator with a 2SLS estimator in which Z1,i is used as an IV for X1,i while X2,i is treated as exogenous.
(a) Is Z1,i a valid IV in this case? Explain.
(b) Explain how the results from the simulations match with what you would expect in this case. How does the validity/invalidity of Z1,i as an IV influence the results? If there are unexpected results, identify these and attempt to explain what is going on.
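The 2SLS estimator described in question 3 can be sketched as below. The function name `tsls_beta1` is hypothetical, and two choices are assumptions rather than part of the assignment: the error variance uses an n − 3 degrees-of-freedom correction and the interval uses a t critical value (some software uses n and a normal critical value instead). The single illustrative sample at the end assumes population scaling of X1,i with α1 = 1, α2 = 0, α3 = 1, β3 = 1, σ = 1 and normal Ui.

```python
import numpy as np
from scipy import stats


def tsls_beta1(y, x1, x2, z1, level=0.95):
    """2SLS of y on (1, x1, x2) using (1, z1, x2) as instruments.

    Returns the beta1 estimate and the limits of its confidence interval.
    """
    n = len(y)
    const = np.ones(n)
    X = np.column_stack([const, x1, x2])   # regressors; x1 treated as endogenous
    W = np.column_stack([const, z1, x2])   # instruments; x2 instruments itself

    # First stage: fitted values from regressing each column of X on W.
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]

    # Second stage: regress y on the fitted regressors.
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    b = XhXh_inv @ X_hat.T @ y

    # Residuals must use the actual regressors, not the fitted ones.
    resid = y - X @ b
    k = X.shape[1]
    s2 = resid @ resid / (n - k)           # ASSUMPTION: n - k correction
    se = np.sqrt(s2 * XhXh_inv[1, 1])
    crit = stats.t.ppf(0.5 + level / 2, df=n - k)  # ASSUMPTION: t critical value
    return b[1], b[1] - crit * se, b[1] + crit * se


# Illustrative single sample (not a full simulation).
rng = np.random.default_rng(0)
n = 2000
z1, x2, x3, v, u = rng.standard_normal((5, n))
x1 = (z1 + x3 + v) / np.sqrt(3.0)          # equation (2), population scaling
y = x1 + x2 + x3 + u                       # equation (1) with beta3 = 1, sigma = 1
est, lo, hi = tsls_beta1(y, x1, x2, z1)
print(f"2SLS estimate of beta1: {est:.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

Wrapping `tsls_beta1` in a replication loop and recording the estimate and interval limits for each sample gives the same four summary statistics as in question 1.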
4. Set n = 200, σ = 1, Ui ∼ Normal, β3 = 1, α3 = 1 and consider the range of values for α1:
α1 = 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.
For these 8 cases, carry out simulations of the same properties of the 2SLS estimator (bias and standard deviation) and confidence interval (coverage rate and length) as in the previous question. Explain the results in terms of the validity/invalidity of Z1,i as an IV for X1,i. Again identify any unexpected results and attempt to describe what is happening.
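As a worked illustration of how α1 controls the strength of the link between Z1,i and X1,i, the population correlation can be written out. This assumes that "scaled" in equation (2) means division by the population standard deviation s, which is an interpretation rather than something the assignment states.

```latex
\[
X_{1,i} = \frac{\alpha_1 Z_{1,i} + \alpha_2 X_{2,i} + \alpha_3 X_{3,i} + V_i}{s},
\qquad
s = \sqrt{\alpha_1^2 + \alpha_2^2 + \alpha_3^2 + 1},
\]
\[
\operatorname{Corr}(X_{1,i}, Z_{1,i})
  = \operatorname{Cov}(X_{1,i}, Z_{1,i})
  = \frac{\alpha_1}{s}.
\]
```

The first equality in the second line holds because both variables have unit variance. Question 4 does not restate α2; if α2 = 0 is carried over from the baseline and α3 = 1, the correlation is α1/√(α1² + 2), which is about 0.49 at α1 = 0.8, about 0.07 at α1 = 0.1, and exactly 0 at α1 = 0.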
5. Specify the following values
β3 = 0, α1 = 1, α3 = 1, σ = 1, n = 200
and Ui ∼ Normal. Carry out 5 simulations of the OLS estimator and confidence intervals for the following values of α2:
α2 = 0, 2, 4, 6, 8.
Present and explain the results for the properties of the OLS estimator (bias and standard deviation) and its confidence interval (coverage rate and length).