Reference no: EM132300081
Assignment Questions
1. In equation 15.13, Wooldridge gives the asymptotic variance of the IV estimator $\hat\beta_1$ in the simple regression case: $\hat\sigma^2/(\mathrm{SST}_x \cdot R^2_{x,z})$. Then, in equation 15.43, he gives the asymptotic variance of the IV estimator $\hat\beta_1$ in the multiple regression case: $\sigma^2/[\widehat{\mathrm{SST}}_2(1-\hat R^2_2)]$.
Obviously, one difference between the two equations is the presence of $\hat\sigma^2$ versus $\sigma^2$. Beyond this difference, however, is 15.13 a special case of 15.43, or is the simple regression case different? That is, if we applied equation 15.43 to the simple regression case, would we obtain the same standard error as equation 15.13 or a different one? Explain precisely, showing exactly why 15.13 is or is not a special case of 15.43.
Hint: Note carefully the elements of these formulas as defined in the text and course notes, in particular the definitions of $R^2_{x,z}$ and $\hat R^2_2$.
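A minimal Python sketch (standard library only, simulated data rather than any course data set) that checks numerically the identity linking the elements of the two formulas: with one instrument and no other exogenous regressors, the total variation in the first-stage fitted values (the SST term in 15.43) equals $R^2_{x,z}\cdot\mathrm{SST}_x$, and regressing those fitted values on the remaining exogenous variables (here only a constant) gives an R-squared of zero. The first-stage coefficient 0.6, the seed, and the sample size are arbitrary assumptions.

```python
import random

def mean(v):
    return sum(v) / len(v)

def sst(v):
    """Total sum of squares of v around its own mean."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v)

def simple_ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sst(x)
    return my - slope * mx, slope

rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + rng.gauss(0, 1) for zi in z]  # hypothetical first stage

# First stage: fitted values x_hat from regressing x on a constant and z.
a, b = simple_ols(x, z)
x_hat = [a + b * zi for zi in z]

# R^2_{x,z}: squared sample correlation between x and z.
mx, mz = mean(x), mean(z)
cov_xz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
r2_xz = cov_xz ** 2 / (sst(x) * sst(z))

# Total variation in the first-stage fitted values.
sst_hat = sst(x_hat)

# No other exogenous regressors: x_hat is regressed on a constant only, whose
# fitted value is the mean, so the residual SS equals sst_hat and R^2 is 0.
m_hat = mean(x_hat)
ssr_const = sum((v - m_hat) ** 2 for v in x_hat)
r2_2 = 1 - ssr_const / sst_hat

print(sst_hat, r2_xz * sst(x), r2_2)
```

The first two printed numbers agree up to rounding because both reduce algebraically to $\mathrm{Cov}(x,z)^2/\mathrm{SST}_z$ in sample terms.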
2. Recall the key to Homework Problem 9, Question 4. In part D, we considered a model that had fixed effects for cities and different time trends for each city:
(2) $\ln(uclms_{it}) = a_i + c_i\,time_t + \beta_1 ez_{it} + u_{it}$
In the key, I showed that after first-differencing, this resulted in the following model:
$\Delta\ln(uclms_{it}) = gr\_rate_{it} = c_i + \beta_1 \Delta ez_{it} + \eta_y + \Delta u_{it}$
where the $c_i$ represent fixed effects for cities, $\eta_y$ is the appropriate set of year dummies, and $gr\_rate_{it}$ is the growth rate of unemployment claims.
Now I ask a related question, but assuming a different model. Specifically, suppose that we had instead postulated a model explicitly for the growth rate of unemployment claims as the dependent variable:
$gr\_rate_{it} = c_i + \beta_1 ez_{it} + \eta_y + u_{it}$
Again, the $c_i$ represent fixed effects for cities, $\eta_y$ is the appropriate set of year dummies, and $gr\_rate_{it}$ is the growth rate of unemployment claims, but the variable $ez_{it}$ is not first-differenced.
Hint: To answer this question below, I suggest you first explicitly create this growth rate variable:
. gen gr_rate = D.lnuclms
(22 missing values generated)
Given this model, either the within transformation or the FD transformation could be applied to remove the city fixed effects. But it is inappropriate to apply -cluster- to make standard errors robust to within-group (within-city) correlation of the error terms, because the number of groups is small: just 22 cities. Because we do not have enough groups to apply -cluster-, the choice between the within transformation and the FD transformation is especially important. Which transformation is preferable here? Explain.
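Whether the within or the FD estimator is more efficient depends largely on the serial correlation of $u_{it}$: if $u_{it}$ is serially uncorrelated, the within (FE) estimator is more efficient; if $u_{it}$ follows a random walk, $\Delta u_{it}$ is serially uncorrelated and FD is more efficient. A useful benchmark, sketched below in Python with simulated errors (not the unemployment-claims data; the 200 periods, seed, and unit-variance normal shocks are assumptions), is that serially uncorrelated $u_{it}$ implies a correlation of about -0.5 between $\Delta u_{it}$ and $\Delta u_{i,t-1}$, whereas a random walk implies a correlation near zero. In applied work the same check would be run on the FD residuals.

```python
import random

def first_diff(series):
    """Delta e_t = e_t - e_{t-1} for one unit's time series."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def pooled_lag_corr(panel):
    """Pooled correlation between e_t and e_{t-1}, lagging within each unit only."""
    pairs = [(e[t], e[t - 1]) for e in panel for t in range(1, len(e))]
    n = len(pairs)
    m0 = sum(a for a, _ in pairs) / n
    m1 = sum(b for _, b in pairs) / n
    cov = sum((a - m0) * (b - m1) for a, b in pairs)
    v0 = sum((a - m0) ** 2 for a, _ in pairs)
    v1 = sum((b - m1) ** 2 for _, b in pairs)
    return cov / (v0 * v1) ** 0.5

rng = random.Random(2)
groups, periods = 22, 200  # 22 cities as in the question; 200 periods is hypothetical

# Case 1: u_it serially uncorrelated.
iid = [[rng.gauss(0, 1) for _ in range(periods)] for _ in range(groups)]

# Case 2: u_it follows a random walk.
walk = []
for _ in range(groups):
    level, path = 0.0, []
    for _ in range(periods):
        level += rng.gauss(0, 1)
        path.append(level)
    walk.append(path)

rho_iid = pooled_lag_corr([first_diff(u) for u in iid])
rho_walk = pooled_lag_corr([first_diff(u) for u in walk])
print(round(rho_iid, 2), round(rho_walk, 2))
```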
3. Recall the example on the earnings and education of twins that I included in the course notes "IV Estimation and Measurement Error."
A. Although we applied IV estimation to address measurement error in self-reported education, we also estimated a model that incorporated fixed effects for pairs of twins to control for ability. Recall that $wage_{it}$ is the wage of twin t in family i, $educ_{i2,2}$ is the education level of twin 2 as reported by twin 2, and $educ_{i1,1}$ is the education level of twin 1 as reported by twin 1.
A student was interested in the example and asked for the data set I used in those notes. The student later came back with a question.
The student was correct that the difference is not due to machine imprecision. Answer the student's question.
B. The setup: To address measurement error, recall that in the course notes I applied the first-difference transformation and used the first difference of the variable educt_t, "the other twin's report of this twin's education," as the instrument for an individual's self-reported education. That is, in estimating the first-differenced model
$\log(wage_{i2}) - \log(wage_{i1}) = \beta_1(educ_{i2,2} - educ_{i1,1}) + (u_{i2} - u_{i1})$
we used $(educ_{i2,1} - educ_{i1,2})$ as an instrument for $(educ_{i2,2} - educ_{i1,1})$. Further, we showed that panel data had the further advantage of allowing us to also obtain consistent estimates if we assumed a model for the measurement error that included a fixed effect $\alpha_i$ allowing both twins in family i to overstate or understate their education:
$educ_{i2,2} = educ^*_{i2} + \alpha_i + v_{i2,2}$
$educ_{i2,1} = educ^*_{i2} + \alpha_i + v_{i2,1}$
$educ_{i1,1} = educ^*_{i1} + \alpha_i + v_{i1,1}$
$educ_{i1,2} = educ^*_{i1} + \alpha_i + v_{i1,2}$
In the course notes we showed that
(i) the first-differenced equation, $\log(wage_{i2}) - \log(wage_{i1}) = \delta_0 + \beta_1(educ_{i2,2} - educ_{i1,1}) + (u_{i2} - u_{i1})$, and the first-differenced instrument, $(educ_{i2,1} - educ_{i1,2})$, have differenced away the fixed effect $\alpha_i$, and that
(ii) the differenced instrument $(educ_{i2,1} - educ_{i1,2})$ is uncorrelated with the resulting error term in the first-differenced model so long as the v terms are uncorrelated with each other and with u.
Now the question for you:
Now we make our measurement error model more sophisticated by adding a fixed effect for twin t in family i ($\gamma_{i,t}$, t = 1, 2), which allows twin t in family i to overstate or understate education when reporting for both self and twin. These two fixed effects imply the following equations for reported education levels:
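$educ_{i2,2} = educ^*_{i2} + \alpha_i + \gamma_{i,2} + v_{i2,2}$
$educ_{i2,1} = educ^*_{i2} + \alpha_i + \gamma_{i,1} + v_{i2,1}$
$educ_{i1,1} = educ^*_{i1} + \alpha_i + \gamma_{i,1} + v_{i1,1}$
$educ_{i1,2} = educ^*_{i1} + \alpha_i + \gamma_{i,2} + v_{i1,2}$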
where the v terms are the error terms in this measurement error model, which we assume are uncorrelated with each other and with u.
As we did in the course notes, substitute ("plug in") these equations for the true education levels $educ^*_{it}$ in the first-differenced equation containing $educ^*_{it}$: $\log(wage_{i2}) - \log(wage_{i1}) = \beta_1(educ^*_{i2} - educ^*_{i1}) + (u_{i2} - u_{i1})$ (e.g., substitute $educ_{i2,2} - \alpha_i - \gamma_{i,2} - v_{i2,2}$ for $educ^*_{i2}$). Demonstrate that the differenced instrument $(educ_{i2,1} - educ_{i1,2})$ is now correlated with the resulting error term in the first-differenced model.
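A small Monte Carlo sketch in Python (standard library only) of the covariance this part asks you to derive. All distributions and parameter values ($\beta_1 = 0.08$, normal draws, independent $\gamma_{i,1}$ and $\gamma_{i,2}$ with unit variance, the seed) are hypothetical assumptions. With the $\gamma$ terms switched off (the course-notes model), the sample covariance between the differenced instrument and the first-differenced error should be near zero; with reporter effects it should be near $\beta_1\,\mathrm{Var}(\gamma_{i,1} - \gamma_{i,2}) = 2\beta_1\sigma^2_\gamma$. The simulation illustrates, but does not replace, the algebraic demonstration.

```python
import random

def sample_cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n, beta1, sd_gamma, seed):
    """Return (differenced instrument, FD error) for n hypothetical twin pairs.

    Reported education educ_{it,r} = educ*_{it} + alpha_i + gamma_{i,r} + v,
    where r indexes the reporting twin.
    """
    rng = random.Random(seed)
    instrument, error = [], []
    for _ in range(n):
        true1, true2 = rng.gauss(13, 2), rng.gauss(13, 2)       # educ*_{i1}, educ*_{i2}
        alpha = rng.gauss(0, 1)                                  # family effect alpha_i
        g1, g2 = rng.gauss(0, sd_gamma), rng.gauss(0, sd_gamma)  # gamma_{i,1}, gamma_{i,2}
        v22, v21, v11, v12 = (rng.gauss(0, 1) for _ in range(4))
        u1, u2 = rng.gauss(0, 0.5), rng.gauss(0, 0.5)

        educ22 = true2 + alpha + g2 + v22  # twin 2 reported by twin 2
        educ21 = true2 + alpha + g1 + v21  # twin 2 reported by twin 1
        educ11 = true1 + alpha + g1 + v11  # twin 1 reported by twin 1
        educ12 = true1 + alpha + g2 + v12  # twin 1 reported by twin 2

        dlogwage = beta1 * (true2 - true1) + (u2 - u1)
        # Error in the FD equation once educ* is replaced by self-reports.
        error.append(dlogwage - beta1 * (educ22 - educ11))
        instrument.append(educ21 - educ12)
    return instrument, error

beta1 = 0.08  # hypothetical return to a year of schooling
z0, e0 = simulate(50_000, beta1, sd_gamma=0.0, seed=3)  # course-notes model
z1, e1 = simulate(50_000, beta1, sd_gamma=1.0, seed=3)  # with reporter effects

# Covariance implied by the gamma terms: beta1 * Var(g1 - g2) = 2 * beta1 * sd^2.
print(round(sample_cov(z0, e0), 3), round(sample_cov(z1, e1), 3), 2 * beta1 * 1.0 ** 2)
```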
C. Finally, you should recognize that there is an alternate differenced regressor and alternate differenced instrument such that the alternate instrument is uncorrelated with the resulting error term in the first-differenced model. Explain, making sure you explicitly state the alternate regressor and the alternate instrument.
4. In January 2005, Italy introduced regulations banning smoking in all indoor public places, with the aim of limiting the adverse health effects of second-hand smoke. Our research question concerns the effect of the smoking ban on hospital admissions for acute coronary events (aces). Acute coronary events are a short-term outcome with rapid onset, and we will assume that the acute effects of both active and passive smoking disappear quickly after the exposure is removed; that is, we assume that the effect of the ban on aces is immediate (rather than occurring with a lag, such as for cancer outcomes).
We examine time-series evidence on aces in one region of Italy. The data set smoking_ban.dta contains monthly data for the following variables:
- year: years 2002-2006
- month: months denoted 1-12
- time: time variable = 1 in January 2002, and so on
- aces: number of hospital admissions for acute coronary events
- ban: dummy variable = 1 starting in January 2005, when the smoking ban began
- stdpop: age-standardized population
(The age-standardized population is the number of individuals at risk for an acute coronary event, adjusted for differences in the age of the population in the governmental region over time.)
We assume the following model to estimate the effect of the smoking ban on aces.
(1) $\ln(aces_t) = \beta_0 + \beta_1\ln(stdpop_t) + \beta_2\, ban_t + \beta_3\, time_t + \theta_t + u_t$
where $\theta_t$ is a set of monthly dummy variables.
A. Estimate the model using OLS. Give the precise interpretation of $\hat\beta_1$ and $\hat\beta_2$.
B. Test for AR(1) serial correlation. Explain the conclusion of your test.
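One standard residual-based AR(1) test regresses the OLS residual $\hat u_t$ on $\hat u_{t-1}$ and uses the t statistic on the lag. The Python sketch below (standard library only) implements that auxiliary regression; the 60 simulated residuals with autocorrelation 0.5 merely stand in for the residuals from estimating (1) on smoking_ban.dta. If the regressors are not strictly exogenous, the original regressors should also be added to the auxiliary regression, which this sketch does not do.

```python
import random

def ar1_test(resid):
    """Regress resid_t on a constant and resid_{t-1}.

    Returns (rho_hat, t_stat). This simple form assumes the regressors in
    the original model are strictly exogenous.
    """
    y, x = resid[1:], resid[:-1]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    rho = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    c = my - rho * mx
    ssr = sum((b - c - rho * a) ** 2 for a, b in zip(x, y))
    se = (ssr / (n - 2) / sxx) ** 0.5
    return rho, rho / se

# Stand-in residuals: 60 months (2002-2006) from an AR(1) with rho = 0.5.
rng = random.Random(4)
u, resid = 0.0, []
for _ in range(60):
    u = 0.5 * u + rng.gauss(0, 1)
    resid.append(u)

rho_hat, t_stat = ar1_test(resid)
print(round(rho_hat, 2), round(t_stat, 2))
```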
C. Now estimate the same model using -newey- with number of lags equal to 12 ( -lag(12)-). Then, given these estimation results, test again for serial correlation. Did serial-correlation robust estimation work? Explain.
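The Python sketch below (standard library only; simulated data with a single regressor and hypothetical AR(1) errors with coefficient 0.6) computes a Newey-West standard error for an OLS slope using Bartlett weights $1 - h/(g+1)$ applied to $a_t = r_t\hat u_t$, where $r_t$ is the demeaned regressor. Note what the HAC calculation takes as input: the OLS coefficient and the OLS residuals themselves. No degrees-of-freedom adjustment is applied here, so numbers may differ slightly from packaged routines such as -newey-.

```python
import random

def ols_simple(y, x):
    """OLS of y on a constant and x; returns slope, residuals, demeaned x, SST_x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    r = [a - mx for a in x]
    sst_x = sum(ri * ri for ri in r)
    b1 = sum(ri * (b - my) for ri, b in zip(r, y)) / sst_x
    b0 = my - b1 * mx
    resid = [b - b0 - b1 * a for a, b in zip(x, y)]
    return b1, resid, r, sst_x

def usual_se(resid, sst_x):
    """Conventional OLS standard error of the slope."""
    s2 = sum(e * e for e in resid) / (len(resid) - 2)
    return (s2 / sst_x) ** 0.5

def newey_west_se(resid, r, sst_x, lags):
    """Newey-West (Bartlett-weighted) standard error of the slope, no df adjustment."""
    a = [ri * ei for ri, ei in zip(r, resid)]
    v = sum(at * at for at in a)
    for h in range(1, lags + 1):
        w = 1 - h / (lags + 1)
        v += 2 * w * sum(a[t] * a[t - h] for t in range(h, len(a)))
    return (v / sst_x ** 2) ** 0.5

rng = random.Random(5)
x, y, u = [], [], 0.0
for t in range(60):                # 60 hypothetical months
    u = 0.6 * u + rng.gauss(0, 1)  # AR(1) error with rho = 0.6 (assumption)
    xt = t / 10 + rng.gauss(0, 1)
    x.append(xt)
    y.append(1.0 + 0.5 * xt + u)

b1, resid, r, sst_x = ols_simple(y, x)
# The same b1 and the same residuals feed both standard errors; only the SE differs.
print(round(b1, 3), round(usual_se(resid, sst_x), 3),
      round(newey_west_se(resid, r, sst_x, lags=12), 3))
```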
D. Now I introduce a new piece of information. Our theory of risk for an ace implies that $\beta_1$ should equal 1. Estimate a version of (1) that incorporates the restriction $\beta_1 = 1$. Further, in accordance with your conclusions from the earlier parts of this question, apply either (i) OLS, (ii) serial-correlation-robust standard errors, or (iii) serial-correlation-robust standard errors after first-differencing.
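Imposing $\beta_1 = 1$ amounts to moving $\ln(stdpop_t)$ to the left-hand side, so the dependent variable becomes the log of admissions per age-standardized person at risk. A sketch of the restricted equation, writing the time-trend coefficient as $\beta_3$ and leaving the choice among (i)-(iii) to your earlier conclusions:

```latex
\ln(aces_t) - \ln(stdpop_t) = \ln\!\left(\frac{aces_t}{stdpop_t}\right)
  = \beta_0 + \beta_2\, ban_t + \beta_3\, time_t + \theta_t + u_t
```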
5. Education level is thought to affect women's fertility. In particular, higher levels of education increase market wages available and thus increase the opportunity cost of time away from work rearing children. Suppose that our research question involved how women's fertility is affected by their education level but also whether there has been structural change in this effect over time. The data set kids.dta contains extracts of data from seven independent cross-sectional surveys of women from 35 to 54 years of age, in even numbered calendar years from 1972 to 1984. Type -describe- to get these variable descriptions:
- year: 72 to 84, even
- educ: years of schooling
- age: in years
- kids: # children ever born
- black: = 1 if black
- east: = 1 if lived in east at 16
- northcen: = 1 if lived in nc at 16
- west: = 1 if lived in west at 16
- farm: = 1 if on farm at 16
- othrural: = 1 if other rural at 16
- town: = 1 if lived in town at 16
- smcity: = 1 if in small city at 16
- meduc: mother's education
- feduc: father's education
A. Suppose our model is
(1) $kids_i = \beta_0 + \beta_1 educ_i + \beta_2 age_i + \beta_3 age_i^2 + \beta_4 black_i + \beta_5 east_i + \beta_6 northcen_i + \beta_7 west_i + \beta_8 farm_i + \beta_9 othrural_i + \beta_{10} town_i + \beta_{11} smcity_i + \theta_t + u_i$
where the variables are as described above and $\theta_t$ represents dummy variables for years.
Estimate model (1) assuming that the zero conditional mean assumption holds, but obtain standard errors robust to heteroskedasticity. Give the precise interpretation of $\hat\beta_1$.
B. Also of interest is whether there has been structural change over time in the effect of education on fertility. Estimate a model similar to (1) but modify it to allow the effect of education on fertility to differ in each year. Formulate and test the null hypothesis of no structural change in the effect of education on fertility, and explain the result of your test.
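One common way to let the education effect vary by survey year is to interact $educ_i$ with the year dummies. The formulation below assumes 1972 is the base year, so there are six interaction coefficients; $d_{s,i}$ is a hypothetical name for the dummy equal to 1 if woman i was surveyed in year 19s. Then $\beta_1$ is the education effect in 1972 and $\beta_1 + \delta_s$ the effect in year s, and the null of no structural change sets all six $\delta_s$ to zero, which can be tested jointly with a heteroskedasticity-robust Wald test.

```latex
kids_i = \beta_0 + \beta_1 educ_i + \sum_{s \in \{74,76,\ldots,84\}} \delta_s\,(educ_i \cdot d_{s,i})
  + \beta_2 age_i + \cdots + \beta_{11} smcity_i + \theta_t + u_i

H_0:\ \delta_{74} = \delta_{76} = \delta_{78} = \delta_{80} = \delta_{82} = \delta_{84} = 0
```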
C. Now revert back to equation (1) but recognize that education is likely an endogenous regressor. Explain briefly why we should suspect endogeneity.
D. Note that we have variables representing the education level of each woman's mother and father. Estimate the reduced form equation and verify that the identification condition is met and that the instruments are not weak. Then estimate (1) using mother's education and father's education as instruments for $educ_i$. Again, apply heteroskedasticity-robust estimation. Give the precise interpretation of $\hat\beta_1$. Is the change in the estimate of $\beta_1$ relative to the estimate in Part A consistent with your explanation of the nature of the endogeneity above?
E. In the model estimated in Part D, is a test of overidentifying restrictions possible? If not, explain why not. If so, apply the test and briefly explain the conclusion you draw from it.
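With two instruments ($meduc_i$, $feduc_i$) and one endogenous regressor, there is $2 - 1 = 1$ overidentifying restriction. A sketch of the usual regression-based test, in the form that is valid under homoskedasticity (a heteroskedasticity-robust variant also exists):

```latex
\begin{aligned}
&\text{1. Estimate (1) by 2SLS and save the residuals } \hat u_i.\\
&\text{2. Regress } \hat u_i \text{ on all exogenous variables (including } meduc_i \text{ and } feduc_i\text{); keep } R^2_u.\\
&\text{3. Under } H_0 \text{ (all instruments uncorrelated with } u_i\text{)}:\quad nR^2_u \overset{a}{\sim} \chi^2_1.
\end{aligned}
```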
6. Daylight Saving Time (DST) is well described by the phrase "spring forward, fall back." Each year on the spring transition date, clocks are moved forward by one hour, from 2 am to 3 am. This alters the relationship between clock time and solar time by an hour, moving sunlight from the morning to the evening. But springing forward for DST likely also reduces the amount of sleep obtained, and this may have various deleterious effects, one of which is motor vehicle accidents.
The objective in this question is to estimate the effect of DST on the number of fatal accidents that occur in the United States. Our data set contains information on the total number of accidents per week involving one or more fatalities in the US from 2002 to 2011. The data set accidents_dst contains the following variables (see the attached file):
- year is obvious enough, again data include the years 2002-2011
- accidents is the count of the number of fatal accidents in the U.S. during the given week
- dst is a dummy variable = 1 if DST is in effect that week
- week is a variable representing weeks of the year constructed such that DST always begins at the beginning of week 24; i.e., in each year dst is always first equal to 1 in week 24.
- holiday is a variable representing the number of holidays that fall in a given week; we expect that weeks with a greater number of holidays will tend to have a higher number of accidents.
I ask you to use these data to estimate the effect of DST on fatal accidents, measured as ln(accidents), using the Regression Discontinuity Design model. Make certain that your model
- includes fixed effects for years
- incorporates the variable representing the number of holidays occurring in a week
- includes the appropriate quadratic terms allowing for a non-linear relationship between the number of accidents and the forcing variable week.
In addition to estimating the model, give the precise interpretation of your estimate of the treatment effect of DST.
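A minimal Python sketch of one common quadratic RD specification: the forcing variable is centered at the cutoff (week 24), and interacting the centered week and its square with the treatment dummy allows a separate quadratic on each side, so that the coefficient on dst measures the jump in ln(accidents) at the start of DST. It only builds the regressor row for one observation. The function name `rdd_row`, the column names, and the use of 2002 as the base year are hypothetical, and the sketch assumes the estimation window contains only the spring switch (dst turns on at week 24 and does not turn off within the window).

```python
def rdd_row(week, dst, holiday, year, years, cutoff=24):
    """Regressors for ln(accidents) in a hypothetical quadratic RD specification.

    The forcing variable `week` is centered at the cutoff, so the coefficient
    on "dst" is the discontinuity at the start of DST. Interactions with dst
    let the quadratic differ before and after the cutoff. Year fixed effects
    use years[0] as the omitted base year.
    """
    r = week - cutoff  # centered forcing variable
    row = {
        "const": 1,
        "dst": dst,
        "r": r,
        "r_sq": r * r,
        "dst_r": dst * r,
        "dst_r_sq": dst * r * r,
        "holiday": holiday,
    }
    for y in years[1:]:
        row[f"year_{y}"] = 1 if year == y else 0
    return row


years = list(range(2002, 2012))  # 2002-2011, as in accidents_dst
row = rdd_row(week=26, dst=1, holiday=0, year=2005, years=years)
print(row)
```

Each row would then be passed to an OLS routine with ln(accidents) as the dependent variable.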
Attachment:- Assignment & Data Files and Course Notes.rar