Econometrics Assignment -
A. Read Microeconometrics Using Stata (MUS) Chapter 3.
B. You may need to look at relevant Stata help files for various commands in the course of the assignment.
Questions -
Q1) Do exercises 1, 2 and 3 of MUS - all references to MUS are to the revised edition.
Exercises -
1. Fit the model in section 3.4 using only the first 100 observations. Compute standard errors in three ways: default, heteroskedastic, and cluster-robust where clustering is on the number of chronic problems. Use estimates to produce a table with three sets of coefficients and standard errors, and comment on any appreciable differences in the standard errors. Construct a similar table for three alternative sets of heteroskedasticity-robust standard errors, obtained by using the vce(robust), vce(hc2), and vce(hc3) options, and comment on any differences between the different estimates of the standard errors.
2. Fit the model in section 3.4 with robust standard errors reported. Test at 5% the joint significance of the demographic variables age, female, and income. Test the hypothesis that being male (rather than female) has the same impact on medical expenditures as aging 10 years. Fit the model under the constraint that β_phylim = β_actlim by first typing constraint 1 phylim = actlim and then by using cnsreg with the constraints(1) option.
3. Fit the model in section 3.5, and implement the RESET test manually by regressing y on x and ŷ^2, ŷ^3, and ŷ^4 and jointly testing that the coefficients of ŷ^2, ŷ^3, and ŷ^4 are zero. To get the same results as estat ovtest, do you need to use default or robust estimates of the VCE in this regression? Comment. Similarly, implement linktest by regressing y on ŷ and ŷ^2 and testing that the coefficient of ŷ^2 is zero. To get the same results as linktest, do you need to use default or robust estimates of the VCE in this regression? Comment.
Q2) A newly graduated MA student is hired by a federal government department and assigned to write a paper analysing average unemployment insurance (UI) benefit receipt by federal electoral riding. (Each Member of Parliament cares a lot about this report.) The student uses Statistics Canada survey data, which is flawless in that it has no measurement error or other related problems. (Or at least we will consider it flawless for the purposes of this question.) The sample is a cross-section of individuals and its size is massive.
The new hire considers this a simple task and runs an OLS regression with the dependent variable being an indicator for UI receipt (i.e. 1 if the person had received benefits in the last year, 0 otherwise). As "explanatory variables" the recently graduated student includes: age (in years), age squared, female (a 0/1 indicator), the industry in which the claimant worked prior to the claim (a set of 32 indicator variables representing 33 industries) and a set of 249 indicators for the 250 electoral ridings (he leaves district 153, "Central Ottawa", as the omitted district). (Note: I'm not really sure how many ridings there are, or that there actually is a "Central Ottawa", but this is immaterial to the question.)
The new hire shows the director the coefficients and says: "See the coefficient for district 2. It is positive, but it is small in magnitude and not statistically significant. This implies that the people in that district, wherever it is, are approximately equally likely to claim UI benefits as those in the omitted group, which I selected to be Central Ottawa."
The director starts to laugh (not a good beginning to a new graduate's career) and she says: "You must have done something wrong. District 2 is Northern Newfoundland, which has a lot of fishers and fish plant workers with low levels of formal education and a high propensity to claim benefits. I can assure you that they use much more UI per capita than those in Central Ottawa, which is a wealthy area full of older, highly educated and highly paid civil servants who almost never get laid off. District 2 has a much higher take-up rate."
How can you explain the difference between the director's intuition about UI claim rates (which is correct) and the new graduate's conclusion based on the regression (which was run correctly - i.e. what is stated above was actually done)?
NOTE: This is a long question, but should only have a short (200 words maximum) answer.
Q3) Consider the following set of OLS regressions all run using the same observations. The two X variables are individual regressors with many observations. (They are not matrices comprising many regressors.) In all of the regressions the u terms are residuals (not "true" errors) and the coefficients and residuals have their usual definitions.
Y = a0 + a1X1 + a2X2 + u1
Y = b0 + b1X1 + u2
Y = c0 + c2X2 + u3
X2 = d0 + d1X1 + u4
X1 = e0 + e2X2 + u5
The following are only to rule out trivial answers. The variables X1 and X2 are not mean zero and they have a correlation of 0.25. (There is nothing special about this correlation except that it is not one and not zero. I could have instead made the more general statement 0 < abs[corr(X1, X2)] < 1.) All of the "a" coefficients (those in the first regression) are statistically significantly different from zero and from each other.
What you must do.
i. Write a simple simulation to create a specific "statistical world" that reflects the equation with the "a" coefficients. You will need to make several decisions in doing this, including: the sample size, the coefficient values, the variance of the error term, etc.
ii. Using the data you created for regression (a), run regressions (b) thru (e).
iii. For each regression (a) thru (e) create variables containing the relevant residuals; that is u1 thru u5. I recommend you use the "predict" command in Stata to do this. Whatever command you employ, you must store the residuals as the datatype called "double". For background on datatypes in Stata, type "help datatypes" at the Stata command line. Note that you must execute each "predict" command following the associated regression and prior to a subsequent regression being executed since Stata only stores results for the most recent regression in memory.
iv. Run all of the regressions (f) thru (s) and examine their values.
v. For each intercept and slope coefficient in the table below, write a short and clear (and neat and well organized) explanation of why it takes on the "value category" it does. I am only interested in 3 categories of values: equals 0 (zero), does not equal 0 (zero), or is equal to one of the above coefficients (i.e., a0 thru e2).
You must hand in your Stata *.log file (or *.smcl file) and the explanations for the value category of each coefficient. The latter portion may be legibly handwritten or typed - up to you. You may say "same as coefficient <letter> above" in discussing coefficients that have the same explanations as earlier ones.
Note that I am NOT asking you to undertake a Monte Carlo simulation. A Monte Carlo simulation is similar to what is being asked in that a "statistical model" will be created (usually with a small/modest sample size), random errors drawn, and the model estimated for the particular errors drawn. However, in a Monte Carlo simulation steps 2 and 3 (drawing random errors and estimating the model) will be repeated many times with each producing a different (set of) statistical parameter(s) -- e.g., coefficients and/or test statistics. The idea is to look at the distribution of the statistics generated across a large number of different sets of random error terms. In this case I only want you to draw one set of random errors and estimate one set of coefficients for each regression equation. We are focusing on the "mechanical" properties of OLS and many iterations are not required (though you may do many iterations and/or extensions to this simulation for your own interests if you wish).
Of course, a simulation only ever models specific contexts, but simulations can illustrate larger ideas. Though some are long and complex, simulations such as this, and full-blown Monte Carlos, can be quite easy/fast to do and provide a lot of insight into econometric problems you face in the future. They should be part of your "empirical toolbox" going forward.
You may benefit from using ideas, and copying/adapting lines of code, from the *.do files provided on Avenue. You need not "start from scratch".
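The single-draw simulation described in steps i to iii can be sketched as follows. This is only an illustration of the logic, written in Python rather than Stata (the assignment itself requires a Stata log); the sample size, coefficient values, error variance, seed, and the helper name `ols` are all arbitrary assumptions made for this sketch, not values taken from the assignment. Python floats are double precision, which mirrors the requirement to store residuals as "double".

```python
import random

def ols(y, *regressors):
    """OLS of y on a constant plus the given regressors.

    Hypothetical helper for this sketch: solves the normal equations
    (X'X) b = X'y by Gauss-Jordan elimination and returns
    (coefficients, residuals), intercept first.
    """
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in regressors]
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    aug = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
           + [sum(ci[t] * y[t] for t in range(n))]
           for ci in cols]
    for p in range(k):
        # Partial pivoting, then clear column p in every other row
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[p])]
    coefs = [aug[i][k] / aug[i][i] for i in range(k)]
    resid = [y[t] - sum(c * col[t] for c, col in zip(coefs, cols))
             for t in range(n)]
    return coefs, resid

random.seed(12345)   # assumed seed, so the single draw is reproducible
n = 2000             # assumed sample size

# X1 and X2 are not mean zero; the slope 0.2582 gives a population
# correlation of roughly 0.25 (the sample value will differ slightly).
x1 = [5 + random.gauss(0, 1) for _ in range(n)]
x2 = [3 + 0.2582 * (v - 5) + random.gauss(0, 1) for v in x1]

# The "statistical world" for regression (a); coefficients are assumptions.
y = [1.0 + 2.0 * v1 - 3.0 * v2 + random.gauss(0, 2) for v1, v2 in zip(x1, x2)]

(a0, a1, a2), u1 = ols(y, x1, x2)   # Y  on X1, X2
(b0, b1), u2 = ols(y, x1)           # Y  on X1
(c0, c2), u3 = ols(y, x2)           # Y  on X2
(d0, d1), u4 = ols(x2, x1)          # X2 on X1
(e0, e2), u5 = ols(x1, x2)          # X1 on X2

print("a:", a0, a1, a2)
print("b:", b0, b1)
print("c:", c0, c2)
print("d:", d0, d1)
print("e:", e0, e2)
# Two mechanical checks that hold in any single draw:
print("sum of u1:", sum(u1))
print("b1 - (a1 + a2*d1):", b1 - (a1 + a2 * d1))
```

Because the point is the mechanical behaviour of OLS, the two checks at the end should print values that are zero up to floating-point error regardless of the seed or the coefficient values chosen.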
Q4) Read the complete question before starting any part of it. This question should be answered using pencil and paper, not Stata. Consider the OLS regression results
Hours = 0.2 + 0.04Age - 0.002Age^2 + 0.1School - 2.1Female - 0.003(Female*Age) + 0.0001(Female*Age^2) - 0.3Public (1)
where the data comprise a large cross-sectional sample of the Canadian workforce, and the variables are defined such that:
Hours = weekly hours of work (takes values between 1 and 168 inclusive);
Age = age, measured in years;
School = highest level of education attained, measured in years;
Female = an indicator (sometimes called dummy) variable set to 1 if the respondent reports being female, 0 otherwise; and
Public = an indicator (sometimes called dummy) variable set to 1 if the respondent reports working in the public sector, 0 otherwise.
i) What is the marginal effect of being in the public sector on hours of work?
ii) What is the marginal effect of age on hours of work?
iii) Explain how (or whether) the concept of a "marginal effect" applies in each of (i) and (ii). That is, how should these marginal effects be interpreted?
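The distinction that parts (i) to (iii) turn on can be explored numerically. The sketch below is an illustration only (the question itself is meant for pencil and paper): it codes equation (1) as a function, reading "Age2" as age squared, and compares a 0-to-1 switch in an indicator with a small change in a continuous variable. The function names and the evaluation points (age 30 or 40, 12 years of schooling) are assumptions made for the example.

```python
def predicted_hours(age, school, female, public):
    """Fitted weekly hours from equation (1), with Age2 read as age squared."""
    return (0.2 + 0.04 * age - 0.002 * age ** 2 + 0.1 * school
            - 2.1 * female - 0.003 * female * age
            + 0.0001 * female * age ** 2 - 0.3 * public)

def discrete_change_public(age, school, female):
    """Change in fitted hours when Public switches from 0 to 1."""
    return (predicted_hours(age, school, female, 1)
            - predicted_hours(age, school, female, 0))

def slope_in_age(age, school, female, public, h=1e-4):
    """Central-difference approximation to the derivative with respect to age."""
    up = predicted_hours(age + h, school, female, public)
    down = predicted_hours(age - h, school, female, public)
    return (up - down) / (2 * h)

# Hypothetical evaluation points chosen only for illustration
for age in (30, 40):
    for female in (0, 1):
        print(age, female,
              discrete_change_public(age, 12, female),
              slope_in_age(age, 12, female, 0))
```

Printing the two quantities side by side at several ages, and for both values of Female, shows which of them depends on where it is evaluated.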
In the next few questions I ask you to use matrix algebra. The approach I want you to use is to employ the regression specifications in equations (3), (4) and the like to motivate general solutions to each question posed. That is, use the matrix X to represent all the right hand side data, and b to represent the vector of right hand side coefficients, and take the same approach to other aspects of each regression. It is common in econometrics to need to shift from the specific to the general and vice versa.
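One possible way to set up the general notation is sketched below. Here y stands for the dependent variable of whichever regression is being considered, X collects the constant and all right-hand-side variables, and the symbols P and M are labels introduced for this sketch rather than notation from the assignment. The signs written in front of individual coefficients in equations (2) to (4) are simply absorbed into the coefficient vector.

```latex
\begin{aligned}
y &= X b + u, \\
\hat{b} &= (X'X)^{-1} X' y, \\
\hat{y} &= X \hat{b} = P y, \qquad P = X (X'X)^{-1} X', \\
\hat{u} &= y - \hat{y} = M y, \qquad M = I - P, \\
P X &= X, \qquad M X = 0, \qquad P' = P = P^2, \qquad M' = M = M^2 .
\end{aligned}
```

With this setup, each later regression amounts to replacing y with a different left-hand-side vector (fitted values or residuals) and, where fewer regressors are used, replacing X with a subset of its columns.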
You generate predictions from equation (1), and then run the OLS regression to estimate the set of "b" coefficients:
(Hours)^ = b0 + b1Age - b2Age^2 + b3School - b4Female - b5(Female*Age) + b6(Female*Age^2) - b7Public + u2. (2)
iv) Using matrix algebra, illustrate the relationship between the coefficients in equations (1) and (2).
You next obtain the residuals from equation (1), u1^, and then run the OLS regression:
u1^ = c0 + c1Age - c2Age^2 + c3School - c4Female - c5(Female*Age) + c6(Female*Age^2) - c7Public + u3. (3)
v) Use matrix algebra to derive the values of the coefficients in (3). Explain what the algebra means. Even if you do not manage to solve the algebraic problem, marks may be awarded for the explanation.
You next use the residuals you obtained earlier and run the OLS regression:
u1^ = d0 + d1Age - d2Age^2 + u4 (4)
vi) Use matrix algebra to derive the values of the coefficients in (4). Explain what the algebra means. Even if you do not manage to solve the algebraic problem, marks may be awarded for the explanation.
Bonus question: vii) Use matrix algebra to determine the values of u2. Explain.