Introduction to Econometrics: Assignment Questions
QUESTION 1 - Consider the following simple linear regression model: Yi = α + βXi + ui
a. Classify the elements of this model into the following groups: (i) errors; (ii) variables; (iii) parameters; (iv) random (stochastic); (v) non-stochastic/non-random.
b. Decide whether the following statement is true or false and explain your reasoning: "The above regression model implies that changes in Xi cause changes in Yi. Moreover, the variables Xi and Yi are treated symmetrically, as in correlation analysis."
c. If ui ∼ N(0, σ²), what would be the distribution of each Yi?
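A short derivation of the reasoning for part c, under the classical assumption that Xi is fixed (non-stochastic): α + βXi is then a constant for each observation, so Yi is simply ui shifted by that constant, and a linear function of a normal variable is itself normal.

```latex
\begin{aligned}
E(Y_i) &= \alpha + \beta X_i + E(u_i) = \alpha + \beta X_i \\
\operatorname{Var}(Y_i) &= \operatorname{Var}(u_i) = \sigma^2 \\
\Rightarrow\quad Y_i &\sim N(\alpha + \beta X_i,\ \sigma^2)
\end{aligned}
```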
QUESTION 2 - Use the data in the worksheet entitled "Question 2" in "assignment_data.xlsx" for this question. The worksheet provides data for 176 countries on two variables sourced from the World Bank's Development Indicators. Birth rate is the number of live births occurring during 2011 per 1,000 population. GDP per capita is the average gross domestic product per person in 2011, measured in current US$.
a. Before conducting any empirical analysis, briefly discuss your expectations about the relationship between the two variables, birth rate and GDP per capita. (HINT: consider using the microeconomics of fertility to anticipate the relationship.)
b. Create a scatterplot of birth rate against GDP per capita. How would you describe the relationship between the two variables? Also assess the relationship by computing the correlation coefficient, and interpret your result.
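The correlation part of b can be sketched as a plain Pearson correlation. This is a minimal, standard-library-only sketch: the numbers below are hypothetical placeholders, not the assignment data. Loading the worksheet (for example with `pandas.read_excel("assignment_data.xlsx", sheet_name="Question 2")`) and the column names are assumptions left to the reader, and the scatterplot itself can be drawn with any plotting tool, such as matplotlib's `pyplot.scatter`.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Cross-products and squared deviations from the sample means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL illustrative values, NOT taken from assignment_data.xlsx.
gdp_pc = [500.0, 1500.0, 4000.0, 12000.0, 45000.0]
birth_rate = [40.0, 32.0, 22.0, 15.0, 11.0]

r = pearson_r(gdp_pc, birth_rate)
print(f"r(birth rate, GDP per capita) = {r:.3f}")
```

A negative r indicates that richer countries tend to have lower birth rates in the sample; the scatterplot is still needed to judge whether the pattern is linear, since r only measures linear association.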
c. Generate a new variable, ln(GDP per capita). Create a scatterplot of birth rate against ln(GDP per capita) and compute the correlation coefficient between them. Do these results influence the choice of regression specification?
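Part c amounts to applying the natural log to GDP per capita and recomputing the correlation. A minimal sketch on the same kind of HYPOTHETICAL placeholder data (not the assignment data); the helper names `log_transform` and `pearson_r` are illustrative, not from any library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_transform(values):
    """Natural log of each value; ln() requires strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("ln() is only defined for positive values")
    return [math.log(v) for v in values]


# HYPOTHETICAL illustrative values, NOT taken from assignment_data.xlsx.
gdp_pc = [500.0, 1500.0, 4000.0, 12000.0, 45000.0]
birth_rate = [40.0, 32.0, 22.0, 15.0, 11.0]

r_raw = pearson_r(gdp_pc, birth_rate)
r_log = pearson_r(log_transform(gdp_pc), birth_rate)
print(f"r with GDP per capita:     {r_raw:.3f}")
print(f"r with ln(GDP per capita): {r_log:.3f}")
```

If, as in this made-up example, |r| is noticeably larger after the log transform, the relationship is closer to linear in ln(GDP per capita), which is an argument for the level-log specification in part d.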
d. Assess the impact of GDP per capita on birth rate by undertaking the following bivariate regression analyses and then interpret your results. (HINT: Remember to discuss both the economic and statistical significance of your results, and explain which model is preferred and why)
i) Yi = α + βXi + ui
ii) Yi = α + βln(Xi) + ui
where Yi = crude birth rate per 1,000 population in 2011, and Xi = GDP per capita (US$) in 2011.
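Both specifications in part d are bivariate OLS regressions, so one function covers them; model ii just feeds ln(Xi) instead of Xi. This is a minimal standard-library sketch on HYPOTHETICAL data (in practice a package such as statsmodels or Excel's regression tool would report the same quantities). Because both models share the same dependent variable, their R² values can be compared directly.

```python
import math


def simple_ols(x, y):
    """OLS for y = a + b*x + u. Returns (a, b, se_b, t_b, r2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope estimate
    a = my - b * mx            # intercept estimate
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    if ssr == 0:
        raise ValueError("perfect fit: standard error is undefined")
    sigma2 = ssr / (n - 2)            # unbiased estimate of the error variance
    se_b = math.sqrt(sigma2 / sxx)    # standard error of the slope
    return a, b, se_b, b / se_b, 1 - ssr / sst


# HYPOTHETICAL illustrative values, NOT taken from assignment_data.xlsx.
gdp_pc = [500.0, 1500.0, 4000.0, 12000.0, 45000.0]
birth_rate = [40.0, 32.0, 22.0, 15.0, 11.0]

for label, x in [("i) level", gdp_pc), ("ii) log", [math.log(v) for v in gdp_pc])]:
    a, b, se_b, t_b, r2 = simple_ols(x, birth_rate)
    print(f"{label}: alpha={a:.3f} beta={b:.5f} se={se_b:.5f} t={t_b:.2f} R2={r2:.3f}")
```

For interpretation: in model i, β is the change in births per 1,000 people for a one-dollar increase in GDP per capita; in model ii, a 1% increase in GDP per capita is associated with a change of roughly β/100 births per 1,000 people.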
QUESTION 3 - Use the data in the worksheet entitled "Question 3" in "assignment_data.xlsx" for this question. The data contain the following variables for 680 university students in the United States:
stndfnl = the standardized final exam score
atndrte = the percentage of lectures attended
fresh = 1 if in 1st year of university; and 0 otherwise
second = 1 if in 2nd year of university; and 0 otherwise
priGPA = prior cumulative GPA (grade point average)
ACT = state high school graduation achievement test score
a) To determine the effects of attending lectures on final exam performance, first estimate a model relating the standardized final exam score (stndfnl) to the percentage of lectures attended (atndrte). Include the binary variables fresh and second as explanatory variables. Interpret the estimated coefficient on atndrte and discuss its statistical significance.
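Written out, the regression in part a is the equation below. Since fresh and second are both 0 for students in neither their first nor second year, that group is the base category, and no third year dummy is included (it would be perfectly collinear with the intercept). Because stndfnl is standardized, β1 is the change in the exam score, in standard deviations, associated with a one-percentage-point increase in attendance, holding year of study fixed.

```latex
\text{stndfnl}_i = \beta_0 + \beta_1\,\text{atndrte}_i + \beta_2\,\text{fresh}_i + \beta_3\,\text{second}_i + u_i
```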
b) As proxy variables for student ability, add priGPA and ACT to the regression. What is the effect of atndrte now? Discuss why and how this effect differs from that in a).
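One way to frame the "why" in part b is the omitted variable bias formula. As a sketch, suppose unobserved ability enters the true model with coefficient γ, and δ̃1 is the OLS slope on atndrte from an auxiliary regression of ability on the regressors included in a); then the short regression in a) estimates β1 with the bias shown. If able students both score higher (γ > 0) and attend more (δ̃1 > 0), the estimate in a) would be biased upward, and adding the proxies priGPA and ACT should reduce that bias.

```latex
\begin{aligned}
&\text{true model: } \text{stndfnl}_i = \beta_0 + \beta_1\,\text{atndrte}_i + \beta_2\,\text{fresh}_i + \beta_3\,\text{second}_i + \gamma\,\text{ability}_i + u_i \\
&\text{auxiliary regression: } \text{ability}_i = \delta_0 + \delta_1\,\text{atndrte}_i + \delta_2\,\text{fresh}_i + \delta_3\,\text{second}_i + v_i \\
&\Rightarrow\quad E(\tilde{\beta}_1 \mid X) = \beta_1 + \gamma\,\tilde{\delta}_1
\end{aligned}
```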
c) To test for a nonlinear effect of atndrte, add its squared term to the regression equation in b). What do you conclude?
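With a squared term, the effect of attendance is no longer a single coefficient: if the model contains b1·atndrte + b2·atndrte², the marginal effect is b1 + 2·b2·atndrte, and it changes sign at atndrte = −b1/(2·b2). A minimal sketch; the coefficients are HYPOTHETICAL placeholders, not estimates from the assignment data, and the function names are illustrative.

```python
def marginal_effect(b1, b2, atndrte):
    """d(stndfnl)/d(atndrte) for a model containing b1*atndrte + b2*atndrte**2."""
    return b1 + 2.0 * b2 * atndrte


def turning_point(b1, b2):
    """Attendance rate at which the marginal effect equals zero: -b1 / (2*b2)."""
    if b2 == 0:
        raise ValueError("no turning point when the squared term is zero")
    return -b1 / (2.0 * b2)


# HYPOTHETICAL coefficients, NOT estimates from assignment_data.xlsx.
b1, b2 = -0.02, 0.0003
for rate in (50, 75, 100):
    print(f"atndrte={rate}: marginal effect = {marginal_effect(b1, b2, rate):.4f}")
print(f"turning point: {turning_point(b1, b2):.1f}% attendance")
```

Whether the nonlinearity matters is decided first by the t-test on the squared term; the turning point is only economically meaningful if it falls inside the observed range of attendance rates (0 to 100%).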
QUESTION 4 - Use the data in the worksheet entitled "Question 4" in "assignment_data.xlsx" for this question. The objective of this exercise is to find out the impact of age, height, gender and smoking habits on fev. The data contain the following variables for 654 youths:
fev = forced expiratory volume, the volume of air (in liters) that can be forced out after taking a deep breath; an important measure of pulmonary function.
smoke = smoker coded as 1; non-smoker coded as 0
age = in years
ht = height in inches
sex = coded 1 for male and 0 for female
a) Develop a suitable regression model for the purpose of the exercise, i.e., to find out the impact of age, height, gender and smoking habits on fev.
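One natural starting point for part a, offered as a sketch rather than the required answer, is a model linear in the listed variables. Whether to transform fev (for example, to ln fev) or add interactions is a modelling choice the question leaves open.

```latex
\text{fev}_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{ht}_i + \beta_3\,\text{sex}_i + \beta_4\,\text{smoke}_i + u_i
```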
b) What is the expected effect of each explanatory variable on fev? Do the regression results support your expectation?
c) Which of the explanatory variables, or regressors, are individually statistically significant, say, at the 5% level? What are the estimated p values?
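Regression software reports exact t-based p-values, but the link between a t-statistic and its two-sided p-value can be sketched with the standard normal approximation. Assuming the four-regressor specification above, there are 654 − 5 = 649 residual degrees of freedom, so the t and normal distributions differ negligibly. The t-values used below are HYPOTHETICAL.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value for a t-statistic, using the standard normal approximation."""
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def is_significant(t_stat, alpha=0.05):
    """True if the coefficient is individually significant at level alpha."""
    return two_sided_p_normal(t_stat) < alpha


# HYPOTHETICAL t-statistics, NOT results from assignment_data.xlsx.
for name, t in [("age", 6.8), ("ht", 25.0), ("sex", 2.4), ("smoke", -1.1)]:
    print(f"{name}: p = {two_sided_p_normal(t):.4f}, significant at 5%: {is_significant(t)}")
```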
d) Would you reject the hypothesis that the slope coefficients of all the regressors are jointly equal to zero? How would you interpret the R² value?
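The joint hypothesis that all slope coefficients are zero is tested with the overall F-statistic, which for an OLS regression with an intercept can be computed directly from R², the number of slope coefficients k, and the sample size n. A minimal sketch; the R² value is HYPOTHETICAL, and the result is compared with the F critical value with (k, n − k − 1) degrees of freedom.

```python
def overall_f(r2, n, k):
    """F-statistic for H0: all k slope coefficients are zero, computed from R-squared."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# HYPOTHETICAL R-squared, NOT a result from assignment_data.xlsx.
r2_example = 0.5
print(f"F({4}, {654 - 4 - 1}) = {overall_f(r2_example, n=654, k=4):.2f}")
```

R² itself is the share of the sample variation in fev that the regressors jointly explain.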
e) Would you expect age and height to be correlated? If so, would you expect that your model suffers from multicollinearity?
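A common diagnostic for part e is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing regressor j on all the other regressors. As a simplification, the sketch below also uses the pairwise correlation between age and height, which equals R_j² only in a two-regressor model. The ages and heights are HYPOTHETICAL. A VIF above 10 is a commonly quoted rule of thumb for concern; high collinearity inflates standard errors but does not bias the OLS estimates.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_from_r2(r2_j):
    """Variance inflation factor given R^2 from regressing x_j on the other regressors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


# HYPOTHETICAL ages (years) and heights (inches), NOT from assignment_data.xlsx.
age = [6, 8, 10, 12, 14, 16]
ht = [46, 50, 55, 59, 63, 66]

r = pearson_r(age, ht)
print(f"corr(age, ht) = {r:.3f}; pairwise VIF = {vif_from_r2(r ** 2):.1f}")
```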
f) Would you conclude from this example that smoking is bad for fev? Explain.