Data description
We consider a dataset from 1994 relating hourly wage to education level, experience, gender, and physical attractiveness. More precisely, for each person, the following variables have been collected:
• Wage - hourly wage (in dollars);
• Educ - years of schooling (in years);
• Exper - working experience (in years);
• Female - dummy variable (0 = male, 1 = female);
• Looks - score ranging from 1 (not attractive) to 5 (attractive) (no unit).
Of particular interest is whether education has a statistically significant effect on hourly wage. The dataset also lets us investigate whether physical appearance plays a statistically significant role in a person's wage.
Part I - Study on a small sample
We first draw a random sample of n = 6 people from the dataset and, for simplicity, consider only the variables Educ (x) and Wage (y). The corresponding data are given in Table 1 and Table 2.
1. Use Table 1 and Table 2 to calculate the sample covariance \(c_{xy}\) and the sample correlation \(r_{xy}\). What kind of relationship do they reveal?
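The two quantities asked for in Question 1 can be sketched in Python as follows. This is a minimal sketch, assuming the \(n - 1\) denominator for the sample covariance; if your course uses \(1/n\), \(c_{xy}\) changes by the factor \((n-1)/n\) but \(r_{xy}\) is unchanged. The function names are our own, and the Table 1 data are not reproduced here.

```python
import math


def sample_cov(x, y):
    """Sample covariance c_xy, using the n - 1 denominator (an assumed convention)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Sample correlation r_xy = c_xy / (s_x * s_y); always lies in [-1, 1]."""
    s_x = math.sqrt(sample_cov(x, x))  # sample standard deviation of x
    s_y = math.sqrt(sample_cov(y, y))  # sample standard deviation of y
    return sample_cov(x, y) / (s_x * s_y)
```

To use it, pass the six Educ values as `x` and the six Wage values as `y`. The sign of \(r_{xy}\) gives the direction of the linear relationship and its magnitude its strength.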
2. Write the linear model associated with the linear regression of Wage on Educ. Use Table 2 and your result from the previous question to calculate the least-squares estimates (LSE) \((\hat{b}_0, \hat{b}_1)\).
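The least-squares estimates follow from the standard formulas \(\hat{b}_1 = c_{xy} / s_x^2\) and \(\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}\). A minimal sketch, with a function name of our own choosing:

```python
def least_squares(x, y):
    """Return (b0_hat, b1_hat) for the simple regression y = b0 + b1 * x + error."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must not be constant")
    b1_hat = s_xy / s_xx  # equals c_xy / s_x^2: the 1/(n - 1) factors cancel
    b0_hat = y_bar - b1_hat * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0_hat, b1_hat
```

Because the \(1/(n-1)\) factors cancel, the slope is the same whichever covariance convention is used in Question 1.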
3. Use Table 2 to calculate the coefficient of determination \(R^2\) of the regression.
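The coefficient of determination can be computed as \(R^2 = 1 - \text{SSE}/\text{SST}\), where SSE is the sum of squared residuals and SST the total sum of squares. In simple linear regression this equals \(r_{xy}^2\), which gives a quick cross-check against Question 1. A self-contained sketch (function name assumed):

```python
def r_squared(x, y):
    """R^2 = 1 - SSE / SST for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares around the mean of y.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst
```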
Part II - Study on a large sample
Since the sample from Part I is very small, its numerical results are not reliable in practice. We therefore now focus on a large sample of n = 706 people.
1. Figure 1 shows the scatter plot of Wage versus Educ, with the line of best fit in red. In a few words, discuss the relationship between Wage and Educ suggested by the scatter plot. Is your conclusion the same as in Question I.1?
2. We consider the simple linear regression of Wage on Educ.
(a) Write the associated linear model. How many coefficients (including the intercept) do we have to estimate?
(b) Based on the results of the regression in Table 3, interpret in words the regression coefficient related to Educ and explain how hourly wage is affected by education. According to the model, what is the average impact of an additional year of education on hourly wage?
(c) Calculate a 95% confidence interval for \(b_1\). Is the relationship between Wage and Educ statistically significant? Explain.
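The interval has the form \(\hat{b}_1 \pm t_{0.975,\,n-2} \cdot \text{SE}(\hat{b}_1)\). A minimal sketch, assuming a large-sample normal approximation: with \(n - 2 = 704\) degrees of freedom the exact t quantile differs from the normal quantile 1.96 only in the third decimal place. The function names are our own; \(\hat{b}_1\) and its standard error should be read from Table 3.

```python
from statistics import NormalDist


def slope_confidence_interval(b1_hat, se_b1, level=0.95):
    """Large-sample CI: b1_hat +/- z_{1 - alpha/2} * se_b1.

    Uses the standard normal quantile in place of the t quantile with
    n - 2 degrees of freedom (a close approximation for n = 706).
    """
    if se_b1 <= 0:
        raise ValueError("standard error must be positive")
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for level = 0.95
    half_width = z * se_b1
    return b1_hat - half_width, b1_hat + half_width


def excludes_zero(interval):
    """True when 0 lies outside the interval, i.e. b1 is significant at that level."""
    low, high = interval
    return not (low <= 0.0 <= high)
```

The significance question then reduces to checking whether 0 lies inside the interval.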
(d) Do you find the estimated value for the intercept in Table 3 surprising? How would you interpret it? Use the reported p-value to assess the statistical significance of the intercept. Conclusion?
3. We consider the multiple regression of Wage on all the other variables (Educ, Exper, Female and Looks). The results are given in Table 4.
(a) Write the associated linear model. How many coefficients (including the intercept) do we have to estimate?
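For reference, one way to write the multiple model extends the notation \((b_0, b_1)\) of Part I. The labels \(b_2, b_3, b_4\) and the error term \(\varepsilon\) (assumed to have mean zero) are our own choice of notation:

```latex
\text{Wage} = b_0 + b_1\,\text{Educ} + b_2\,\text{Exper} + b_3\,\text{Female} + b_4\,\text{Looks} + \varepsilon
```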
(b) Based on the results of the regression in Table 4, briefly interpret in words the regression coefficient of each variable. Explain the meaning of the coefficient for the qualitative variable Female.
(c) Calculate the missing t-values in Table 4, and rank the variables by their significance in the regression.
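Each t-value is the coefficient estimate divided by its standard error. Since all coefficients share the same degrees of freedom, a larger \(|t|\) means a smaller p-value, so ranking by \(|t|\) ranks by significance. A sketch, where the dictionary keys (such as `"Intercept"`) are hypothetical labels for the rows of Table 4:

```python
def t_values(estimates, std_errors):
    """t-value of each coefficient: estimate divided by its standard error."""
    return {name: estimates[name] / std_errors[name] for name in estimates}


def rank_by_significance(t_vals, exclude=("Intercept",)):
    """Variable names sorted from most to least significant (largest |t| first)."""
    names = [name for name in t_vals if name not in exclude]
    return sorted(names, key=lambda name: abs(t_vals[name]), reverse=True)
```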
(d) Calculate the p-value for the variable Female (two-sided significance test). Which variables are significant in the regression? Explain. In particular, how does beauty affect wage? How do you explain this impact?
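The two-sided p-value is \(P(|T| \ge |t|)\). A minimal sketch, assuming the standard normal approximation to the t distribution, which is close here because the residual degrees of freedom (\(n - 5 = 701\)) are large. The function name is our own:

```python
from statistics import NormalDist


def two_sided_p_value(t_stat):
    """Two-sided p-value P(|Z| >= |t|), normal approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
```

A variable is significant at level \(\alpha\) when its p-value is below \(\alpha\).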
(e) Compare the \(R^2\) of the simple and multiple models. Did adding Exper, Looks and Female to the regression improve the fit much? Explain.
(f) Predict the average hourly wage for a man with Educ = 12, Exper = 11 and Looks = 3.
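The prediction plugs the given values into the fitted equation from Table 4, with Female = 0 for a man. A sketch, where the dictionary keys are hypothetical labels and the estimates themselves must be taken from Table 4:

```python
def predict_wage(coefs, educ, exper, female, looks):
    """Fitted hourly wage from the multiple regression of Table 4.

    coefs maps the (hypothetical) keys "Intercept", "Educ", "Exper",
    "Female" and "Looks" to the estimates read off Table 4.
    """
    return (coefs["Intercept"]
            + coefs["Educ"] * educ
            + coefs["Exper"] * exper
            + coefs["Female"] * female  # female = 0 for a man, 1 for a woman
            + coefs["Looks"] * looks)
```

For the man in this question, call `predict_wage(coefs, educ=12, exper=11, female=0, looks=3)`. Changing only `female` from 0 to 1 shifts the prediction by exactly the Female coefficient.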
(g) All other things being equal, how many dollars per hour more (or less) than a man does a woman earn? Explain.
Part III - Tests
1. A study claims that the average hourly wage is $6.70. In the large sample of n = 706 people from Part II, we measured a sample average of $6.30, with a sample standard deviation of $4.70. State the hypotheses for the two-sided test on the mean, calculate the test statistic and its p-value, and run the test at significance level α = 5%. Do you agree with the study?
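The test statistic is \(T = (\bar{x} - \mu_0) / (s / \sqrt{n})\). A minimal sketch using the figures quoted in the question, assuming the standard normal approximation for the p-value (with \(n - 1 = 705\) degrees of freedom the t and normal distributions are very close). The function name is our own:

```python
import math
from statistics import NormalDist


def mean_test_two_sided(x_bar, mu0, s, n):
    """Two-sided test of H0: mu = mu0 against H1: mu != mu0.

    Returns (test statistic, p-value). The p-value uses the standard normal
    distribution, a large-sample approximation to t with n - 1 df.
    """
    se = s / math.sqrt(n)  # standard error of the sample mean
    stat = (x_bar - mu0) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Figures quoted in the question: mu0 = 6.7, x_bar = 6.3, s = 4.7, n = 706.
stat, p_value = mean_test_two_sided(6.3, 6.7, 4.7, 706)
reject_h0 = p_value < 0.05  # decision at alpha = 5%
print(f"T = {stat:.3f}, p-value = {p_value:.4f}, reject H0 at 5%: {reject_h0}")
```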
2. A study claims that exactly 50% of the population from which the dataset in Part II was sampled are men. We measured a sample proportion of men of 55%. We consider the one-sided hypotheses \(H_0: p = 50\%\) against \(H_1: p > 50\%\), where p is the proportion of men. Calculate the test statistic and its p-value, and run the test at significance level α = 1%. Do you agree with the study?
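The usual large-sample statistic is \(Z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}\), with the standard error computed under \(H_0\), and the one-sided p-value is the upper-tail probability \(P(Z \ge z)\). A minimal sketch, assuming the 55% was measured on the n = 706 sample of Part II (the question does not state n explicitly). The function name is our own:

```python
import math
from statistics import NormalDist


def proportion_test_upper(p_hat, p0, n):
    """One-sided test of H0: p = p0 against H1: p > p0 (normal approximation).

    Returns (z statistic, p-value), with the standard error computed under H0.
    """
    se0 = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 1.0 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value


# p_hat = 0.55 and p0 = 0.50 come from the question; n = 706 is an assumption.
z, p_value = proportion_test_upper(0.55, 0.50, 706)
reject_h0 = p_value < 0.01  # decision at alpha = 1%
print(f"Z = {z:.3f}, p-value = {p_value:.4f}, reject H0 at 1%: {reject_h0}")
```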