Reference no: EM132210049
Assignment - 10 pages of solutions with descriptions are required.
Learning objectives -
1: Demonstrate a practical understanding of core quantitative data analysis methods in data science applications and research.
2: Demonstrate skills in implementing these methods on real data using a software package and in critically evaluating and interpreting the results.
3: Evaluate the strengths and weaknesses of quantitative analysis methods alongside an understanding of how and when to use or combine methods.
Assignment - The data contains 3000 observations on the following 11 variables.
year - Year that wage information was recorded.
age - Age of worker.
maritl - A factor with five levels indicating marital status: 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated.
race - A factor indicating race with levels 1. White, 2. Black, 3. Asian, and 4. Other.
education - A factor indicating education level with levels 1. < HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree.
region - Region of the country (Mid-Atlantic only).
jobclass - A factor indicating type of job with levels 1. Industrial and 2. Information.
health - A factor indicating health level of worker with levels 1. <=Good and 2. >=Very Good.
health_ins - A factor indicating whether worker has health insurance with levels 1. Yes and 2. No.
logwage - Log of worker's wage.
wage - Worker's raw wage.
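The relationship between the last two variables can be illustrated with a minimal sketch. It assumes that logwage is the natural logarithm of wage (the base is not stated above), and it uses a small made-up list of wages, not values from the data file. The helper name `skewness` is hypothetical. The sketch compares the skewness of the raw and log-transformed values, one common way to see how a log transform changes the shape of a right-skewed variable.

```python
import math
import statistics

# Hypothetical wages, made up for illustration; NOT taken from the data file.
wages = [20.0, 35.0, 50.0, 60.0, 75.0, 90.0, 110.0, 130.0, 200.0, 300.0]

# Assumption: logwage is the natural logarithm of wage.
logwages = [math.log(w) for w in wages]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


wage_skew = skewness(wages)
logwage_skew = skewness(logwages)

print(f"skewness of wage:    {wage_skew:.3f}")
print(f"skewness of logwage: {logwage_skew:.3f}")
```

On this toy sample the raw values have a long right tail and the logged values are closer to symmetric. Whether the same holds for the real data is something to check, not assume.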
1. Explore the data. Plot and produce summary statistics to identify the key characteristics of the data (for some of the variables listed above) and produce a report of your findings. Between 5 and 10 tables or figures are expected, accompanied by a description of your main findings. Topics that you might choose to discuss include: possible issues with the data collection, identification of possible outliers or mistakes in the data, the role of missing data (if any), the distribution of the variables provided, and relationships between variables.
2. What are the pairwise associations between variables in the dataset? Use correlation analysis, scatter plots, box plots, and a chi-squared test to test for associations between pairs. You can choose 3-4 associations to test for. What are the underlying assumptions of the statistical test that you applied? Are the assumptions satisfied? What do these test results mean?
3. Use multiple linear regression to establish which variables affect the level of wages. Why might one focus on predicting log-wage rather than wage directly? Which variables can be used to predict wages?
1. Carry out a descriptive analysis and draw plots aimed at finding the answer to the question above.
2. Perform a multiple linear regression of logwage on some or all of the other variables.
3. Discuss the interpretation of the results and check the residuals plot. Discuss any weaknesses of this analysis and how effectively it answers the question above.
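The regression and residual check in the last task can be sketched as follows. This is only an illustration under stated assumptions. The data are synthetic, generated from made-up coefficients (`true_beta`), and are not the assignment data set. The two predictors (an age value and a 0/1 college indicator) are simplified stand-ins for the variables listed above. The fit solves the normal equations with a small hand-written solver (the hypothetical helper `solve`) so that only the standard library is needed. In practice a statistics package would be used instead.

```python
import random

rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

# Synthetic data, made up for illustration; NOT the assignment data set.
n = 200
true_beta = [4.0, 0.01, 0.3]  # hypothetical intercept, age and college effects
rows, y = [], []
for _ in range(n):
    age = rng.uniform(18, 65)
    college = 1.0 if rng.random() < 0.4 else 0.0
    x = [1.0, age, college]  # leading 1.0 is the intercept column
    rows.append(x)
    y.append(sum(b * v for b, v in zip(true_beta, x)) + rng.gauss(0, 0.2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


# Ordinary least squares via the normal equations (X'X) beta = X'y.
p = len(true_beta)
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
beta_hat = solve(xtx, xty)

# Residuals are what a residuals plot displays against the fitted values.
fitted = [sum(b * v for b, v in zip(beta_hat, r)) for r in rows]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

mean_y = sum(y) / n
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("estimated coefficients:", [round(b, 4) for b in beta_hat])
print("R-squared:", round(r_squared, 3))
```

When the response is a natural-log wage, a coefficient b on a 0/1 indicator corresponds to multiplying the wage by exp(b), which is why log-scale coefficients are often read as approximate percentage effects. Plotting `residuals` against `fitted` is the usual way to look for curvature or non-constant variance.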
Attachment:- Data File.rar