Assignment:
For the final project, your task is to collect real-world data from the internet and use it to estimate a regression model. This project should not take too long, and after we have completed the 10th computer lab, we will devote class time to working on it.
The subject of your regression model can be anything you find interesting -- economics, the pandemic, climate change, sports, entertainment, etc. The project has three main steps:
1.) Come up with a question/topic of interest.
2.) Collect the raw data to help answer your question.
3.) Estimate the regression model and interpret the results.
You will submit your data, estimation results, and writeup as an Excel file to Canvas when you are done.
Step 1
Defining your question of interest is entirely up to you. As already mentioned, it does not necessarily have to be related to economics. Here are some example questions that might give you ideas for topics that would be suitable for this project:
• How does the price of gasoline affect the number of miles people drive in their cars in a given year?
• Do the economies of countries with corrupt governments tend to perform worse than countries with a strong rule of law?
• How do Major League Baseball team salaries affect the number of games teams win in a particular season?
• What is the relationship between the death rate from air pollution and GDP per capita for countries on the African continent in 2015?
• How does engine horsepower affect fuel-efficiency ratings for automobiles?
If you have trouble determining the question you want to study, please contact me and I can help you find a suitable topic. Note: when you are coming up with a question of interest, remember that some types of data may be difficult to acquire because they are either confidential (e.g., individuals' health records) or proprietary (e.g., number of people who watched a particular show on Netflix).
Step 2
You will need to find or construct a data set with one dependent variable (y) and at least one independent variable (x) that will help you answer your question of interest. For this project, you do not need a large amount of data, but your data set should have at least 30 observations. If you are comfortable working with bigger data sets, feel free to use one. Below I give some examples of places to look for data that can easily be imported into Excel. If you have difficulty finding data to use in this project, you can also contact me for ideas.
Here are some places on the internet with easily accessible data in the Excel format:
- (Lots of US government data, just enter a search term)
- (US Federal Reserve economic data)
- (US Census data)
- (Various data sets across many subjects)
- (Pandemic data)
Step 3
Once you have your data in hand, you will analyze it as we have done in numerous Excel labs this semester.
a. Make a scatterplot of your data and fit the regression line.
b. Use the Regression tool in Excel's Analysis ToolPak (the Data Analysis button on the Data tab) to estimate your model.
c. Report the estimated coefficients for the intercept and slope parameters, and indicate which of them are statistically significant. Also report the R-squared value.
d. Interpret the meaning of the estimated slope coefficient(s). In other words, how does a 1-unit increase in the independent variable (x) affect the dependent variable (y)?
e. Draw a conclusion based on your statistics. Do you find the results convincing?
Your writeup should be included in the Excel spreadsheet you submit on Canvas. To address items c., d., and e. above, use complete sentences and proper grammar. I am not looking for an essay, so you can put your answers into one or more blank cells in your Excel worksheet. Just make sure that they are readable. You may need to adjust row and column sizes to make a cell big enough to hold your answers.
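If you want to cross-check the numbers that Excel reports for items b. and c. of Step 3, the calculation can be sketched in a few lines of Python. This is only an illustration and not part of the required submission. It assumes a model with a single independent variable, y = b0 + b1*x; the function name simple_ols and the five data points are hypothetical and chosen for brevity (your own data set still needs at least 30 observations).

```python
import math


def simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return summary statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be the same length, with at least 3 observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary across observations")

    # Estimated slope and intercept of the fitted line.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R-squared: share of the variation in y explained by the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy

    # Standard errors use the residual variance with n - 2 degrees of freedom.
    residual_variance = sse / (n - 2)
    se_slope = math.sqrt(residual_variance / sxx)
    se_intercept = math.sqrt(residual_variance * (1 / n + x_bar ** 2 / sxx))

    # t-statistics; a perfect fit has zero standard errors, reported here as infinity.
    t_slope = slope / se_slope if se_slope > 0 else math.inf
    t_intercept = intercept / se_intercept if se_intercept > 0 else math.inf

    return {
        "n": n,
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": t_intercept,
        "t_slope": t_slope,
    }


# Hypothetical illustration data (NOT real-world data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

result = simple_ols(x, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

The slope, intercept, and R-squared should match the corresponding entries in Excel's regression output for the same data. The sketch stops at t-statistics; to judge statistical significance, either read the p-values from Excel's output or compare the absolute value of each t-statistic with a critical value from a t-table with n - 2 degrees of freedom.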