Reference no: EM133425795
Question: Information about the dataset: The data is financial data for 97 companies on the S&P 500, so this can be thought of as a simple random sample of all the stocks listed on the S&P 500.The first two columns are the Company Name and ticker symbol. For the company name column, there is an icon that will allow you to add/edit info. Do not do that, these names are only here for your info, consider the rest of the values "fixed" (some this month, some back in November, intentionally so that it is a unique dataset)
The dependent variable (Column C) we will use will be "52 week high". This should be the dependent variable of all three regressions and represents the highest price the stock price has reached in the last 52 weeks (year). Column D is the Market Capitalization of the company, in Billions of Dollars. Column E is the trading volume (how many shares are traded) divided by 100,000. Column F is the number of shares outstanding (in millions). Column G is the Region (USA) that the company has its headquarters. Column H is the number of employees (in Thousands) that each company has.
Regression 1: Run a regression with 52 week high as the dependent variable, and market cap, volume, and shares outstanding as the independent variables. Include this regression in excel as a separate worksheet and make sure it is clearly labeled "Regression 1".
A) Interpret ALL of the slope coefficients (but not the intercept, be specific)
B) Which variables are statistically significant? How do you know? Provide at least two pieces of evidence.
C) Which variables are not statistically significant? How do you know? Provide at least two pieces of evidence.
D) What is the predicted 52 week high price from this model, for a company that has a market cap of 50 billion dollars, trades 1 million shares a day, and has 600 million shares outstanding. Be careful to look at the headings and the description to see how each column is expressed (billions vs millions vs. 100,000s).
Regression 2: Make a dummy/binary variable equal to 1 if the company has more than (or equal to) 40,000 employees and equal to 0 if they have less than 40,000 employees. Note that the employee variable is noted in thousands of employees, so the value 9.4 means the company has 9,400 employees. Run a regression with the same dependent and independent variables as regression 1, but now add your
new binary variable called "40KPlus".
E) Interpret the new binary slope coefficient (be specific).
F) Is the new variable statistically significant? How do you know? Provide at least two pieces of evidence?
G) Does this new model have a higher ADJUSTED R-Squared than Regression 1? What does this mean?
Regression 3: Make three dummy variables based on the Region of Headquarters variable. One for Midwest, one for South, and one for Northeast. West will be the reference variable. This IF coding will be a little more complex since you are dealing with words, you need to use quotes, and to make sure that there aren't any weird spacing issues. For example "South" would only work if the value in column G is not "South " with an extra space (I tried to make sure this wasn't the case). Run a new regression including all of the variables from Regression 2 and adding the three new binary/dummy variables you just created based on the Year Incorporated variable.
H) Interpret the slopes on all three of your new binary/dummy variables?
I) Which variables are statistically significant (if any)? How do you know?
J) Name one comparison (in terms of the dummy variables) that you are NOT able to make, in terms of statistical significance, and state why.
K) Conduct a partial F-Test to check the joint significance of the three new variables? Are the three new variables combined statistically significant? How do you know?