Reference no: EM131003837
Problem 1 - Good forecasting and control of reconstruction activities leads to more efficient use of time and resources in highway construction projects. Data on construction costs (in $1,000s) and person-hours of labor required on several projects are presented in the following table and are taken from the article "Forecasting Engineering Manpower Requirements for Highway Reconstruction Activities" (Persad et al., Journal of Management Engineering, 1995). Each value represents an average of several projects, and two outliers have been deleted.
Person-Hours (X) | Cost (Y)
939 | 251
5796 | 4690
289 | 124
283 | 294
138 | 138
2698 | 1385
663 | 345
1069 | 355
6945 | 5253
4159 | 1177
1266 | 802
1481 | 945
4716 | 2327
(a) Make a scatterplot (with a regression line) of cost versus person-hours. Present the least-squares line for predicting cost (y) from person-hours (x).
(b) Plot the residuals versus the fitted values. Does the model seem appropriate?
(c) Compute the least-squares line for predicting ln y from ln x, together with a new set of regression plots.
(d) Plot the residuals versus the fitted values. Does the model seem appropriate?
(e) Using the more appropriate model, construct a 95% prediction interval for the cost of a project that requires 1000 person-hours of labor.
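The steps in parts (c) and (e) can be sketched as follows. This is a minimal illustration, not a model answer: it assumes the log-log model from (c) turns out to be the more appropriate one, and the helper names (fit_line, prediction_interval) and the hard-coded t critical value are choices made here for illustration. The interval is computed for ln(cost) and then exponentiated back to the cost scale.

```python
import math

# Data from the table in Problem 1: person-hours (x) and cost in $1,000s (y).
person_hours = [939, 5796, 289, 283, 138, 2698, 663, 1069, 6945, 4159, 1266, 1481, 4716]
cost = [251, 4690, 124, 294, 138, 1385, 345, 355, 5253, 1177, 802, 945, 2327]


def fit_line(x, y):
    """Least-squares fit; returns (intercept, slope, residual_sd, x_mean, sxx)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom
    return intercept, slope, residual_sd, x_mean, sxx


def prediction_interval(x, y, x_new, t_crit):
    """Prediction interval for a single new response at x_new."""
    b0, b1, s, x_mean, sxx = fit_line(x, y)
    n = len(x)
    center = b0 + b1 * x_new
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_mean) ** 2 / sxx)
    return center - half_width, center, center + half_width


# Assumption: upper 2.5% point of the t distribution with 13 - 2 = 11
# degrees of freedom, taken from a standard t table.
T_CRIT_11_DF = 2.201

log_x = [math.log(v) for v in person_hours]
log_y = [math.log(v) for v in cost]

lo, mid, hi = prediction_interval(log_x, log_y, math.log(1000), T_CRIT_11_DF)
print("95% PI for ln(cost):", (lo, hi))
print("95% PI for cost ($1,000s):", (math.exp(lo), math.exp(hi)))
```

The same fit_line helper applied to the raw (x, y) values gives the line for part (a); the residuals for parts (b) and (d) are the observed values minus intercept + slope * x.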
Problem 2 - As we discussed in class, we can use a linear regression for a small data set (e.g., values over a short period of time). Again, think of a piecewise linear function: a (known) curve can be approximated by sampling points and interpolating linearly between them. In the attached Excel file, "STAT206AssignedCompanies," you can find your student ID number together with the name of a company, so you have your own company to work on. These are the hottest companies in the US stock market in the first couple of months of 2016 (selected among the most active stocks as of March 4, 2016). You can easily obtain the data sets from Yahoo Finance, Google Finance, etc.
Using your company's data from July through December 2015,
(a) Fit a simple regression to predict your company's monthly stock return from its corresponding trading volume. What is the standard deviation of the residuals? What is R^2?
(b) Now select another company in the same industry (you are free to choose any company you like). Fit a simple regression to predict your company's weekly stock returns from those of the other stock that you selected. What is the standard deviation of the residuals? What is R^2?
Next,
(c) Using the regression you have just found in (a), carry out the predictions for January through February 2016 and compare to the actual data. What is the standard deviation of the prediction error? How can the comparison with the results from 2015 be explained?
(d) Using the regression you have just found in (b), carry out the predictions for January through February 2016 and compare to the actual data. What is the standard deviation of the prediction error? How can the comparison with the results from 2015 be explained?
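The quantities asked for in (a) through (d) can be computed along these lines. This is only a sketch: the function names are hypothetical, and every number below is a made-up placeholder rather than real market data, so substitute the returns you download for your own company.

```python
import math


def simple_returns(prices):
    """Period-over-period returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def fit_summary(x, y):
    """Simple linear regression of y on x with residual SD and R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "residual_sd": math.sqrt(sse / (n - 2)),
        "r_squared": 1 - sse / sst,
    }


def prediction_error_sd(model, x_new, y_actual):
    """Sample SD of (actual - predicted) on data not used for fitting."""
    errors = [ya - (model["intercept"] + model["slope"] * xn)
              for xn, ya in zip(x_new, y_actual)]
    mean_err = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (len(errors) - 1))


# PLACEHOLDER values only (not real data): in-sample returns for 2015 and
# out-of-sample returns for early 2016, other stock (x) and your stock (y).
in_x = [0.012, -0.008, 0.021, -0.015, 0.004, 0.017, -0.011, 0.009]
in_y = [0.015, -0.005, 0.018, -0.020, 0.006, 0.022, -0.009, 0.007]
out_x = [-0.030, 0.010, -0.020, 0.025]
out_y = [-0.040, 0.020, -0.015, 0.020]

model = fit_summary(in_x, in_y)
print("residual SD:", model["residual_sd"])
print("R^2:", model["r_squared"])
print("prediction-error SD:", prediction_error_sd(model, out_x, out_y))
```

Comparing the prediction-error SD with the in-sample residual SD is one way to frame the question of how well the 2015 relationship carries over to 2016.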
Problem 3 - The file "flow-occ.csv" contains data collected by loop detectors at a particular location of eastbound Interstate 80 in Sacramento, California, from March 14-20, 2003. For each of three lanes, the flow (the number of cars) and the occupancy (the percentage of time a car was over the loop) were recorded in successive five-minute intervals. There were 1740 such five-minute intervals. Lane 1 is the farthest left lane, lane 2 is in the center, and lane 3 is the farthest right.
(a) For each station, plot flow and occupancy versus time. Explain the patterns you see. Can you deduce from the plots what the days of the week were?
(b) Compare the flows in the three lanes by making parallel boxplots. Which lane typically serves the most traffic?
(c) Examine the relationships of the flows in the three lanes by making scatterplots. Can you explain the patterns you see?
(d) Make histograms of the occupancies, varying the number of bins. What number of bins seems to give good representations for the shapes of the distributions? Are there any unusual features, and if so, how might they be explained?
(e) Make plots to support or refute the statement, "When one lane is congested, the others are, too."
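For part (d), the effect of the bin count can be explored before any plotting by tabulating equal-width bin counts directly. The sketch below is illustrative only: histogram_counts is a hypothetical helper, and the occupancy values are synthetic stand-ins generated with a fixed seed, not the contents of "flow-occ.csv".

```python
import random


def histogram_counts(values, n_bins):
    """Equal-width bin edges and counts; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls exactly on the last edge; put it in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# SYNTHETIC occupancy percentages (right-skewed, capped at 100), used only
# to exercise the function; replace with one lane's occupancy column.
random.seed(0)
occupancy = [min(random.expovariate(1 / 8), 100.0) for _ in range(1740)]

for n_bins in (5, 20, 50):
    edges, counts = histogram_counts(occupancy, n_bins)
    print(n_bins, "bins -> largest bin holds", max(counts), "of", len(occupancy))
```

With too few bins, distinct features merge into one bar; with too many, the counts per bin become small and noisy, which is the trade-off part (d) asks about.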
Problem 4 - The file "bodytemp.csv" contains normal body temperature readings (degrees Fahrenheit) and heart rates (beats per minute) of 65 males (coded by 1) and 65 females (coded by 2).
(a) For both males and females, make scatterplots of heart rate versus body temperature. Comment on the relationship or lack thereof.
(b) Does the relationship for males appear to be the same as that for females? Examine this question graphically, by making a scatterplot showing both females and males and identifying females and males by different plotting symbols.
(c) For the males, fit a linear regression to predict heart rate from temperature. Plot the residuals versus temperature and comment on whether the relationship is linear. Find the estimated slope and its standard error.
(d) Repeat the above for females.
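The estimated slope and its standard error in parts (c) and (d) can be computed per group as sketched below. The column names (temperature, gender, rate) are an assumption about the layout of "bodytemp.csv", and the eight rows are made-up placeholders so the example runs on its own; slope_with_se is a hypothetical helper name.

```python
import csv
import io
import math

# MADE-UP rows in an ASSUMED layout; the real bodytemp.csv may use
# different column names or ordering.
SAMPLE_CSV = """temperature,gender,rate
97.8,1,73
98.2,1,72
98.6,1,78
99.1,1,81
97.9,2,75
98.4,2,79
98.8,2,77
99.3,2,84
"""


def slope_with_se(x, y):
    """Least-squares slope and its standard error, s / sqrt(Sxx)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    return slope, s / math.sqrt(sxx)


# Split the rows by gender code (1 = male, 2 = female per the problem text).
groups = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    temps, rates = groups.setdefault(row["gender"], ([], []))
    temps.append(float(row["temperature"]))
    rates.append(float(row["rate"]))

results = {code: slope_with_se(t, r) for code, (t, r) in groups.items()}
for code, (slope, se) in sorted(results.items()):
    print(f"gender code {code}: slope = {slope:.3f}, SE = {se:.3f}")
```

To run this on the real file, replace io.StringIO(SAMPLE_CSV) with an open file handle and adjust the column names to match the header.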
Attachment:- Assignment.zip