Many outliers were clearly visible in the time series plot and the scatter graphs. These were removed in order to identify whether they were influential or had high leverage, and to see whether the assumptions of the multiple regression model were met.
Below are the row numbers of the 17 outliers that I removed from the 1519 observations:
77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456, 405, 307, 1205, 1348, 611, 368, 309
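The removal step can be sketched in Python as follows. This is only an illustration and not the procedure actually used in Minitab: the helper `drop_rows` and the stand-in `records` list are hypothetical, and the row numbers are assumed to be 1-based worksheet rows.

```python
# Row numbers reported as removed (assumed to be 1-based worksheet rows).
OUTLIER_ROWS = [77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456,
                405, 307, 1205, 1348, 611, 368, 309]


def drop_rows(records, rows_to_drop):
    """Return a copy of `records` without the given 1-based row numbers."""
    drop = set(rows_to_drop)
    return [rec for i, rec in enumerate(records, start=1) if i not in drop]


# Hypothetical stand-in for the 1519-row data set: each record is its own row number.
records = list(range(1, 1520))
cleaned = drop_rows(records, OUTLIER_ROWS)

# Number of observations before and after removing the listed rows.
print(len(records), len(cleaned))
```

Refitting the regression on `cleaned` and comparing it with the fit on `records` is one way to judge whether the removed rows were influential.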
Best Subsets Regression: wfood versus totexp, income, age, nk
Response is wfood
Vars  R-Sq  R-Sq(adj)  Mallows Cp  S         Predictors marked (of totexp, income, age, nk)
1     22.9       22.9        67.4  0.092326  X
1      5.5        5.4       424.9  0.10222   X
2     24.8       24.7        31.3  0.091236  X X
2     24.2       24.1        42.7  0.091572  X X
3     26.1       26.0         6.1  0.090461  X X X
3     24.8       24.7        32.3  0.091239  X X X
4     26.3       26.1         5.0  0.090397  X X X X
Best subsets regression is a way of identifying which of the independent variables (totexp, income, age and nk) are best suited to the regression model. According to the results above, the model containing income alone has the highest Mallows' Cp (424.9) and the lowest R-squared value (5.5%), so income is the variable that will be dropped to see whether the reduced model still fits the data.
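To make the quantities in the best subsets output more concrete, here is a minimal sketch of how R-squared, adjusted R-squared and Mallows' Cp could be computed for every subset of predictors. It is not Minitab's implementation. The functions `solve`, `sse` and `best_subsets` are hypothetical helpers, the data are made up (they are not the wfood data), and Cp is computed with the usual formula Cp = SSE_p / MSE_full - (n - 2p), where p counts the coefficients including the intercept.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sse(y, columns):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def best_subsets(y, predictors):
    """Return (subset, R-sq %, adjusted R-sq %, Mallows' Cp) for every predictor subset."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    names = list(predictors)
    # Error variance is estimated from the model containing all predictors.
    mse_full = sse(y, [predictors[name] for name in names]) / (n - len(names) - 1)
    rows = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            e = sse(y, [predictors[name] for name in subset])
            p = size + 1  # coefficients in the model, including the intercept
            r2 = 1 - e / sst
            adj = 1 - (1 - r2) * (n - 1) / (n - p)
            cp = e / mse_full - (n - 2 * p)
            rows.append((subset, 100 * r2, 100 * adj, cp))
    return rows


# Made-up illustrative data, not the household data analysed in this report.
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
}
rows = best_subsets(y, predictors)
for subset, r2, adj, cp in rows:
    print(f"{', '.join(subset):<8} R-Sq={r2:5.1f}  R-Sq(adj)={adj:5.1f}  Cp={cp:7.1f}")
```

For the model containing every predictor, Cp always equals p by construction, which is why the four-variable model in the output above has a Cp of 5.0. A smaller model is usually considered adequate when its Cp is close to its own p.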