Reference no: EM132392816
Model Building Homework
1. What do you think is the cause of the days with the highest and lowest residual values? Look at all days with abs(resid) > 80.
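library(tidyverse)
library(lubridate)
library(modelr)
library(nycflights13)

# assign each date to a term (season) using the 2013 cutoff dates
term <- function(date) {
  cut(date,
    breaks = ymd(20130101, 20130301, 20130605, 20130825, 20140101),
    labels = c("winter", "spring", "summer", "fall")
  )
}

# add day of week and term to count by date
daily <- flights %>%
  mutate(date = make_date(year, month, day)) %>%
  count(date) %>%
  mutate(wday = wday(date, label = TRUE)) %>%
  mutate(term = term(date))

# model daily flight counts by day of week, term, and their interaction
mod1 <- lm(n ~ wday * term, data = daily)

# attach the residuals of mod1 to the daily counts
daily_res <- daily %>%
  add_residuals(mod1, "resid")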
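One possible way to pull out the days with abs(resid) > 80 is sketched below. The helper name extreme_residuals and the toy data frame are assumptions made for illustration only; the toy residuals are not real flight residuals. The sketch uses base R so that it runs without any packages.

```r
# Hypothetical helper: keep rows whose residual exceeds a threshold in
# absolute value, sorted from most negative to most positive residual.
extreme_residuals <- function(df, threshold = 80) {
  out <- df[abs(df$resid) > threshold, , drop = FALSE]
  out[order(out$resid), , drop = FALSE]
}

# Toy data for illustration only (NOT real residuals from mod1)
toy <- data.frame(
  date  = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")),
  resid = c(-120, 15, 95, -80)
)

# Only the rows with |resid| strictly greater than 80 are kept
extreme <- extreme_residuals(toy)
print(extreme)
```

Applied to the data frame built above, the call would be extreme_residuals(daily_res); the dates it returns are the ones to compare against a calendar of public holidays.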
2. Create a new variable that splits the wday variable into terms (seasons), but only for Saturdays, i.e. it should have Sun, Mon, Tue, Wed, Thu, Fri, but Sat-winter, Sat-summer, Sat-spring, Sat-fall. Use cutoff dates of March 1, June 5, Aug 25 to separate into seasons. How does this model compare with the model with every combination of wday and term? Plot both model residuals side by side.
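The variable described in this question could be built roughly as follows. This is a sketch under stated assumptions: the names season_of, day_names, and wday_sat_term are hypothetical, and base R is used instead of lubridate so the example does not depend on the locale or on any package.

```r
# Hypothetical base-R version of the term() function, with the cutoffs
# March 1, June 5, and Aug 25 named in the question.
season_of <- function(date) {
  cut(date,
    breaks = as.Date(c("2013-01-01", "2013-03-01", "2013-06-05",
                       "2013-08-25", "2014-01-01")),
    labels = c("winter", "spring", "summer", "fall")
  )
}

# as.POSIXlt()$wday runs from 0 (Sunday) to 6 (Saturday)
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Day of week for every day, but Saturdays are split by season
wday_sat_term <- function(date) {
  wday <- day_names[as.POSIXlt(date)$wday + 1]
  ifelse(wday == "Sat", paste0("Sat-", season_of(date)), wday)
}

dates <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
wday2 <- wday_sat_term(dates)
print(table(wday2))
```

With the daily data frame from question 1, the new column could be added with mutate(wday2 = wday_sat_term(date)) and used in lm(n ~ wday2, data = daily). That model has far fewer coefficients than n ~ wday * term, which is one thing to keep in mind when comparing the two residual plots.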
3. Create a new variable that combines the day of week, term (for Saturdays), and the public holidays that you identified in number 1. The possible values of that variable will be Sun, Mon, Tue, Wed, Thu, Fri, Sat-winter, Sat-spring, Sat-summer, Sat-fall, holiday. What do the residuals of that model look like? Do this first with one factor level called holiday for all the dates with large absolute residuals. Then do it again with one level for the high residual dates and another for the low residual dates. (This second model will have possible values of Sun, Mon, Tue, Wed, Thu, Fri, Sat-winter, Sat-spring, Sat-summer, Sat-fall, holiday-high, holiday-low.) Which model works better?
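A minimal sketch of the relabelling step is shown below. The helper name label_holidays is hypothetical, and the dates passed to it are placeholders for illustration; they should be replaced by the dates actually found in number 1.

```r
# Hypothetical helper: overwrite the day-of-week label with a holiday level
# for the dates supplied in high_dates and low_dates.
label_holidays <- function(date, base_label, high_dates, low_dates) {
  label <- as.character(base_label)
  label[date %in% high_dates] <- "holiday-high"
  label[date %in% low_dates]  <- "holiday-low"
  label
}

# Placeholder inputs for illustration only; substitute the large-residual
# dates identified in number 1.
dates      <- as.Date(c("2013-11-27", "2013-11-28", "2013-11-30"))
base_label <- c("Wed", "Thu", "Sat-fall")
high_dates <- as.Date("2013-11-30")
low_dates  <- as.Date("2013-11-28")

# Second variant: separate levels for high and low residual dates
two_level <- label_holidays(dates, base_label, high_dates, low_dates)

# First variant: collapse both levels into a single "holiday" level
one_level <- sub("^holiday-(high|low)$", "holiday", two_level)

print(two_level)
print(one_level)
```

Building the two-level labels first and collapsing them afterwards keeps both variants consistent, so the two models differ only in whether high and low dates share a coefficient.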
4. Create a variable that contains the day of the week unless the date is one of the two holiday types you identified in the second part of number 3. It will contain one of the values Sun, Mon, Tue, Wed, Thu, Fri, Sat, holiday-high, holiday-low. Produce a model of n based on that variable, the term, and their interaction. Plot the residuals.
5. Use what you have learned above to predict the number of flights per day for 2020. Print a graph that overlays the number of flights in 2013 with your predicted number of flights in 2020. How many flights do you predict for each day from June 20 to July 10, 2020?
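One way to set up the 2020 prediction is sketched below, assuming the dplyr, lubridate, ggplot2, and nycflights13 packages are installed. The helper term_any_year is hypothetical: it applies the same month and day cutoffs to any year, since the 2013 breaks in term() do not cover 2020 dates. For brevity the sketch uses the plain wday * term model without holiday levels; the holiday variables from numbers 3 and 4 would need their 2020 dates supplied separately.

```r
library(dplyr)
library(lubridate)
library(ggplot2)
library(nycflights13)

# Hypothetical helper: season of a date in any year, using the cutoffs
# March 1, June 5, and Aug 25 from the assignment.
term_any_year <- function(date) {
  yr <- year(date)
  case_when(
    date < make_date(yr, 3, 1)  ~ "winter",
    date < make_date(yr, 6, 5)  ~ "spring",
    date < make_date(yr, 8, 25) ~ "summer",
    TRUE                        ~ "fall"
  )
}

# Daily counts for 2013, with predictors stored as plain character columns
daily <- flights %>%
  mutate(date = make_date(year, month, day)) %>%
  count(date) %>%
  mutate(wday = as.character(wday(date, label = TRUE)),
         term = term_any_year(date))

mod <- lm(n ~ wday * term, data = daily)

# Every day of 2020 with the same predictors, then the model's predictions
pred_2020 <- tibble(date = seq(ymd("2020-01-01"), ymd("2020-12-31"), by = "day")) %>%
  mutate(wday = as.character(wday(date, label = TRUE)),
         term = term_any_year(date))
pred_2020$pred <- predict(mod, newdata = pred_2020)

# Predicted flights for June 20 to July 10, 2020
june_july <- pred_2020 %>%
  filter(date >= ymd("2020-06-20"), date <= ymd("2020-07-10"))
print(june_july, n = Inf)

# Overlay 2013 observed and 2020 predicted counts, aligned by day of year
overlay_plot <- bind_rows(
  daily %>% transmute(doy = yday(date), flights = n, series = "2013 observed"),
  pred_2020 %>% transmute(doy = yday(date), flights = pred, series = "2020 predicted")
) %>%
  ggplot(aes(doy, flights, colour = series)) +
  geom_line()
print(overlay_plot)
```

Aligning the two series by day of year is a design choice: the same calendar date falls on a different weekday in 2020 than in 2013, so the weekly pattern of the two lines will be shifted relative to each other.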