Reference no: EM133035626
Data Analytics for Management
Question 1. Airline Overbooking
Airlines often sell more tickets on a flight than there are available seats. The reason for this is that a certain percentage of ticketed passengers tend to cancel their flight at the last minute. However, as some of you might have experienced, it can happen that there are not enough seats available for everyone showing up. This situation can be costly for the airline.
Play Air is an Icelandic low cost airline. Most of their planes have 200 seats. The airline would like to understand the implication of selling more tickets than there are seats. Given that Icelanders have become more reliable (and more eager to get away) after the economic crisis the no-show probability is estimated to be relatively low or 5.5%.
The so-called binomial distribution can be used to describe uncertain situations of this kind where there are two possible outcomes; a passenger shows up or does not show up. Fortunately, the binomial distribution can be approximated by the normal distribution such that the number of passengers that show up is normally distributed with a mean of n·p and a standard deviation of √n . p . (1- p) where n is the number of tickets sold and p is the probability of showing up.
a) If the airline sells 209 tickets what is the probability that there will be an overbooking situation?
b) If the airline sells 216 tickets what is the probability of having empty seats?
c) If the airline wants only 4% probability of an overbooking situation, how many tickets should it sell?
d) You have been hired as a consultant for Play Air. You have understood the probability structure of the overbooking problem and the next step is to advise them on how many tickets to sell. Please make a list of the data you would need to get from Play Air in order make your recommendation.
Question 2. Builders Election
The builders section of the Viking Workers Union have an upcoming vote on whether to exit the Union or not. There is a movement among builders within the Union for starting their own union, which would be more focused on the needs of builders. To get a sense of the outcome of the elections the leader of the pro-exit rally hired a polling company to carry out a survey. The survey was performed two days before the elections and 500 likely voters were asked. The result was that 255 would like to exit the VW Union, 225 would vote for remain and 20 were undecided.
a) The polling company was debating between doing an online survey and a telephone survey. Would you recommend one approach over the other? Provide short reasoning.
b) Based on the outcome of the survey would you expect the exit to go ahead?
c) The builders that participated in the survey were not only asked about how they would cast their vote but also about their experience as builders. It turns out that pro-exit builders have on average 7.2 years of experience (with a sample standard deviation of 3.1) while the average for the pro-remain builders is 5.9 years (with a sample standard deviation of 2.9). Do you agree with the statement that pro-exit builders are more experienced?
Question 3. Burnout
Side Control is a fast growing company. Side Control is trying to hire new employees but is having a hard time finding the right ones. Therefore, the pressure on the current employees has been building up. The HR director has noticed an increased number of people suffering from "burnout", in which physical and emotional fatigue hurt job performance. Although the job pressure cannot really be changed at the moment, the HR director suspects that the more time a person spends socializing with coworkers away from work, the more likely a burnout is. With the help of a human resources lab at the local university, the HR director has administered a questionnaire to the employees. A burnout index has been computed based on the responses from the survey as well as a quantitative measure of socialization. The data is listed in the file Side Control.xlsx with 1 used for males and 0 for females and the socialization measure is recorded based on the number of social contact hours with work colleagues out of the office per quarter.
a) Discuss briefly the important issues to have in mind when designing the survey.
b) Build and analyse a regression model that can be used to forecast the burnout index value for individuals.
c) Based on the regression model developed in Part b), write down one regression equation for each gender. Is there a significant difference between female and male burnout index values? Are women or men more likely to experience a burnout?
d) What is the impact of each additional quarterly bar trip (1 hour long) with colleagues, on the burnout index?
e) Assume a likely burnout is defined as an index value of 750 and higher. Does a man who spends usually 4.5 hours per week after work with colleagues, run the risk of a burnout?
Question 4. Cash Withdrawal
A bank wants to analyse what variables have an effect on the amount of cash withdrawn from automatic teller machines (ATMs) located in residential neighbourhoods. A sample of total daily withdrawals from ATMs has been collected, together with information suspected to affect withdrawals. This information includes the median value of homes in the neighbourhood, the median family income in the neighbourhood, the average checking balance of customers in the neighbourhood, the distance to the next nearest ATM, and whether or not the withdrawals occurred on a weekend. A part of the data set is given in Table 1.
Table 1. Cash Withdrawal data
Total Daily
Withdrawals (£1,000s)
|
Median Home
Value (£1,000s)
|
Median
Income (£1,000s)
|
Average Checking
Balance (£1,000s)
|
Distance to Next ATM
(miles)
|
Weekend
|
107
|
154
|
55.9
|
1690
|
4.6
|
No
|
56
|
82
|
30.6
|
891
|
4.4
|
No
|
50
|
28
|
4.5
|
60
|
4.1
|
Yes
|
135
|
150
|
56.7
|
1651
|
1.4
|
No
|
56
|
103
|
32.5
|
914
|
4.5
|
No
|
?
|
?
|
?
|
?
|
?
|
?
|
33
|
137
|
94.7
|
1164
|
7.4
|
No
|
105
|
103
|
90.1
|
1093
|
4.3
|
Yes
|
68
|
124
|
98
|
1269
|
6.1
|
No
|
126
|
133
|
116.6
|
1762
|
4.3
|
No
|
71
|
174
|
95.8
|
1483
|
5.8
|
No
|
Table 2 contains the correlation coefficients between the different variables, where a dummy variable is used to indicate whether the withdrawal occurred on a weekend (1) or not (0). Also scatter plots between all the independent variables and total withdrawals are given.
Table 2. Correlation matrix
Withdrawals Med.
|
Home Value Med
|
. Income Avg. Chec
|
king Balance Distance
|
Next ATM Weekend
|
Withdrawals
|
1.00
|
|
|
|
|
|
Med. Home Value
|
0.56
|
1.00
|
|
|
|
|
Med. Income
|
0.45
|
0.52
|
1.00
|
|
|
|
Avg. Checking Balance
|
0.76
|
0.84
|
0.63
|
1.00
|
|
|
Distance Next ATM
|
-0.51
|
-0.06
|
0.26
|
-0.15
|
1.00
|
|
Weekend
|
0.21
|
-0.30
|
0.09
|
-0.20
|
0.09
|
1.00
|
a) Explain in practical terms the meaning of the correlation coefficient between the ‘Distance to Next ATM' variable and the total withdrawals.
A multiple regression analysis was performed, with the results given in Table 3.
Table 3. Multiple Regression Results
Regression Statistic
Multiple R
|
s
0.95
|
|
R Square
|
0.91
|
Adjusted R Square
|
0.89
|
Standard Error
|
14.72
|
Observations
|
50
|
|
|
Coefficients
|
Standard Error t Stat P-value Lower 95% Upper 95%
|
Intercept
|
69.527
|
10.118
|
6.87
|
0.00
|
49.135
|
89.919
|
Median Home Value
|
0.004
|
0.100
|
0.04
|
0.97
|
-0.198
|
0.206
|
Median Income
|
0.105
|
0.096
|
1.09
|
0.28
|
-0.089
|
0.298
|
Average Checking Balance
|
0.048
|
0.007
|
7.08
|
0.00
|
0.034
|
0.062
|
Distance to Next ATM
|
-12.076
|
1.400
|
-8.63
|
0.00
|
-14.898
|
-9.254
|
Dummy Weekend
|
35.163
|
4.631
|
7.59
|
0.00
|
25.829
|
44.497
|
b) Do the results suggest that there is a relationship between the median family income in the neighbourhood and total ATM withdrawals in that neighbourhood?
Based on the results in Table 3, a new regression model was run. The results are given in Table 4.
Table 4. Multiple Regression Results - Revised Model
Regression Statis
Multiple R
|
tics
0.95
|
|
R Square
|
0.90
|
Adjusted R Square
|
0.90
|
Standard Error
|
14.59
|
Observations
|
50
|
|
|
|
Coefficients
|
Standard Error
|
t Stat
|
P-value Lower 95% Upper 95%
|
Intercept
|
69.171
|
7.646
|
9.05
|
0.00
|
53.780
|
84.561
|
Average Checking Balance
|
0.052
|
0.003
|
16.50
|
0.00
|
0.046
|
0.058
|
Distance to Next ATM
|
-11.380
|
1.223
|
-9.31
|
0.00
|
-13.841
|
-8.919
|
Dummy Weekend
|
36.547
|
4.252
|
8.59
|
0.00
|
27.988
|
45.106
|
c) For forecasting purposes, would you prefer the first model (with the results in Table 3), or the second model (with the results in Table 4)? Why?
Below, the three residual (error) plots for the revised model are given.
d) Do any of the residual plots highlight any problems with the regression analysis? If yes, interpret them and discuss how these could be resolved.
e) Provide a forecast for the total withdrawals on a weekday from an ATM located 5 miles from the nearest other ATM, in a neighbourhood where the average checking balance of the customers living in that neighbourhood is 1,500, the median value of the homes is 150 and the median family income is 100.