Reference no: EM132658819
MAS223 Applied Statistics - Murdoch University
Background
After a horse race, horses are assessed for injuries. Each horse is determined to have no injuries, minor injuries, or moderate or severe injuries. They are then provided with suitable treatment or, in extreme cases, retired from horse racing.
Our research considers the association between injuries and the characteristics of each horse and of the race conditions. We have information on the horses (weight carried, place finish, distance from the winner, maximum race speed, age, sex, number of starts made in the preparation, and number of days in the racing preparation) and on the race (track, race distance, air temperature at the time of the race, and number of starters).
Question 1.
Carry out a principal component analysis using all the suitable variables. Do not include the outcome.
(a) Which variables have been selected and why?
(b) Consider the eigenvalues for the principal component analysis. Provide evidence to support your answers.
i. How many principal components would you select if using the "elbow" method?
ii. How many principal components would you select if attempting to account for 70% of the total variation?
iii. How many principal components would you select if using an eigenvalue of 1 as the cut-off?
(c) Produce a biplot of the first two principal components. What variable groupings load onto these components similarly?
(d) What are the percentage contributions of all the variables to PC1?
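
The quantities asked for in Question 1 can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, and the column names (weight_carried, max_speed, and so on) are hypothetical stand-ins, not the names in the supplied data file. Note that prcomp() expects numeric columns.

```r
# Simulated stand-in data. Column names are hypothetical, not taken
# from the assignment's data file.
set.seed(1)
n <- 200
horses <- data.frame(
  weight_carried = rnorm(n, 56, 2),
  max_speed      = rnorm(n, 17, 1),
  age            = rnorm(n, 5, 1.5),
  race_distance  = rnorm(n, 1600, 400),
  air_temp       = rnorm(n, 22, 5)
)

# Centre and scale, since the variables are on very different scales.
pca <- prcomp(horses, center = TRUE, scale. = TRUE)

# Eigenvalues are the variances of the principal components.
eigenvalues <- pca$sdev^2
prop_var    <- eigenvalues / sum(eigenvalues)
cum_var     <- cumsum(prop_var)

# Smallest number of components whose cumulative share reaches 70%.
n_70 <- which(cum_var >= 0.70)[1]

# Number of components with an eigenvalue above 1.
n_cutoff <- sum(eigenvalues > 1)

# Percentage contribution of each variable to PC1. Each loading vector
# has unit length, so the squared loadings sum to 1.
contrib_pc1 <- 100 * pca$rotation[, 1]^2
```

For the visual parts, screeplot(pca, type = "lines") draws the scree plot used for the "elbow" judgement, and biplot(pca, choices = 1:2) draws the biplot of the first two components.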
Question 2.
(a) Discuss the assumptions of linear discriminant analysis as they relate to this data set.
(b) Using linear discriminant analysis, determine the hit rate when considering the variables used above in question one as explanatory variables in trying to predict the outcome.
(c) How does the hit rate change if we consider using the principal components (from question one) as explanatory variables in the linear discriminant analysis?
(d) For the approach with the better hit rate, using the group means, describe the three outcomes and how they typically differ.
(e) How does this change if we say that the costs of mis-diagnosing the minor injuries are 20 times that of no injury and the costs of mis-diagnosing the moderate or severe injury are 100 times that of no injury?
i. What are the new priors? (2 marks)
ii. What is the new hit rate?
(f) Is linear discriminant analysis effective in this context? Provide at least one visualisation to support your answer.
Note: Be sure to remove the randomness from the linear discriminant analysis by setting the seed in your R code.
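
A minimal sketch of the mechanics behind Question 2, using lda() from the MASS package. Everything about the data here is an assumption: the outcome labels, the two predictors and their values are simulated for illustration. The hit rate shown is the resubstitution rate (predicting the same rows used to fit), which tends to be optimistic. The cost adjustment follows one common convention, multiplying each class prior by its relative cost and renormalising. Check that this matches the convention taught in the unit.

```r
library(MASS)  # provides lda()

set.seed(123)  # fixes the simulated data and any later random steps
n <- 300
lev <- c("none", "minor", "mod_severe")   # hypothetical outcome labels
outcome <- factor(sample(lev, n, replace = TRUE, prob = c(0.7, 0.2, 0.1)),
                  levels = lev)
shift <- as.numeric(outcome) - 1          # 0, 1, 2 by injury level
dat <- data.frame(
  outcome   = outcome,
  max_speed = rnorm(n, 17 + 0.5 * shift, 1),   # hypothetical predictor
  age       = rnorm(n, 5 + 0.8 * shift, 1.5)   # hypothetical predictor
)

# Fit with the default priors (the observed class proportions).
fit  <- lda(outcome ~ max_speed + age, data = dat)
pred <- predict(fit)$class

# Confusion table and hit rate (proportion correctly classified).
confusion <- table(actual = dat$outcome, predicted = pred)
hit_rate  <- sum(diag(confusion)) / sum(confusion)

# Cost-adjusted priors: weight each prior by its relative cost,
# then renormalise so the priors sum to 1.
costs      <- c(none = 1, minor = 20, mod_severe = 100)
base_prior <- fit$prior
new_prior  <- base_prior * costs / sum(base_prior * costs)

# Refit with the adjusted priors (given in factor-level order).
fit_cost      <- lda(outcome ~ max_speed + age, data = dat,
                     prior = as.numeric(new_prior))
hit_rate_cost <- mean(predict(fit_cost)$class == dat$outcome)
```

The group means for part (d) are stored in fit$means, and plot(fit) shows the observations on the discriminant axes.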
Question 3.
In your own words, for principal components regression, outline the following:
(a) The type of question for which principal component regression may be suitable. Include a discussion on the suitable types of data.
(b) Any assumptions and how you would check them in practice.
(c) How you would decide on the number of components to use in running a principal components regression.
(d) The output you would provide to an interested party, including the types of visualisation, that would support your analysis.
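
As a concrete picture of what principal components regression involves, the sketch below regresses a response on the first k principal component scores and compares values of k by cross-validated prediction error. It uses only base R. The data, the variable names and the choice of 5 folds are all assumptions made for illustration, and cross-validation is only one of several ways to choose the number of components.

```r
set.seed(42)
n <- 150
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
X[, 2] <- X[, 1] + rnorm(n, sd = 0.1)   # deliberately collinear with x1
y <- 2 * X[, 1] - X[, 3] + rnorm(n)     # simulated response

folds   <- 5
fold_id <- sample(rep(seq_len(folds), length.out = n))

# Cross-validated RMSE of a regression on the first k components.
# The PCA is refitted on each training fold so that the held-out
# rows play no part in building the components.
cv_rmse <- function(k) {
  sq_err <- numeric(n)
  for (f in seq_len(folds)) {
    test  <- fold_id == f
    p     <- prcomp(X[!test, , drop = FALSE], center = TRUE, scale. = TRUE)
    train <- data.frame(y = y[!test], p$x[, 1:k, drop = FALSE])
    fit   <- lm(y ~ ., data = train)
    # Project the held-out rows onto the training-fold components.
    new_scores <- predict(p, newdata = X[test, , drop = FALSE])
    new_scores <- as.data.frame(new_scores[, 1:k, drop = FALSE])
    sq_err[test] <- (y[test] - predict(fit, newdata = new_scores))^2
  }
  sqrt(mean(sq_err))
}

rmse_by_k <- sapply(1:4, cv_rmse)
best_k    <- which.min(rmse_by_k)   # k with the lowest CV error
```

Plotting rmse_by_k against 1:4 gives a simple visual summary of how prediction error changes with the number of components.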
Question 4.
Report presentation marks
These marks are allocated based on:
• structure, clarity, and tidiness of presented solutions/answers,
• correctness in spelling and grammar, and
Attachment: Applied Statistics.rar