Reference no: EM132604592
MAS223 Applied Statistics - Murdoch University
In this assignment, we will examine the "Perth House Sales" and "Compiled-LabReports" datasets. The Perth House Sales data is described in Appendix A and the Compiled Lab Reports data is described in Appendix B.
Question 1. To assess the consistency of a packaging machine in filling bottles with certified material, four bottles were produced in succession and their weights measured. The weights of the bottles (in g) were as follows:
1051.5 1049.8 1050.1 1052.0
By hand, calculate the jackknife estimator and standard error of the standard deviation of bottle weights (to at least three significant digits), showing all working. Is the estimator unbiased?
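The leave-one-out procedure behind Question 1 can be sketched in code as a way to check hand working. This is a minimal sketch, not a model answer: it assumes the common definitions (bias estimate (n - 1) times the difference between the mean of the leave-one-out statistics and the full-sample statistic, bias-corrected estimator equal to the full-sample statistic minus that bias, and the usual jackknife standard error), which may be written differently in the unit's notes. The function name `jackknife` and the use of Python rather than R are illustrative choices only.

```python
import math
import statistics


def jackknife(data, statistic):
    """Leave-one-out jackknife summary for `statistic` applied to `data`."""
    n = len(data)
    theta_hat = statistic(data)  # statistic on the full sample
    # Recompute the statistic n times, each time omitting one observation.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    # Jackknife estimate of bias and the bias-corrected estimator.
    bias = (n - 1) * (loo_mean - theta_hat)
    estimator = theta_hat - bias
    # Jackknife standard error.
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out))
    return {"original": theta_hat, "estimator": estimator, "bias": bias, "se": se}


# Bottle weights (in g) from Question 1.
weights = [1051.5, 1049.8, 1050.1, 1052.0]
result = jackknife(weights, statistics.stdev)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

A quick sanity check on any such implementation: applied to the sample mean, the jackknife bias should be zero and the jackknife standard error should equal the sample standard deviation divided by the square root of n.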
Question 2.
Consider the variable SALE VALUE in the "Perth House Sales" dataset, which records the sale price of some houses in Perth.
(a) Use appropriate graphical displays and measures of centrality and dispersion to summarise the SALE VALUE variable. Provide a reasonable explanation for why the SALE VALUE data might have the distribution you observe.
(b) For the most appropriate measure of centrality and measure of dispersion you have selected for SALE VALUE, produce a table of the form shown below that presents:
• the particular measures (i.e., statistics) you have chosen,
• those measures (i.e., statistics) as calculated for the variable SALE VALUE,
• the jackknife and bootstrap estimators for those statistics,
• the jackknife and bootstrap standard errors for those statistics, and
• the jackknife and bootstrap estimates of bias for those statistics.
Do these measures of centrality and dispersion appear to be biased or unbiased estimators?
Measure of Centrality: Name of measure of centrality
Value of measure of centrality when applied to original data
               | Jackknife | Bootstrap |
Estimator      |           |           |
Standard error |           |           |
Bias           |           |           |
Measure of Dispersion: Name of measure of dispersion
               | Jackknife | Bootstrap |
Estimator      |           |           |
Standard error |           |           |
Bias           |           |           |
(c) Produce graphical displays of the sampling distributions of the measure of centrality and measure of dispersion you have selected for SALE VALUE. Comment on the shapes of these distributions. Additionally, produce a 95% bootstrap percentile confidence interval for both your measure of centrality and measure of dispersion and interpret them. If there is anything unusual about the 95% bootstrap percentile confidence intervals, comment on that.
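The bootstrap quantities requested in Question 2 (standard error, bias, and a percentile confidence interval) can be sketched as follows. This is an illustration under stated assumptions, not the required R solution: the data are synthetic right-skewed values standing in for SALE VALUE (they are not the Perth House Sales data), the median is used only as an example statistic, the function names are hypothetical, and the percentile interval uses a simple order-statistic rule rather than an interpolated quantile.

```python
import math
import random
import statistics


def bootstrap_replicates(data, statistic, n_boot=2000, seed=0):
    """Statistic recomputed on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


def percentile_interval(replicates, level=0.95):
    """Percentile interval taken from the sorted bootstrap replicates."""
    ordered = sorted(replicates)
    b = len(ordered)
    alpha = 1 - level
    lower_index = max(0, math.floor(alpha / 2 * b))
    upper_index = min(b - 1, math.ceil((1 - alpha / 2) * b) - 1)
    return ordered[lower_index], ordered[upper_index]


# Synthetic, right-skewed stand-in for SALE VALUE (NOT the assignment data).
data_rng = random.Random(42)
prices = [data_rng.lognormvariate(13, 0.5) for _ in range(200)]

replicates = bootstrap_replicates(prices, statistics.median)
boot_se = statistics.stdev(replicates)                            # bootstrap standard error
boot_bias = statistics.mean(replicates) - statistics.median(prices)  # bootstrap bias estimate
lower, upper = percentile_interval(replicates)
print(f"SE: {boot_se:.1f}, bias: {boot_bias:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```

A histogram of `replicates` is the bootstrap approximation to the sampling distribution. For a statistic such as the median, the replicates can take only a limited set of distinct values, which is one reason a percentile interval may look unusual.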
Question 3. A mining company has taken a large sample from a potential new mine site and sent some of the material to 10 laboratories. Now consider the relationship between gold detected (GOLD) and laboratory (LABID).
(a) Clearly and accurately state the
• linearity,
• independence,
• normality, and
• equal variances (i.e., homoscedasticity)
assumptions of linear regression as they pertain to these data, and assess them for a linear model of GOLD on LABID. This assessment should include reference to appropriate graphical displays.
(b) Consider common transformations of the data and present the form of the linear model which you believe would be best when attempting to assess the relationship between GOLD and LABID. Present and discuss relevant diagnostic plots for assessing the assumptions of linear regression for this model, clearly noting any violations of assumptions that may still exist.
(c) Assuming that the model presented in Part (b) is wholly appropriate (i.e., there are no violations of the assumptions of linear regression), provide a table of relevant R output for that model and comment on whether there is a difference in the level of gold detected by the laboratories. If there is an effect, interpret this "effect" and produce a 95% confidence interval to accompany your estimate of the "effect." Note you can change the reference laboratory if you feel it would aid interpretation.
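To see what changing the reference laboratory does, it helps to recall that when a linear model has a single categorical predictor with treatment (dummy) coding, the least-squares intercept is the mean response in the reference group and each remaining coefficient is that group's mean minus the reference mean. The sketch below illustrates only this arithmetic. The laboratory labels and gold values are made up for illustration (they are not from the Compiled-LabReports data), the function name is hypothetical, and Python is used in place of R purely for the example. If the response is transformed in Part (b), the same reading applies on the transformed scale.

```python
from statistics import mean


def treatment_coefficients(values_by_group, reference):
    """Intercept and effects for a one-factor model with treatment coding."""
    group_means = {group: mean(values) for group, values in values_by_group.items()}
    intercept = group_means[reference]  # mean of the reference group
    # Each effect is a difference in means relative to the reference group.
    effects = {
        group: group_mean - intercept
        for group, group_mean in group_means.items()
        if group != reference
    }
    return intercept, effects


# Hypothetical gold measurements for three hypothetical laboratories.
gold_by_lab = {
    "Lab1": [1.0, 1.2, 1.1],
    "Lab2": [1.5, 1.4, 1.6],
    "Lab3": [0.9, 1.0, 1.1],
}

intercept_1, effects_1 = treatment_coefficients(gold_by_lab, reference="Lab1")
intercept_2, effects_2 = treatment_coefficients(gold_by_lab, reference="Lab2")
print(intercept_1, effects_1)
print(intercept_2, effects_2)
```

Changing the reference does not change the fitted group means. It only changes which pairwise differences appear directly as coefficients, so the Lab2 effect relative to Lab1 is the negative of the Lab1 effect relative to Lab2.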
Question 4. Presentation marks:
These marks are allocated based on:
• structure, clarity, and tidiness of presented solutions/answers,
• correctness in spelling and grammar, and
• readability of R code (which includes usage of informative variable names and commenting).
Attachment: Applied Statistics.rar