STAT 6950 Statistics for the Life Sciences - University of Guelph
Question 1. Sokal and Rohlf (1995; Table 16.4; unpublished data of R. K. Kuehn; data set slightly modified for this assignment) presented a data set in which the response variable Y, the angular-transformed frequency of the Lap94 allele in the mollusc Mytilus edulis, is to be modelled using polynomial regression as a function of X, the distance off the coast in miles east of Southport, Connecticut. [The angular transformation used was Y (for analysis) = Arcsin[(Frequency of allele Lap94)^(1/2)], i.e. the arcsine of the square root of the allele frequency.]
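As a small illustration of the angular transformation described above, the following Python sketch applies the arcsine of the square root to a proportion. The function name `angular_transform` and the default of returning degrees are assumptions made for illustration; the assignment does not state whether the transformed values are in degrees or radians.

```python
import math

def angular_transform(frequency, degrees=True):
    """Return arcsin(sqrt(frequency)) for a proportion in [0, 1].

    The degrees/radians switch is an assumption; the source does not say
    which unit the transformed data use.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be a proportion between 0 and 1")
    angle = math.asin(math.sqrt(frequency))  # angle in radians
    return math.degrees(angle) if degrees else angle

# Example: an allele frequency of 0.25 maps to arcsin(0.5), i.e. 30 degrees.
print(round(angular_transform(0.25), 4))
```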
(a) Obtain polynomial models from linear up to and including order 5.
(b) Obtain one graph with the estimated linear, quadratic, and cubic model equations fit to the data. Label appropriately and include a legend. This will be handed in.
(c) Test the null hypothesis that the order 4 and order 5 terms do not contribute to the model (given that terms up to order 3 are already in the model) at the 5% level of significance.
(d) Is it better to use a cubic model than a quadratic model? How do you know this?
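The comparison in part (c) is a partial (extra sum of squares) F-test between nested models. A minimal Python sketch of the test statistic follows, assuming the residual sums of squares of the cubic and order 5 fits have already been read off the fitted models. The function name `partial_f` and the numbers in the usage example are hypothetical placeholders, not results from the Mytilus data.

```python
def partial_f(rss_reduced, rss_full, n_obs, n_params_full, n_extra):
    """Partial F statistic for two nested least-squares models.

    rss_reduced   -- residual sum of squares of the smaller model (e.g. cubic)
    rss_full      -- residual sum of squares of the larger model (e.g. order 5)
    n_obs         -- number of observations
    n_params_full -- coefficients in the larger model, including the intercept
    n_extra       -- number of terms being tested (2 for the order 4 and 5 terms)
    """
    df_error = n_obs - n_params_full
    if df_error <= 0:
        raise ValueError("the larger model leaves no error degrees of freedom")
    numerator = (rss_reduced - rss_full) / n_extra  # mean square for the extra terms
    denominator = rss_full / df_error               # error mean square of the larger model
    return numerator / denominator, n_extra, df_error

# Hypothetical placeholder values, NOT taken from the assignment data.
f_stat, df1, df2 = partial_f(rss_reduced=120.0, rss_full=100.0,
                             n_obs=16, n_params_full=6, n_extra=2)
print(f_stat, df1, df2)
```

The statistic is compared with the upper 5% point of the F distribution on (n_extra, df_error) degrees of freedom. The same calculation applies to the cubic versus quadratic comparison in part (d), with one extra term.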
Question 2. In class we analyzed a diamond data set for which the diamonds were priced in Singapore dollars. We conducted some analyses in class after transforming the response variable "Price" by a log (base 10) transformation; Carat was not transformed. You will perform some similar analyses on a different data set that I extracted from the Canada Diamonds Inc. website. The data set you will use for this assignment is restricted to five shapes of non-round diamonds. The data set has been "cleaned" to remove observations that had missing values for Colour, Clarity, or Shape. We will ignore the variable "Cut".
(a) The most important predictor variable of price is diamond size; this can be confirmed readily. Create scatterplots of Price on Carat, LogPrice on Carat, and LogPrice on LogCarat, where logged variables are created by taking base 10 logarithms of the original variable. Now consider the effects of the transformations on the scatterplots. We'll continue our analyses by working with both LogPrice and LogCarat. Why would we choose to do so?
(b) With LogPrice as the response variable, fit the following models with the predictor variables specified:
(1) LogCarat (linear model)
(2) LogCarat (quadratic model)
(3) LogCarat (quadratic model) + Shape
(4) LogCarat (quadratic model) + Colour
(5) LogCarat (quadratic model) + Clarity
(6) LogCarat (quadratic model) + Colour + Clarity
(7) LogCarat (quadratic model) + Colour + Clarity + Shape
Use the "summary" and "anova" functions to obtain results for each of the models fit. Fill in the table (attached, last page) with selected key values.
(c) What evidence allows us to conclude that a quadratic term (model (2)) for LogCarat improves the model in comparison to the simple linear regression model (model (1))? What is the increase in R^2 and what is the F-value and associated p-value for the appropriate hypothesis test?
(d) Based on the analyses of models (3) to (5), we next ran models (6) and (7). Why was Colour the first variable added to the quadratic model? Do you think Clarity and Shape should have been entered in a different order? Briefly explain your reasoning.
(e) Is Shape a useful variable to add to the quadratic model if Colour and Clarity are also in the model? Briefly explain, supporting your argument with evidence provided by appropriate hypothesis test(s) and associated p-value(s). (You do not need to formally state the null and alternative hypotheses.)
(f) Use models (6) and (7) to predict the price (in Canadian dollars) of a 1.2 carat "Oval" diamond with colour F and clarity VS1. Your solutions should show the appropriate substitutions into the estimated regression equations.
(g) Which diamond has the largest residual (in absolute value) for Mod7, i.e. model (7)?
(h) Using Mod7, which diamond has the largest difference between the actual and predicted price? Show how you arrived at your answer.
(i) Obtain a side-by-side boxplot of the Mod6 (model (6)) residuals by Shape. How does this graph reflect the results of the summary output for Mod7 (model (7))? (Hand your graph in as well as your answer.)
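For parts (f) and (h), predictions from a model fitted to LogPrice are on the base 10 log scale and have to be back-transformed to dollars. The Python sketch below shows the substitution and back-transformation under stated assumptions: the coefficient values, the function names, and the example diamonds are hypothetical placeholders rather than estimates from the Canada Diamonds data, and the model is reduced to an intercept, the two LogCarat terms, and a single combined adjustment for the categorical levels.

```python
import math

def predict_log_price(carat, intercept, b_linear, b_quadratic, level_effects=0.0):
    """Substitute into a quadratic-in-LogCarat equation; result is log10(price).

    level_effects stands in for the summed Colour/Clarity/Shape coefficients
    that apply to the diamond (a simplifying assumption for this sketch).
    """
    log_carat = math.log10(carat)
    return intercept + b_linear * log_carat + b_quadratic * log_carat ** 2 + level_effects

def price_from_log(log_price):
    """Back-transform a base 10 log price to dollars."""
    return 10.0 ** log_price

def largest_price_gap(actual_prices, fitted_log_prices):
    """Index and size of the largest |actual - predicted| gap in dollars."""
    gaps = [abs(actual - price_from_log(fitted))
            for actual, fitted in zip(actual_prices, fitted_log_prices)]
    idx = max(range(len(gaps)), key=gaps.__getitem__)
    return idx, gaps[idx]

# Hypothetical coefficients, NOT estimates from the assignment data.
coef = dict(intercept=3.5, b_linear=1.8, b_quadratic=0.3, level_effects=0.05)
log_pred = predict_log_price(1.2, **coef)
print(round(price_from_log(log_pred), 2))

# Hypothetical actual prices and fitted log10 prices for three diamonds.
actual = [1000.0, 5200.0, 800.0]
fitted_log = [3.0, 3.7, 2.9]
idx, gap = largest_price_gap(actual, fitted_log)
print(idx, round(gap, 2))
```

The largest residual on the log scale (part (g)) and the largest gap in dollars (part (h)) need not belong to the same diamond, because a given log-scale residual corresponds to a larger dollar amount for a more expensive stone.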
Attachment: Statistics for the Life Sciences.rar