Describe the data and discuss any interesting features

Assignment Help Basic Statistics
Reference no: EM132583667

Questions -

Q1. A group of senior citizens who have never used the internet before are given training over a period of 6 months. A sample of 3 of them is chosen at random and their numbers of hours of internet use are recorded for the 6 months, as shown in Figure 1.

(i) Briefly describe the data, discussing any interesting features. Based on Figure 1 only, suggest the form of a possible linear model of the hours of use per month (as response variable) and month (as explanatory variable).

(ii) Let y be the hours of use per month and x be the month. An analysis in R gave the following output: (see attached file)

(a) Write down the fitted model.

(b) Comment on the model and its goodness of fit, making appropriate reference to any goodness-of-fit diagnostics. State clearly any hypotheses you use.

(c) Using one of the following R extracts

> qnorm(0.95)
[1] 1.644854

> qt(0.95, df=14)
[1] 1.76131

> qt(0.95, df=15)
[1] 1.75305

> qt(0.995, df=15)
[1] 2.946713

calculate 90% confidence intervals for the coefficient of $x$ and for the coefficient of $x^2$.
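As a minimal sketch of the arithmetic in (c): a two-sided 90% interval for a coefficient is the estimate plus or minus the 0.95 quantile of the t distribution times the standard error. The helper `ci90` and the numbers passed to it are hypothetical placeholders, because the fitted output sits in the attachment and is not reproduced here. The choice `df = 15` assumes 18 observations (3 people over 6 months) and three fitted coefficients (intercept, x, x^2).

```r
# Hypothetical helper: two-sided 90% confidence interval for one coefficient.
# est = coefficient estimate, se = its standard error, df = residual degrees of freedom.
ci90 <- function(est, se, df) {
  est + c(-1, 1) * qt(0.95, df) * se
}

# Placeholder values for illustration only; substitute the estimate and
# standard error reported in the R output.
ci90(est = 2.0, se = 0.5, df = 15)
```

Under these assumptions the relevant extract is `qt(0.95, df=15)`; the normal quantile would ignore the estimation of the error variance.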

(d) For month x = 1 calculate a 90% predictive interval for the future observation y. You may use the following:

[Figure: 918_figure.jpg (expression not reproduced in this text)]

where X is the design matrix of the linear model.
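For orientation, the standard form of a predictive interval in a normal linear model is sketched below. It assumes the quadratic model with coefficients for the intercept, $x$ and $x^2$, so that the covariate vector at $x = 1$ is $x_0 = (1, 1, 1)^T$. Here $s$ denotes the residual standard error and $n - p$ the residual degrees of freedom; both symbols are introduced for illustration and their values come from the R output.

```latex
\hat{y}_0 \;\pm\; t_{n-p}(0.95)\, s\, \sqrt{1 + x_0^{T} (X^{T}X)^{-1} x_0},
\qquad x_0 = (1,\, x,\, x^{2})^{T} = (1, 1, 1)^{T},
\qquad \hat{y}_0 = x_0^{T}\hat{\beta}.
```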

(e) A further R analysis gave additional output (not reproduced here). Calculate the correlation coefficient between the estimator of the gradient (the coefficient of $x$) and the estimator of the coefficient of $x^2$.
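A sketch of the calculation in (e), assuming the further output supplies either the estimated covariance matrix of the coefficients or the matrix $M = (X^T X)^{-1}$, with rows and columns indexed $0, 1, 2$ for the intercept, $x$ and $x^2$. Because $\mathrm{Var}(\hat{\beta}) = \sigma^2 M$, the factor $\sigma^2$ cancels and either matrix gives the same correlation.

```latex
\mathrm{Corr}(\hat{\beta}_1, \hat{\beta}_2)
  = \frac{\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)}
         {\sqrt{\mathrm{Var}(\hat{\beta}_1)\,\mathrm{Var}(\hat{\beta}_2)}}
  = \frac{M_{12}}{\sqrt{M_{11}\, M_{22}}}.
```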

Q2. A data-set on black cherry trees in the Allegheny National Forest, Pennsylvania, USA includes the height, radius (measured 4.5 feet above the ground) and volume, for each of 31 trees.

(i) A model

$v_i = \beta_0 + \beta_1 r_i + \beta_2 h_i + \epsilon_i$ (1)

has been proposed, where $h_i$, $r_i$, $v_i$ are the natural logarithms of the height (in feet), radius (in feet) and volume (in cubic feet) of the $i$th tree, and $\epsilon_i \sim N(0, \sigma^2)$ independently for different trees. The following output summarizes the results of fitting this model in R.

Explain the hypothesis being tested by each of the three F statistics included in the output. What interpretation, if any, can be placed on their conclusions here?

(ii) Figure 2 shows the standardized deletion residuals for the model above. The following calculations can be used as the basis of a test on the standardized deletion residuals, using the Sidak correction.

> alpha=0.05

> prob=1-(1-alpha)^(1/31)

> qt(prob/2,27)

[1] -3.495321

Explain the interpretation of the values alpha and prob used in the calculation, and carry out the test.
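The calculation can be sketched in R as follows, reading the exponent in `prob` as a power, which is what the Sidak correction requires. The vector `r` of deletion residuals is a hypothetical placeholder, because Figure 2 is not reproduced here. The degrees of freedom, 27, correspond to 31 trees minus 3 coefficients minus the 1 deleted observation.

```r
alpha <- 0.05                      # family-wise significance level
n <- 31                            # number of trees, so 31 residuals are tested
prob <- 1 - (1 - alpha)^(1 / n)    # Sidak-corrected level for each single test
crit <- qt(prob / 2, df = 27)      # two-sided critical value; the source reports -3.495321

# Hypothetical residuals for illustration only; use the values from Figure 2.
r <- c(-1.2, 0.4, 2.1)
flagged <- abs(r) > abs(crit)      # TRUE marks a residual judged to be an outlier
flagged
```

A residual is flagged only if its absolute value exceeds about 3.495, so the chance of flagging at least one of 31 independent tests under the null hypothesis is held at `alpha`.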

(iii) Thinking about the trunk of each tree as a cylinder, a simple geometric calculation suggests that

$V_i \approx k R_i^2 H_i$ (2)

where $V_i = \exp(v_i)$ etc., and that $k \approx \pi$ (the usual circular constant). Explain why the model suggested by (2) can be represented as a special case of (1) under the null hypothesis that $\beta_1 = 2$ and $\beta_2 = 1$, and explain how that null hypothesis can be written in the general form

$C\beta = c.$

Express the weaker hypothesis that $\beta_1 + \beta_2 = 3$ in a similar form, and calculate the corresponding F statistic, using the fact that

[Figure: 1240_figure1.jpg (expression not reproduced in this text)]

What is the null distribution of this F statistic?
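One way to set this up is sketched below, assuming the coefficient vector is ordered as $\beta = (\beta_0, \beta_1, \beta_2)^T$. The symbols $q$ (the number of rows of $C$) and $s^2$ (the residual mean square) are notation introduced here for illustration.

```latex
% Taking logarithms of (2):
v_i \approx \log k + 2 r_i + h_i
\quad\Longrightarrow\quad \beta_0 = \log k,\ \beta_1 = 2,\ \beta_2 = 1.

% Joint hypothesis \beta_1 = 2,\ \beta_2 = 1:
C = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad c = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.

% Weaker hypothesis \beta_1 + \beta_2 = 3:
C = \begin{pmatrix} 0 & 1 & 1 \end{pmatrix},
\qquad c = 3.

% General F statistic for H_0 : C\beta = c
F = \frac{(C\hat{\beta} - c)^{T}\,\bigl[C (X^{T}X)^{-1} C^{T}\bigr]^{-1}\,(C\hat{\beta} - c)}{q\, s^{2}}
\;\sim\; F_{q,\; n-p} \quad \text{under } H_0 .
```

For the weaker hypothesis $q = 1$ and $n - p = 31 - 3 = 28$, which gives an $F_{1,28}$ reference distribution under the stated normality assumption.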

Q3. (i) A laboratory experiment is intended to investigate the effect of a drug on certain species of micro-organisms. Tissue cultures containing set amounts of one of three species of micro-organisms (A, B, C) are each exposed to doses of the drug being tested; there are four different doses used, and two replicates of each combination of species and dose. Figure 3 shows a plot produced in R of the dose and response for each run, the points being coded by species.

Various models are being considered for the response as a function of species and dose. The output below shows summaries of results for two models; Response and Species have the obvious meaning, NumDose refers to the dose as a quantitative variable, and FacDose refers to the dose as a factor variable.

(a) Give the equations for these two models, explaining your notation and assumptions.

(b) Calculate the BIC for each of these two models. Based on the BIC, explain which of the two models you would prefer and why.
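A hedged sketch of the BIC calculation for a normal linear model, written in terms of the residual sum of squares. The helper `bic_from_rss` is hypothetical, and the built-in `cars` data set is only a stand-in used to check the formula against R's `BIC()`; it is not the experiment's data. For the experiment itself the sample size would be 3 species x 4 doses x 2 replicates = 24 runs.

```r
# Hypothetical helper: BIC of a Gaussian linear model from its residual sum of squares.
# rss = residual sum of squares, n = number of observations,
# p = number of regression coefficients (the error variance adds one more parameter).
bic_from_rss <- function(rss, n, p) {
  minus2loglik <- n * log(rss / n) + n * (1 + log(2 * pi))
  minus2loglik + (p + 1) * log(n)
}

# Stand-in check on built-in data, not the dose-response experiment.
fit <- lm(dist ~ speed, data = cars)
rss <- sum(resid(fit)^2)
n <- nrow(cars)
p <- length(coef(fit))
c(manual = bic_from_rss(rss, n, p), builtin = BIC(fit))
```

With this convention the model with the smaller BIC is preferred. Because both models are fitted to the same 24 runs, only the difference in residual sums of squares and in parameter counts affects the comparison.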

(c) What advantages and disadvantages do these two modelling approaches (dose as a factor, and dose as a numerical variable) have for this experiment, beyond those taken into account in the BIC?

(ii) Consider the linear model

$y_i = x_i^T \beta + \epsilon_i, \quad i = 1, 2, \ldots, n,$ (3)

where $\epsilon_i$ is an i.i.d. sequence of random variables with zero mean and variance $\mathrm{Var}(\epsilon_i) = \sigma^2 c_i$, for some variance $\sigma^2$ and $c_i > 0$.

Discounted least squares considers the maximum likelihood estimator $\hat{\beta}$ of $\beta$, which minimises the discounted sum of squares

$S_\delta(\beta) = \sum_{i=1}^{n} \delta^{n-i} (y_i - x_i^T \beta)^2,$

for some discount factor δ that satisfies 0 < δ ≤ 1.

(a) Show that discounted least squares is a special case of weighted least squares (WLS) and calculate the weights of WLS as functions of δ.

(b) Using the relationship of discounted least squares and WLS as in (a), derive the variance of $\epsilon_i$ as a function of $\sigma^2$ and $\delta$.
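The correspondence in (a) and (b) can be sketched as follows, using the usual convention that WLS weights are inversely proportional to the error variances. The symbol $w_i$ for the weights is notation introduced here.

```latex
% WLS minimises a weighted sum of squares:
S_W(\beta) = \sum_{i=1}^{n} w_i \,(y_i - x_i^{T}\beta)^2,
\qquad w_i = \frac{1}{c_i}.

% Matching terms with the discounted sum of squares S_\delta(\beta):
w_i = \delta^{\,n-i}
\quad\Longrightarrow\quad
c_i = \delta^{-(n-i)},
\qquad
\mathrm{Var}(\epsilon_i) = \sigma^2\, \delta^{\,i-n}.
```

Older observations (small $i$) therefore carry a larger assumed variance and a smaller weight, and $\delta = 1$ recovers ordinary least squares.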

(c) For the simple linear regression model with no intercept and a near constant covariate $x_i \approx x$, i.e.

$y_i \approx x\beta + \epsilon_i,$

show that

$\hat{\beta} = \frac{1-\delta}{x(1-\delta^n)} \sum_{i=1}^{n} \delta^{n-i} y_i.$
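A small numerical check of the closed form in (c), reading the exponent inside the sum as n - i so that it matches the discounted sum of squares. The data are simulated and every value (`n`, `delta`, `x`, the true slope 1.5) is an arbitrary assumption made for illustration. The check compares `lm()` with weights delta^(n-i) against the closed-form expression.

```r
set.seed(1)
n <- 10          # number of observations (arbitrary)
delta <- 0.9     # discount factor, 0 < delta < 1 so the closed form is defined
x <- 2           # constant covariate value (arbitrary)

y <- x * 1.5 + rnorm(n)             # simulated responses with true slope 1.5
w <- delta^(n - seq_len(n))         # WLS weights delta^(n - i), i = 1, ..., n
xs <- rep(x, n)                     # covariate vector, constant at x

fit <- lm(y ~ 0 + xs, weights = w)  # weighted least squares without intercept
closed <- (1 - delta) / (x * (1 - delta^n)) * sum(w * y)

all.equal(unname(coef(fit)), closed)
```

The agreement follows from the geometric sum of the weights, which equals (1 - delta^n) / (1 - delta) when delta < 1.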

Attachment: Statistics Assignment File.rar





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd