Calculate the sample variance for Y

Reference no: EM132263569

Homework -

Q1. In a small-scale experimental study of the relation between degree of brand preference (Y), moisture content (X1), and sweetness (X2) of the product, results were obtained from an experiment based on a completely randomized design. In what follows, you are to treat Y as the dependent variable and X1, X2, and any functions of X1 and X2 as independent variables. The following full model was fit to the data

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + β4X1i^2 + β5X2i^2 + εi,

and the corresponding R output is given below (in attached file). Assume α = 0.05 for all tests.

a) Calculate the sample variance for Y (i.e. S_Y^2).

b) What is the value of R^2?

c) Consider the following three sets of hypotheses: H0 : β3 = 0 versus H1 : β3 ≠ 0, H0 : β4 = 0 versus H1 : β4 ≠ 0, and H0 : β5 = 0 versus H1 : β5 ≠ 0. Give the test statistic values and corresponding p-values (in R, use the pt function) for each test. What are your conclusions for each test?

d) Consider testing H0 : β3 = β4 = β5 = 0 versus H1 : β3 ≠ 0, β4 ≠ 0, and/or β5 ≠ 0. The appropriate reduced model was fit and the corresponding R output is given below. Use this output to calculate the F test statistic and the corresponding p-value (in R, use the pf function), and state your conclusion.

e) What is the fundamental difference between parts c and d? Based on parts c and d, do you think that the "curvature" terms (i.e. X1 × X2, X1^2, and X2^2) are important in explaining brand preference?
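Parts (a), (b), and (d) all reduce to arithmetic on sums of squares read from the R output. The following is a minimal Python sketch of that arithmetic, not the assignment's solution: every number in it is a made-up placeholder (the actual R output is in the attached file), the sample size n = 16 is an assumption, and the function names are hypothetical. It relies on the standard identities S_Y^2 = SST/(n - 1), R^2 = 1 - SSE/SST, and F = [(SSE_reduced - SSE_full)/(df_reduced - df_full)] / (SSE_full/df_full), where df denotes the residual degrees of freedom.

```python
def sample_variance_from_sst(sst, n):
    """Sample variance of Y: total sum of squares divided by n - 1."""
    return sst / (n - 1)


def r_squared(sse, sst):
    """Coefficient of determination from error and total sums of squares."""
    return 1.0 - sse / sst


def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic for H0: all terms dropped from the full model are zero.

    df_reduced and df_full are the residual degrees of freedom of each fit.
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Placeholder values only; replace them with the numbers in the R output.
n = 16                               # assumed sample size
sst = 1500.0                         # assumed total sum of squares
sse_full, df_full = 60.0, n - 6      # full model has 6 coefficients
sse_reduced, df_reduced = 90.0, n - 3  # reduced model has 3 coefficients

s2_y = sample_variance_from_sst(sst, n)
r2_full = r_squared(sse_full, sst)
f_stat = partial_f(sse_reduced, df_reduced, sse_full, df_full)

print(s2_y, r2_full, f_stat)
```

In R, the p-value for the resulting statistic would come from pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE).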

Q2. 93 cars were selected at random from among passenger car models. The data (which can be found on E-Learning under Content - Data Sets - cars) includes the following variables: HP: Horsepower (maximum), C: Number of cylinders, E: Engine size (liters) and W: Weight (pounds). For this problem, you are to treat HP as the dependent (or response) variable and C, E, and W as the independent (or predictor) variables. Furthermore, assume α = 0.05 for all tests.

a) Fit a multiple linear regression model relating horsepower (HP) to number of cylinders (C), engine size (E), and weight (W). What is the F-statistic and corresponding p-value for the test of "regression significance"?

b) What are the values of R^2 and adjusted R^2? Interpret the value of R^2 for this problem.

c) Obtain a 95% confidence interval for the slope parameter pertaining to weight (W). You can use the user-defined betaci function.

d) Obtain a 95% confidence interval for the mean horsepower response when C = 3, E = 2.5, and W = 3000.

e) Obtain a 95% prediction interval for a new observation of horsepower when C = 2, E = 3.0, and W = 2800.
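For part (b), the two fit measures are linked by a simple formula. The sketch below assumes the usual definition, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with n = 93 cars and p = 3 predictors taken from the problem statement. The R^2 value of 0.75 is a placeholder, not a result from the cars data, and the function name is hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a model with p predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


n_cars = 93          # from the problem statement
n_predictors = 3     # C, E, and W
r2_placeholder = 0.75  # made-up value; read the real one from summary(lm(...))

r2_adj = adjusted_r_squared(r2_placeholder, n_cars, n_predictors)
print(round(r2_adj, 4))
```

Because (n - 1)/(n - p - 1) exceeds 1 whenever p is at least 1, the adjusted value is always below R^2 (for R^2 < 1), which is why it is preferred when comparing models with different numbers of predictors.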

Q3. The file airpass.txt (found on E-Learning under Content - Data Sets - airpass) contains a single column of data. These data form a time series of the number of international airline passengers (in thousands) for each month from January 1949 through December 1960, so n = 144 observations. In what follows, you are to treat the logarithm of this data as the dependent (or response) variable (i.e. in R, use Y = log(airpass) to create the variable).

a) Let X1 denote the independent variable "time". That is, X1 takes on the values 1, 2, 3, ..., 144 corresponding to the first, second, third, ..., and last values of Y, respectively (in R, use x1 = 1:144). Obtain a time series plot of the data using the following command: plot(x1,y,type="l"). Comment on any trends and/or periodic features you see in this plot.

b) Let X2 = cos(2πX1/12) and X3 = sin(2πX1/12) be two additional independent variables (e.g. in R, use x2 = cos((2*pi*x1)/12) and x3 = sin((2*pi*x1)/12)) and fit the following model to the data: Yi = β0 + β1X1i + β2X2i + β3X3i + εi. What is the value of R^2 for this model?

c) With the help of the predict command, obtain the model (given in (b)) predictions when X1 = 1.00, 1.01, 1.02, 1.03, ..., 143.98, 143.99, 144.00 and store these predictions in a vector called "Z". What are the model predictions when X1 = 1.00 and X1 = 144.00, respectively?

d) Referring back to parts (a) and (c), add a red prediction line to the plot in part (a) with a command similar to the following: lines(seq(1,144,.01),Z,col=2). As part of the output to this question, include this plot with your solution. By visual inspection of this plot, how well do you think the model did in capturing the unique features of this data?

e) Consider testing H0 : β2 = β3 = 0 versus H1 : β2 ≠ 0 and/or β3 ≠ 0. Obtain the corresponding F-statistic and p-value for this test using the user-defined betatest function. What is your conclusion? Also, how does this conclusion relate to your interpretation of the plot given in part (d)?

f) Back in December, 1960, what would have been a 95% prediction interval for the log of the number of international airline passengers for January, 1961?
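The predictor construction in parts (b), (c), and (f) can be sketched as follows. This is an illustrative Python translation of the R expressions quoted above, not the course's code: it builds the time index, the cosine and sine terms with a 12-month period, the fine prediction grid equivalent to seq(1,144,.01), and the predictor values for January 1961, which is month 145 when January 1949 is month 1. The helper name design_row is hypothetical, and no model is fitted because the data file is not reproduced here.

```python
import math

n = 144  # months from January 1949 through December 1960

# Time index and the two seasonal predictors (12-month period).
x1 = list(range(1, n + 1))
x2 = [math.cos(2 * math.pi * t / 12) for t in x1]
x3 = [math.sin(2 * math.pi * t / 12) for t in x1]

# Fine grid 1.00, 1.01, ..., 144.00, built from integers to avoid
# accumulating floating-point error.
grid = [1 + k / 100 for k in range((n - 1) * 100 + 1)]


def design_row(t):
    """Predictor values (intercept, X1, X2, X3) at time t."""
    angle = 2 * math.pi * t / 12
    return (1.0, float(t), math.cos(angle), math.sin(angle))


# January 1961 is one step past the last observation.
jan_1961 = design_row(n + 1)

print(len(grid), grid[0], grid[-1])
print(jan_1961)
```

In R, the same new values would be passed to predict as a data frame, with interval = "prediction" for part (f).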

Q4. A random sample of 32 births was collected by researchers to investigate whether smoking mothers have babies with lower birth weight. The researchers collected data on birth weight (Weight), length of gestation (Gest), and smoking status of the mother (smoker or non-smoker). The data can be found on E-Learning under Content - Data Sets - birthweight. In what follows, let Y = Weight, X1 be a 0/1 indicator variable (i.e. 0 = non-smoker, 1 = smoker) that represents the independent variable "smoker", and X2 = Gest. Assume α = 0.05 for all hypothesis testing questions.

a) Fit the following SLR model: Yi = β10 + β11X1i + εi and answer the following questions.

i. What is the estimate of β11 and what is its interpretation in terms of this problem?

ii. What are the test statistic and p-value for testing H0 : β11 = 0 versus H1 : β11 ≠ 0? Based on the corresponding p-value for this model, does being a "smoker" significantly affect the birth weight of babies?

iii. What are the values of R^2, adjusted R^2, and the estimate of σ^2 for this model?

b) Next, fit the following MLR model: Yi = β20 + β21X1i + β22X2i + εi and answer the following questions.

i. What is the estimate of β21 and what is its interpretation in terms of this problem?

ii. What are the test statistic and p-value for testing H0 : β21 = 0 versus H1 : β21 ≠ 0? Based on the corresponding p-value for this model, does being a "smoker" significantly affect the birth weight of babies?

iii. What are the values of R^2, adjusted R^2, and the estimate of σ^2 for this model?

c) Are your overall conclusions for parts (a) and (b) the same? If so, why? If not, why not? Hint: Consider the estimated values of σ^2 for each model.
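The interpretation asked for in part (a)(i) rests on a general fact: when the only predictor is a 0/1 indicator, the least-squares intercept equals the mean response in the group coded 0 and the slope equals the difference between the two group means. The Python sketch below illustrates this with made-up birth weights in grams. The data and the function name slr_fit are hypothetical and are not taken from the birthweight file.

```python
from statistics import mean


def slr_fit(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Made-up data: 0 = non-smoker, 1 = smoker; weights in grams.
smoker = [0, 0, 0, 0, 1, 1, 1, 1]
weight = [3100.0, 3300.0, 3200.0, 3400.0, 2900.0, 3000.0, 3100.0, 2800.0]

b0, b1 = slr_fit(smoker, weight)

mean_non = mean([w for s, w in zip(smoker, weight) if s == 0])
mean_smk = mean([w for s, w in zip(smoker, weight) if s == 1])

# The intercept matches the non-smoker mean; the slope matches the
# smoker-minus-non-smoker difference in means.
print(b0, mean_non)
print(b1, mean_smk - mean_non)
```

Once Gest is added in part (b), the coefficient on the indicator is instead the smoker versus non-smoker difference at a fixed gestation length, so the two estimates need not agree.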

Note - You can ignore part e for question 3. The data files have been attached.

Attachment:- Assignment Files.rar


Reviews

len2263569

3/22/2019 3:49:24 AM

Regression analysis homework to be completed in R. Specifically, questions 3 and 4 are where I am struggling. You can ignore part e for question 3. The data files have been uploaded along with the PDF of the homework: airpass = problem 3, birthweight = problem 4.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd