Assignment - Practice Problems

Problem 1: Standardizing and Transforming

Part (a) The variable contains a sample of values drawn from a normal distribution with known expected value E[X] = 57.3 and variance Var[X] = 495.2. Perform a standardization transformation to generate a new vector consisting of values that are distributed as a standard normal distribution. Calculate the sample mean and the sample variance, and report both values using separate cat() statements, rounding to 5 decimal places.
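A minimal sketch of the standardization step, assuming the sample is stored in a vector called x.sample. That name is hypothetical, and the sketch simulates stand-in data so that it runs on its own. Because the expected value and variance are known, each value is centered at 57.3 and scaled by the square root of 495.2:

```r
# Known parameters from the problem statement
mu <- 57.3
sigma <- sqrt(495.2)   # standard deviation = square root of the variance

# Hypothetical stand-in for the assignment data (assumption: NOT the real sample)
set.seed(156)
x.sample <- rnorm(200, mean = mu, sd = sigma)

# Standardize: subtract the expected value, then divide by the standard deviation
z <- (x.sample - mu) / sigma

cat("Sample mean of standardized values:", round(mean(z), 5), "\n")
cat("Sample variance of standardized values:", round(var(z), 5), "\n")
```

With a finite sample, the reported mean and variance should be close to 0 and 1 but will not match them exactly.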

Part (b) Create a histogram of the standardized values you created in part (a). Then superimpose a standard normal density curve over this graph. (Note: the sample is much smaller than what we usually generate in our simulations, so this graph will be much noisier.)
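One possible way to overlay a density curve on a histogram is sketched below. The vector z here is simulated stand-in data (an assumption), not the standardized values from part (a). The key detail is freq = FALSE, which puts the histogram on the density scale so that the curve and the bars are comparable:

```r
# Hypothetical stand-in for the standardized values (assumption)
set.seed(156)
z <- rnorm(200)

# freq = FALSE plots densities rather than counts
h <- hist(z, freq = FALSE, main = "Standardized values", xlab = "z")

# Superimpose the standard normal density over the current plot
curve(dnorm(x), add = TRUE, lwd = 2)
```

The same pattern works for any density function: pass extra parameters inside the expression, for example dnorm(x, mean = m, sd = s).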

Part (c) The vector contains a sample of values from a standard normal random distribution. Perform a general normal transformation to generate a new vector consisting of values from a general normal distribution with expected value E[X] = -143.2 and variance Var[X] = 791.6. Calculate the sample mean and the sample variance, and report both values using separate cat() statements, rounding to 5 decimal places.
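The general normal transformation runs the standardization in reverse: multiply by the standard deviation, then add the expected value. A sketch, assuming the standard normal sample is in a vector called z.sample (a hypothetical name, filled here with simulated stand-in data):

```r
# Target parameters from the problem statement
mu <- -143.2
sigma <- sqrt(791.6)   # standard deviation = square root of the variance

# Hypothetical stand-in for the standard normal sample (assumption)
set.seed(156)
z.sample <- rnorm(200)

# Scale by the standard deviation, then shift by the expected value
x <- mu + sigma * z.sample

cat("Sample mean of transformed values:", round(mean(x), 5), "\n")
cat("Sample variance of transformed values:", round(var(x), 5), "\n")
```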

Part (d) Create a histogram of the transformed values you created in part (c). Then superimpose a density curve for the general normal distribution in part (c) over this graph. (Note: the sample is much smaller than what we usually generate in our simulations, so this graph will be much noisier.)

Problem 2: Method of Moments for a Weibull Distribution

A Weibull distribution with scale parameter θ and shape parameter τ has the density function:

$$f(x) = \frac{\tau}{x} \left(\frac{x}{\theta}\right)^{\tau} \exp\left\{-\left(\frac{x}{\theta}\right)^{\tau}\right\}, \quad x > 0$$

Then the expected value of X is:

E[X] = θ · Γ(1 + 1/τ)

Using the first moment, the method-of-moments estimator is:

$$\hat{\theta} = \frac{\bar{X}}{\Gamma(1 + 1/\tau)}$$

In the special case where τ has the known value τ = 2, the density reduces to:

$$f(x) = \frac{2}{x} \left(\frac{x}{\theta}\right)^{2} \exp\left\{-\left(\frac{x}{\theta}\right)^{2}\right\}, \quad x > 0$$

Also, the expected value is:

E[X] = θ · Γ(1 + 1/2)

Then the method of moments estimator is:

$$\hat{\theta} = \frac{\bar{X}}{\Gamma(1 + 1/2)}$$

Part (a) The variable contains data from a Weibull distribution with known shape parameter τ = 2 and unknown scale parameter θ. Calculate the method-of-moments estimator for the scale parameter θ. Use the built-in R function gamma() to calculate the value of the gamma function in the denominator. Report your result using a cat() statement, rounding to 5 decimal places.
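A sketch of the estimator in R, assuming the data is in a vector called w.sample. That name is hypothetical, and the data here is simulated with a made-up true scale of 3 so that the sketch runs on its own:

```r
tau <- 2   # known shape parameter

# Hypothetical stand-in data; the true scale of 3 is made up for illustration
set.seed(156)
w.sample <- rweibull(1000, shape = tau, scale = 3)

# Method of moments: sample mean divided by Gamma(1 + 1/tau)
theta.hat <- mean(w.sample) / gamma(1 + 1 / tau)

cat("Method-of-moments estimate of theta:", round(theta.hat, 5), "\n")
```

Because the stand-in data was generated with scale 3, the estimate should land near 3, which is a quick sanity check on the formula.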

Part (b) Construct a histogram of the data in Then superimpose the density curve for a Weibull distribution using the built-in R function dweibull(), using your parameter estimate from part (a) as the scale parameter θ and a known shape parameter τ = 2.

Problem 3: Mean Squared Error

Obi is measuring an enzyme level in his laboratory. The true value (unknown to him) is µ = 100. He has three measuring devices:

The first measuring device, denoted W, produces measurements that have an expected value of E[W] = 90 and a variance of Var[W] = 100.

The second measuring device, denoted X, produces measurements that have an expected value of E[X] = 100 and a variance of Var[X] = 500.

The third measuring device, denoted Y, produces measurements that have an expected value E[Y] = 95 and a variance of Var[Y] = 350.

For each device, all measurements are independent of one another.
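The questions below rest on the decomposition MSE = variance + (bias)^2. For the average of n independent measurements, the expected value (and so the bias) is unchanged, while the variance is divided by n. Here is a small sketch of that arithmetic; the function name mse.of.average is hypothetical, and the device values are deliberately made up rather than taken from W, X, or Y:

```r
# MSE of the average of n independent measurements:
# variance of the average plus squared bias
mse.of.average <- function(expected.value, variance, true.value, n = 1) {
  bias <- expected.value - true.value
  variance / n + bias^2
}

# Made-up device: expected value 48, variance 36, true value 50
cat("MSE with 1 measurement:", mse.of.average(48, 36, 50), "\n")
cat("MSE with 9 measurements:", mse.of.average(48, 36, 50, n = 9), "\n")
```

Averaging shrinks the variance term but leaves the squared bias untouched, so the ranking of devices can change as n grows.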

Part (a) Obi decides to take one measurement and then uses this to estimate the enzyme level.

What is the mean squared error (MSE) when he uses 1 measurement from W?

What is the MSE when he uses 1 measurement from X?

What is the MSE when he uses 1 measurement from Y?

Which estimator gives the best estimate of the true enzyme level?

Part (b) Next, Obi decides to take 5 independent measurements and then use their average as the estimate of the true enzyme level.

What is the MSE when he averages 5 measurements from W?

What is the MSE when he averages 5 measurements from X?

What is the MSE when he averages 5 measurements from Y?

Which estimator gives the best estimate of the true enzyme level?

Part (c) Finally, Obi decides to take 10 independent measurements and then use their average as the estimate of the true enzyme level.

What is the MSE when he averages 10 measurements from W?

What is the MSE when he averages 10 measurements from X?

What is the MSE when he averages 10 measurements from Y?

Which estimator gives the best estimate of the true enzyme level?

Problem 4: Sampling Distribution of Sample Minimum

So far in MATH E-156, we've mainly been focused on the sampling distribution of the sample mean, although we've explored the sample median for normal distributions and the sample maximum for uniform distributions. Now let's investigate a remarkable result for exponential distributions:

Suppose $X_1, X_2, \ldots, X_n$ are independent random variables that are all exponentially distributed with rate parameter λ. Then the sample minimum is also exponentially distributed, with rate parameter nλ.
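This result follows from a short calculation with survival functions. A sketch, writing $X_{(1)}$ for the sample minimum (notation introduced here, not used elsewhere in the assignment):

```latex
\begin{aligned}
P\left(X_{(1)} > x\right)
  &= P(X_1 > x,\ X_2 > x,\ \ldots,\ X_n > x) \\
  &= \prod_{i=1}^{n} P(X_i > x) && \text{(independence)} \\
  &= \left(e^{-\lambda x}\right)^{n} = e^{-n\lambda x}, \quad x > 0.
\end{aligned}
```

The last expression is the survival function of an exponential distribution with rate parameter nλ.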

Part (a) Let's work through a simple example. Suppose we draw samples of size n = 8 from an exponential distribution with rate parameter λ = 1.5, and we calculate the sample minimum of this sample.

What is the distribution of this sample minimum? Give the name of the distribution, along with the numerical value of any parameters.

What is the expected value of this sample minimum random variable?

What is the variance of this sample minimum random variable?

Report the distribution of the sample minimum in one or two sentences. Report the expected value and variance of the sample minimum using separate cat() statements for each value, rounding to 5 decimal places.

Part (b) Construct a simulation that generates random sample minimums from an exponential distribution with rate parameter λ = 1.5, for samples of size n = 8:

For each iteration of your for loop, draw a sample of size n = 8 from an exponential distribution with rate parameter λ = 1.5.

Calculate the sample minimum of this sample, and then store this value in an outcome vector.

When the simulation is done, your outcome.vector will be populated with random sample minimums. Then report the sample mean and sample variance of the outcome.vector using a separate cat() statement for each, rounding to 5 decimal places.
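The loop structure described above can be sketched as follows. To keep the focus on the pattern, this sketch uses illustrative values (n = 4 and rate 2) rather than the assignment's, and the iteration count of 10000 is also an assumption:

```r
set.seed(156)
n <- 4                   # illustrative sample size (not the assignment's n)
rate <- 2                # illustrative rate (not the assignment's lambda)
num.iterations <- 10000  # assumed number of simulation runs

# Pre-allocate the outcome vector, then fill it one sample minimum at a time
outcome.vector <- numeric(num.iterations)
for (i in 1:num.iterations) {
  current.sample <- rexp(n, rate = rate)
  outcome.vector[i] <- min(current.sample)
}

cat("Sample mean of minimums:", round(mean(outcome.vector), 5), "\n")
cat("Sample variance of minimums:", round(var(outcome.vector), 5), "\n")
```

For these illustrative values the minimum has rate 4 × 2 = 8, so the sample mean should be close to 1/8 and the sample variance close to 1/64.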

Part (c) Construct a histogram of the random values you generated in part (b). Then superimpose a density curve using the distribution you specified in part (a).

Part (d) The vector contains data from an exponential distribution with a rate parameter of λ. Use a method-of-moments estimator to estimate the rate parameter of the distribution of the sample minimums. Report your result using a cat() statement, rounding to 5 decimal places.
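An exponential distribution with rate λ has expected value 1/λ, so equating the sample mean to 1/λ gives the method-of-moments estimate as the reciprocal of the sample mean. A sketch, assuming the data is in a vector called min.sample (a hypothetical name, filled here with simulated data using a made-up rate of 6):

```r
# Hypothetical stand-in data; the true rate of 6 is made up for illustration
set.seed(156)
min.sample <- rexp(500, rate = 6)

# Method of moments: set the sample mean equal to 1 / rate and solve for rate
rate.hat <- 1 / mean(min.sample)

cat("Method-of-moments estimate of the rate:", round(rate.hat, 5), "\n")
```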

Part (e) As a check on your work in part (c), construct a histogram of the values in Then superimpose the density curve for the distribution that you estimated in part (d).

Part (f) In part (d), you estimated the rate parameter λ for the data in the variable In fact, I generated this data by first drawing random samples of size n = 8 from an exponential distribution with rate parameter θ, and then calculating the sample minimum. Use your estimate from part (d) to estimate the value of the rate parameter θ. (Hint: this is very easy, and requires one line of code, if that; don't overthink this.)

Problem 5: Constructing a One-Sample Test

Marie is a field biologist who is studying armadillos, and she is wondering if the local armadillo population has on average a different weight than normal, although she doesn't know if it's higher or lower. She knows that armadillo weights are normally distributed and also that the variance of armadillo weights is Var[X] = 10000, but she's not sure about the expected value. The standard weight of armadillos is µ = 4500 grams, and Marie would like strong evidence before she rules out this standard value. She draws a sample of size n = 27 from the population, and then using the sample mean as her test statistic she performs a two-sided test of the null hypothesis. She calibrates the test so that the probability of a Type I error is 5%.

Part (a) What is the null hypothesis for this test? State the null hypothesis in a sentence. Then define a variable to store the expected value of the distribution, given that this null hypothesis is true.

Part (b) What is the significance level of the hypothesis test? Define a variable to store this significance level, and report your result using a cat() statement, rounding to 5 decimal places.

Part (c) What is the variance of the test statistic? Report your result using a cat() statement, rounding to 5 decimal places.

Part (d) What is the lower critical value for this test? Report your result using a cat() statement, rounding to 5 decimal places.

Part (e) What is the upper critical value for this test? Report your result using a cat() statement, rounding to 5 decimal places.
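When the test statistic is a sample mean from a normal population with known variance, its sampling distribution under the null hypothesis is normal with variance equal to the population variance divided by n, and the critical values are quantiles of that distribution. A sketch using made-up values (not Marie's):

```r
# Made-up values for illustration only
mu0 <- 50            # expected value under the null hypothesis
pop.variance <- 16   # known population variance
n <- 25              # sample size
alpha <- 0.05        # significance level

# Standard deviation of the sample mean
se <- sqrt(pop.variance / n)

# A two-sided test puts alpha / 2 in each tail
lower.critical <- qnorm(alpha / 2, mean = mu0, sd = se)
upper.critical <- qnorm(1 - alpha / 2, mean = mu0, sd = se)

cat("Lower critical value:", round(lower.critical, 5), "\n")
cat("Upper critical value:", round(upper.critical, 5), "\n")
```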

Part (f) Construct a simulation to show that the upper and lower critical values that you calculated in parts (d) and (e) have the correct tail probabilities. For each iteration of the for loop, the simulation should first draw a sample of size n = 27 from the probability distribution under the null hypothesis that you defined in part (a), then calculate the sample mean, and finally store this in an outcome.vector. When the simulation has finished, the outcome.vector will consist of random sample means of samples of size n = 27. Perform vectorized operations on this outcome.vector to show that the proportion of values that are less than the lower critical value is correct, given the significance level that you defined in part (b). Finally, do the same for the upper critical value. Report each result separately using a cat() statement, rounding to 5 decimal places.
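A sketch of this kind of check, again with made-up values (not Marie's) and an assumed iteration count of 10000. The vectorized step relies on the fact that the mean of a logical vector is the proportion of TRUE values:

```r
set.seed(156)
mu0 <- 50; pop.sd <- 4; n <- 25; alpha <- 0.05   # made-up values
se <- pop.sd / sqrt(n)
lower.critical <- qnorm(alpha / 2, mean = mu0, sd = se)
upper.critical <- qnorm(1 - alpha / 2, mean = mu0, sd = se)

num.iterations <- 10000   # assumed number of simulation runs
outcome.vector <- numeric(num.iterations)
for (i in 1:num.iterations) {
  # Draw one sample under the null hypothesis and store its mean
  outcome.vector[i] <- mean(rnorm(n, mean = mu0, sd = pop.sd))
}

# Proportion of simulated sample means in each tail
lower.tail <- mean(outcome.vector < lower.critical)
upper.tail <- mean(outcome.vector > upper.critical)

cat("Proportion below lower critical value:", round(lower.tail, 5), "\n")
cat("Proportion above upper critical value:", round(upper.tail, 5), "\n")
```

Each proportion should be close to alpha / 2, up to simulation noise.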

Part (g) Draw a graph of the sampling distribution of the test statistic under the null hypothesis. Include vertical lines indicating the lower and upper critical values, and shade the graph under the rejection region.
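One way to shade a rejection region is with polygon(), tracing the density curve over the tail and closing the shape along the horizontal axis. A sketch with made-up values; the helper shade.region is hypothetical and reads mu0 and se from the enclosing environment:

```r
mu0 <- 50; se <- 0.8; alpha <- 0.05   # made-up values
lower.critical <- qnorm(alpha / 2, mean = mu0, sd = se)
upper.critical <- qnorm(1 - alpha / 2, mean = mu0, sd = se)

# Density curve of the test statistic under the null hypothesis
x.grid <- seq(mu0 - 4 * se, mu0 + 4 * se, length.out = 400)
plot(x.grid, dnorm(x.grid, mean = mu0, sd = se), type = "l",
     main = "Sampling distribution under the null hypothesis",
     xlab = "Sample mean", ylab = "Density")
abline(v = c(lower.critical, upper.critical), lty = 2)

# Hypothetical helper: shade the area under the curve between two x values
shade.region <- function(from, to) {
  xs <- seq(from, to, length.out = 100)
  polygon(c(from, xs, to), c(0, dnorm(xs, mean = mu0, sd = se), 0),
          col = "gray", border = NA)
}
shade.region(min(x.grid), lower.critical)
shade.region(upper.critical, max(x.grid))
```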

Problem 6: Conducting the One-Sample Hypothesis Test

Part (a) Use the data in the variable to calculate the observed value of the test statistic.

Part (b) Based on the observed value of the test statistic that you calculated in part (a), do you think that this data constitutes strong evidence against the null hypothesis? Explain your answer with one or two sentences.

Part (c) Now we'll construct a 90% confidence interval for the population expected value. For this part, calculate the lower endpoint of this confidence interval, given the information in problem 5 and the observed value of the test statistic. Report your result using a cat() statement.

Part (d) We continue with our construction of a 90% confidence interval for the population expected value. For this part, calculate the upper endpoint of this confidence interval, given the information in problem 5 and the observed value of the test statistic. Report your result using a cat() statement.
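With a known population variance, a 90% confidence interval is the observed sample mean plus or minus the 0.95 standard normal quantile times the standard deviation of the sample mean. A sketch with made-up values (not Marie's data):

```r
# Made-up values for illustration only
observed.mean <- 51.2   # observed value of the test statistic
se <- 0.8               # standard deviation of the sample mean
conf.level <- 0.90

# For a 90% interval, 5% is left in each tail, so use the 0.95 quantile
z.star <- qnorm(1 - (1 - conf.level) / 2)

lower.endpoint <- observed.mean - z.star * se
upper.endpoint <- observed.mean + z.star * se

cat("Lower endpoint:", round(lower.endpoint, 5), "\n")
cat("Upper endpoint:", round(upper.endpoint, 5), "\n")
```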

Part (e) Using the confidence interval that you calculated in parts (c) and (d), perform a test of the null hypothesis you defined in Problem 5, part (a). Report your conclusion and explain your reasoning using a few sentences.

Part (f) Using the distribution under the null hypothesis that you defined in Problem 5, along with the observed value of the test statistic from part (a), calculate the p-value for this data. Report your result using a cat() statement.
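For a two-sided test with a symmetric null distribution, the p-value is twice the tail area beyond the observed test statistic. A sketch with made-up values (not Marie's data):

```r
# Made-up values for illustration only
mu0 <- 50               # expected value under the null hypothesis
se <- 0.8               # standard deviation of the sample mean
observed.mean <- 51.2   # observed value of the test statistic

# Distance of the observed statistic from the null value, in standard deviations
z.observed <- (observed.mean - mu0) / se

# Two-sided p-value: double the tail area beyond |z.observed|
p.value <- 2 * pnorm(-abs(z.observed))

cat("Two-sided p-value:", round(p.value, 5), "\n")
```

Taking the absolute value first means the same line works whether the observed statistic falls above or below the null value.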

Part (g) Using your result from part (f), perform a test of the null hypothesis. Report your conclusion using one or two sentences.

Part (h) At the beginning of problem 5, the problem statement indicated that Marie constructed her test so that the probability of a Type I error is 5%. Suppose Marie decides that's insufficiently stringent, and instead wants to conduct her test with a Type I error rate of 1%. What would her conclusion be now? You can answer this question with just a few sentences; do not perform any further R calculations. (Hint: think about part (f).)

Problem 7: Constructing a Two-Sample Test

Tyrone is conducting an experiment to compare a new agricultural fertilizer with the standard fertilizer. First, $n_X = 100$ plants are treated with the standard fertilizer. Then another set of $n_Y = 100$ plants are treated with the new fertilizer.

For the plants treated with the standard fertilizer, the crop yields are normally distributed, with an unknown expected value of $\mu_X$ and a known variance of $\sigma_X^2 = 1500$.

For the plants treated with the new fertilizer, the crop yields are normally distributed, with an unknown expected value of $\mu_Y$ and a known variance of $\sigma_Y^2 = 1750$.

All of the crop yields are independent of one another.

Tyrone wants to show that there is a difference in the crop yields between the two fertilizers. Therefore, he wants to falsify the hypothesis that the two unknown expected values $\mu_X$ and $\mu_Y$ are equal.

Part (a) Let $\Delta = \mu_Y - \mu_X$ denote the difference in the expected values of the crop yields between the two fertilizers. Tyrone wants to perform a two-sided test to demonstrate that there is a non-zero difference in the true expected values of the crop yields. What should he use as the null hypothesis for this test?

Part (b) To test the null hypothesis in part (a), Tyrone decides to use the observed difference of the sample means $D = \bar{Y} - \bar{X}$ as the test statistic. What is the expected value of this test statistic under the null hypothesis? Explain your answer with one or two sentences.

Part (c) What is the variance of the test statistic $D = \bar{Y} - \bar{X}$? Report your result using a cat() statement, rounding to 5 decimal places.

Part (d) The experimenters want to design their experiment so that it has a significance level of α = 0.10. Using your answers from parts (b) and (c), determine the upper critical value U that will ensure that the two-sided test has this Type I error rate.
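Because the two samples are independent, the variance of a difference of sample means is the sum of the two individual variances of the sample means. A sketch of that calculation and the resulting upper critical value, using made-up values (not Tyrone's):

```r
# Made-up values for illustration only
var.x <- 40; n.x <- 50   # known variance and sample size for the first group
var.y <- 60; n.y <- 50   # known variance and sample size for the second group
alpha <- 0.10            # significance level

# Variance of the difference of two independent sample means
var.d <- var.x / n.x + var.y / n.y

# Under the null hypothesis of no difference, the statistic is centered at 0;
# a two-sided test puts alpha / 2 in the upper tail
upper.critical <- qnorm(1 - alpha / 2, mean = 0, sd = sqrt(var.d))

cat("Variance of the test statistic:", round(var.d, 5), "\n")
cat("Upper critical value:", round(upper.critical, 5), "\n")
```

By symmetry around 0, the lower critical value is the negative of the upper one.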

Part (g) Draw a picture of the sampling distribution of the test statistic D. Draw the density curve of the distribution using a solid line, and indicate the lower and upper critical values using vertical lines with text annotation. Shade under the curve for the rejection region. Finally, be sure you include a main title, as well as axis titles.

Problem 8: Conducting the Two-Sample Test

Part (a) Calculate the sample mean of the variable Calculate the sample mean of the variable Then use these two sample means to calculate the test statistic D. Report your final result using a cat() statement.

Part (b) Does the observed value of the test statistic in part (a) constitute strong evidence against the null hypothesis, given the pre-specified significance level? Explain your answer with one or two sentences.

Part (c) Calculate a two-sided 90% confidence interval for the true difference $\Delta$. Report the lower and upper endpoints of this confidence interval using separate cat() statements, rounding to 5 decimal places.

Part (d) Using the confidence interval you calculated in part (c), perform a test of the null hypothesis of no difference in expected crop yields between the two fertilizers. Report your result with one or two sentences.

Part (e) Calculate the two-sided p-value for this observed data. Report your result using a cat() statement, rounding to 5 decimal places.

Part (f) Using your result from part (e), perform a two-sided hypothesis test at the α = 0.05 level. Report your conclusion with one or two sentences.

