Roadmap - Sampling Distributions and the Central Limit Theorem.
Learning Objective 1: Understand the importance of sampling and how results from samples can be used to provide estimates of population characteristics such as the population mean or the population standard deviation.
Learning Objective 2: Understand the concept of a sampling distribution.
Learning Objective 3: Understand the Central Limit Theorem and the important role it plays in statistics.
Learning Objective 4: Know the characteristics of the sampling distribution of the sample mean ȳ (e.g. Is it normal? What is its mean? What is its standard deviation, i.e. its standard error?).
Learning Objective 5: Be able to apply these concepts to real-world problems. Applications might include, but are not limited to, computing the probability that ȳ will take a certain set of values or sketching the distribution of ȳ.
Learning Objective 6: Recognise the difference between a biased estimate and an unbiased estimate.
Workshop
a. What is meant by the term sampling distribution?
b. Explain the difference between σ and σ_ȳ.
c. If we know σ, how can we calculate the standard error of the mean, σ_ȳ?
d. Explain what is meant by each of the symbols μ, μ_ȳ and ȳ.
e. Write True or False for the following statement. If it is false, rewrite it so that it is true.
The central limit theorem tells us that if the parent population is normal, then the sampling distribution of ȳ is guaranteed to be normal if the size of the population is greater than 30.
f. If s = 10 and n = 4, estimate the standard error of the mean.
g. When will the standard error of the mean be equal to the standard deviation of a population?
Exercise: Sampling Distribution of the Mean
This exercise is designed to give you practice at generating a sampling distribution of the mean. You will also compute the mean and standard deviation of this distribution and thereby verify that
$$\mu_{\bar{y}} = \mu \quad \text{and} \quad \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}.$$
We are using Greek letters here, because we are declaring that we are working with a population.
So that you can do this exercise using pen and paper, we have designed an incredibly small population. It is a little artificial, but it will help to reinforce some important concepts.
Since the population is small, we need to sample with replacement. Order also matters: the sample (A, B) is counted as different from the sample (B, A).
When computing σ for a population, you should divide by n and not n - 1. That is, the formula for the population standard deviation is
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \mu)^2}{n}}.$$
However, the standard deviation of a sample is written
$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}.$$
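To see the two divisors side by side, here is a minimal sketch. It is written in Python rather than R purely for illustration, and the data values are hypothetical (they are not taken from the exercise below).

```python
import math
import statistics

# Hypothetical data, chosen only for illustration.
values = [3, 7, 8, 10]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean.
ss = sum((y - mean) ** 2 for y in values)

sigma = math.sqrt(ss / n)        # population formula: divide by n
s = math.sqrt(ss / (n - 1))      # sample formula: divide by n - 1

# The standard library implements both conventions.
assert math.isclose(sigma, statistics.pstdev(values))
assert math.isclose(s, statistics.stdev(values))

print(f"sigma = {sigma:.4f}, s = {s:.4f}")
```

For the same data, s is always a little larger than σ because it divides by the smaller number. Note that R's `sd()` uses the n - 1 divisor, so it returns s, not σ.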
OK, let's begin the exercise.
A population consists of the values 1, 2, 5.
a. Compute μ.
b. Compute σ using the formula $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \mu)^2}{n}}$.
Here is a list of all of the possible samples (with replacement) of size n=2.
Sample 1: 1, 1
Sample 2: 1, 2
Sample 3: 1, 5
Sample 4: 2, 1
Sample 5: 2, 2
Sample 6: 2, 5
Sample 7: 5, 1
Sample 8: 5, 2
Sample 9: 5, 5
c. Compute the sample mean of each sample.
d. Compute the mean of the 9 sample means. How does this answer compare with your answer from part (a)?
e. Using your formula for the population standard deviation, compute the standard deviation of the nine sample means (hint: this is the same as computing the standard error of the mean). How does this answer compare to your answer from part (b)?
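Once you have worked through parts (a) to (e) by hand, the same procedure can be checked by brute force. The sketch below is in Python rather than R and uses a hypothetical population (2, 4, 9), deliberately different from the one in the exercise, so it illustrates the method without supplying the answers.

```python
import math
from itertools import product
from statistics import mean, pstdev

# Hypothetical population (deliberately not the one in the exercise).
population = [2, 4, 9]
n = 2  # sample size

mu = mean(population)
sigma = pstdev(population)  # population formula: divides by the population size

# All ordered samples of size n drawn with replacement: 3**2 = 9 of them.
samples = list(product(population, repeat=n))
sample_means = [mean(sample) for sample in samples]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)  # standard error of the mean

print(f"mu = {mu:.4f}, mean of sample means = {mean_of_means:.4f}")
print(f"sigma / sqrt(n) = {sigma / math.sqrt(n):.4f}, "
      f"SD of sample means = {sd_of_means:.4f}")
```

Because every possible sample is listed exactly once, the two pairs of printed values agree exactly (up to floating-point rounding), not just approximately.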
Exercise: Central Limit Theorem using R
Data: faithful
Variables: waiting
These data represent the waiting time between eruptions for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
a. Produce a histogram of the waiting times. How would you describe the distribution of waiting times?
b. Draw one random sample of size 30 and compute the mean of this sample.
c. Repeat part (b) 10,000 times, then produce a histogram of the 10,000 sample means. You should find that the histogram is approximately normal. Explain why this is so.
d. What happens if the sample size is small, e.g. n = 2? How does this affect the sampling distribution of the mean?
e. Discuss the benefits of working with larger as opposed to smaller samples.
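The resampling idea behind parts (b) to (d) can be sketched as follows. This is an illustration in Python rather than R, and it does not use the `faithful` data: the population is a synthetic two-cluster data set whose centres and spreads are assumptions made up for this sketch. The helper name `sample_means` is also hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Synthetic two-cluster "population" (an assumption for illustration only;
# these are not the Old Faithful waiting times).
population = ([random.gauss(55, 6) for _ in range(100)]
              + [random.gauss(80, 6) for _ in range(170)])

def sample_means(data, n, reps):
    """Return `reps` means of random samples of size n drawn with replacement."""
    return [sum(random.choices(data, k=n)) / n for _ in range(reps)]

means_30 = sample_means(population, n=30, reps=10_000)
means_2 = sample_means(population, n=2, reps=10_000)

mu, sigma = mean(population), pstdev(population)
print(f"population: mean = {mu:.2f}, SD = {sigma:.2f}")
for n, means in ((30, means_30), (2, means_2)):
    print(f"n = {n}: mean of means = {mean(means):.2f}, "
          f"SD of means = {pstdev(means):.2f}, "
          f"sigma/sqrt(n) = {sigma / n ** 0.5:.2f}")
```

Plotting a histogram of `means_30` and of `means_2` with any plotting tool shows how the shape of the sampling distribution depends on n. The printed summaries show that its spread shrinks roughly as σ/√n.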
Exercise: Pen-and-Paper
Suppose that exam scores for a statistics subject are normally distributed with mean, μ, of 50 and standard deviation, σ, equal to 10.
a. Sketch the PDF of exam scores.
b. Compute the probability that a randomly selected student from this population has an exam score greater than 60.
c. Let's consider the characteristics of the sampling distribution of the sample mean for samples of size n = 16.
What is the mean of this sampling distribution? What is the SEM?
Will the distribution of sample means be normal? Explain.
d. If we can assume that these sample means follow a normal distribution, compute the probability that the mean exam score of 16 students is greater than 60.
e. How does your answer from (d) compare to (b)? How is it different? Why is it different?
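If you would like to check your Z-table working afterwards, the comparison between a single observation and a sample mean can be sketched as below. The sketch is in Python rather than R, and the values (μ = 100, σ = 15, n = 9, threshold 110) are hypothetical, chosen so that they do not give away the answers to this exercise.

```python
import math
from statistics import NormalDist

# Hypothetical values, deliberately different from the exercise.
mu, sigma = 100.0, 15.0
n = 9
threshold = 110.0

# Distribution of a single observation.
single = NormalDist(mu, sigma)
p_single = 1 - single.cdf(threshold)

# Sampling distribution of the mean of n observations:
# same mean, but standard deviation sigma / sqrt(n) (the SEM).
sem = sigma / math.sqrt(n)
sample_mean = NormalDist(mu, sem)
p_mean = 1 - sample_mean.cdf(threshold)

print(f"SEM = {sem:.2f}")
print(f"P(Y > {threshold}) = {p_single:.4f}")
print(f"P(mean of {n} > {threshold}) = {p_mean:.4f}")
```

The second probability is much smaller than the first: the threshold is the same, but it lies more standard errors away from μ once the spread is reduced by √n.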
QUESTION 1
The maximum temperatures (in degrees Celsius) measured at Townsville airport from 20 May to 31 May 2015, sorted in ascending order, are 25.3, 25.8, 26.5, 27.0, 27.4, 28.5, 28.5, 28.8, 28.9, 28.9, 29.2, 30.6.
a. Produce a plot of these data. Interpret.
b. Using this sample, estimate with the help of R the standard error of the mean maximum temperature for May in Townsville.
c. Using 'pen-and-paper', show how R computed this answer.
Hint 1: write the equation for the mean and substitute the numbers into this equation to show how the mean was computed.
Hint 2: write the equation for the (sample) standard deviation s and substitute numbers into this equation to show how the standard deviation was computed.
Hint 3: combine hints 1 and 2 to show how the SEM was computed.
d. Is the SEM that you computed a valid estimate of the SEM of the maximum temperature for May in Townsville? Explain your reasoning.
e. Answer True or False. If the answer is False, rewrite the statement so that it is True.
The SEM tells us how much the mean ȳ varies from population to population. Note the size of each population must be the same.
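The three hints in part (c) chain together as mean, then s, then SEM. A minimal sketch of that chain, in Python rather than R and on a hypothetical sample (not the temperature data above), might look like this:

```python
import math
from statistics import stdev

# Hypothetical sample, not the Townsville temperatures.
sample = [12.1, 13.4, 11.8, 12.9, 13.0, 12.2]
n = len(sample)

# Hint 1: the sample mean.
y_bar = sum(sample) / n

# Hint 2: the sample standard deviation (divide by n - 1).
s = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / (n - 1))

# Hint 3: the estimated standard error of the mean.
sem = s / math.sqrt(n)

# Cross-check the hand-rolled s against the standard library.
assert math.isclose(s, stdev(sample))

print(f"mean = {y_bar:.3f}, s = {s:.3f}, SEM = {sem:.3f}")
```

In R, the same quantity can be obtained with `sd(x) / sqrt(length(x))`, where `x` is a placeholder name for your data vector.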
QUESTION 2
This assignment question should be completed with pen and paper, using your Z-tables. You can also use R if you are keen, but the R part is optional; the 'pen-and-paper' part is compulsory.
For women aged 18-24, systolic blood pressures (in mm Hg) are normally distributed with a mean of 114.8 and a standard deviation of 13.1 (based on National Health Surveys). Hypertension is commonly defined as a systolic blood pressure above 140.
a. Sketch the PDF of systolic blood pressure for women aged 18-24.
b. If a woman aged 18-24 is randomly selected, find the probability that her systolic blood pressure will be greater than 140. (Hint: shade this area on your PDF.)
c. If samples of 4 women in that age bracket are randomly selected repeatedly, sketch the PDF of the sample mean systolic blood pressure.
d. If 4 women in that age bracket are randomly selected, find the probability that their mean systolic blood pressure is greater than 140. (Hint: shade this area on your PDF.)
e. Given that the previous question only has a sample size of 4, explain how the Central Limit Theorem contributes to the solution.