Reference no: EM132332201
Part 1: Law of Large Numbers
The first part of the empirical analysis is meant to familiarize you with the Law of Large Numbers (LLN). The LLN states the following:
Suppose {X_n} is a sequence of independent random variables drawn from an underlying population with a finite mean. Then the sample mean of {X_n}, defined as $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, converges in probability (weak law) or almost surely (strong law) to the population mean.
(a) Randomly draw 10,000,000 samples from N(4, 16) and plot the histogram of the samples together with the underlying population density. What are the theoretical expectation, variance, and standard deviation of the population?
Let the sample size n equal 1, 2, 4, 8, 16, ..., 2^22 = 4194304, 2^23 = 8388608 in turn. For each n, randomly choose n samples from the original pool of 10,000,000 draws and compute the sample mean $\bar{X}_n$. (Subsampling is for speed: you could draw n fresh samples from N(4, 16) each time, but that would take much longer.)
Plot the sample mean against the sample size, together with the population mean. What do you conclude from the graph?
Change the parameters of the underlying distribution N(4, 16) to whatever you like and follow the same procedure again. Can you draw the same conclusion?
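The procedure in (a) can be sketched as follows. This is only an illustration in Python (the assignment itself must be submitted in R), using the standard library alone. The pool size of 200,000 is an assumption made so the sketch runs quickly; the assignment asks for 10,000,000 draws and sample sizes up to 2^23. The names `POOL_SIZE` and `lln_sample_means` are hypothetical. Plotting is left out.

```python
import random
import statistics

# Assumption: a much smaller pool than the assignment's 10,000,000 draws,
# so that this sketch runs in about a second.
POOL_SIZE = 200_000
MU, SIGMA = 4.0, 4.0   # N(4, 16): mean 4, variance 16, so sd = 4

rng = random.Random(0)  # fixed seed for reproducibility
pool = [rng.gauss(MU, SIGMA) for _ in range(POOL_SIZE)]

def lln_sample_means(pool, rng):
    """Return (n, sample mean) pairs for n = 1, 2, 4, ... up to len(pool)."""
    results = []
    n = 1
    while n <= len(pool):
        subsample = rng.sample(pool, n)   # n draws without replacement
        results.append((n, statistics.fmean(subsample)))
        n *= 2
    return results

means = lln_sample_means(pool, rng)
for n, xbar in means:
    print(f"n = {n:>7d}   sample mean = {xbar:8.4f}   |error| = {abs(xbar - MU):.4f}")
```

The printed error column should shrink, roughly like 1/sqrt(n), as n doubles. Plotting the pairs in `means` against a horizontal line at the population mean gives the graph the question asks for.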
(b) Repeat everything in question (a) with the Binomial distribution B(50, 0.4) as the underlying population distribution.
(c) Repeat everything in question (a) with the t-distribution t(10) as the underlying population distribution. When you choose your own parameters, as the last step of (a) asks, make sure the degrees of freedom are greater than 2; otherwise the sample mean may not converge. Think about why.
(d) Repeat everything in question (a) with the F-distribution F(9, 7) as the underlying population distribution. When you choose your own parameters, as the last step of (a) asks, make sure the second degrees-of-freedom parameter df2 is greater than 4; otherwise the sample mean may not converge. Think about why.
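Parts (b) to (d) only change the distribution the pool is drawn from. The sketch below is an illustration in Python rather than the R the assignment requires. The Python standard library has no t or F sampler, so it builds them from the textbook constructions (a t variate as a standard normal over the root of a scaled chi-square, an F variate as a ratio of scaled chi-squares). The helper names and the sample count `N` are assumptions of this sketch, not part of the assignment.

```python
import math
import random
import statistics

rng = random.Random(1)

def draw_binomial(n_trials, p):
    """One B(n_trials, p) draw as a sum of Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n_trials))

def draw_chi2(df):
    """One chi-square(df) draw; chi-square(df) is Gamma(shape=df/2, scale=2)."""
    return rng.gammavariate(df / 2, 2.0)

def draw_t(df):
    """One t(df) draw: Z / sqrt(chi2(df) / df) with Z standard normal."""
    return rng.gauss(0.0, 1.0) / math.sqrt(draw_chi2(df) / df)

def draw_f(df1, df2):
    """One F(df1, df2) draw: ratio of two scaled chi-square draws."""
    return (draw_chi2(df1) / df1) / (draw_chi2(df2) / df2)

N = 50_000  # assumption: far fewer draws than the assignment's 10,000,000
cases = {
    "B(50, 0.4)": (lambda: draw_binomial(50, 0.4), 50 * 0.4),  # mean n*p
    "t(10)":      (lambda: draw_t(10), 0.0),                   # mean 0 when df > 1
    "F(9, 7)":    (lambda: draw_f(9, 7), 7 / (7 - 2)),         # mean df2/(df2-2) when df2 > 2
}
sample_means = {}
for name, (draw, pop_mean) in cases.items():
    sample_means[name] = statistics.fmean(draw() for _ in range(N))
    print(f"{name:>10s}: sample mean = {sample_means[name]:.4f}, "
          f"population mean = {pop_mean:.4f}")
```

Each sample mean should land close to the corresponding population mean. Replacing the single size `N` with the doubling sequence of sample sizes from (a) gives the full experiment.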
Part 2: Central Limit Theorem
The second part of the empirical analysis is meant to familiarize you with the Central Limit Theorem (CLT). The CLT states the following:
Suppose {X_n} is a sequence of independent random variables drawn from an underlying population with mean µ = E[X_i] and finite variance σ² = Var[X_i]. Then, for large n, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ is approximately distributed as N(µ, σ²/n). More precisely, the standardized mean $(\bar{X}_n - \mu)/\sqrt{\sigma^2/n}$ converges in distribution to N(0, 1).
(a) Randomly draw 10,000,000 samples from N(20, 25) and plot the histogram of the samples together with the underlying population density.
What are the theoretical expectation, variance, and standard deviation of the population?
Let the sample size n equal 10 and 100000 in turn. We want to conduct the following experiment for both the small-sample case (n = 10) and the large-sample case (n = 100000).
Repeat the following process 2000 times: randomly choose n samples from the original pool of 10,000,000 draws and compute the sample mean $\bar{X}_n$. (Subsampling is for speed: you could draw n fresh samples from N(20, 25) each time, but that would take much longer.)
As a result, you will have 2000 values of $\bar{X}_{10}$ and 2000 values of $\bar{X}_{100000}$. Write down the first 10 and the last 10 of each.
Plot the histogram of the 2000 sample means $\bar{X}_{10}$ together with the density of N(20, 25/10). What do you conclude from the graph?
Plot the histogram of the 2000 sample means $\bar{X}_{100000}$ together with the density of N(20, 25/100000). What do you conclude from the graph?
Plot the histogram of the 2000 normalized sample means $(\bar{X}_{10} - 20)/\sqrt{25/10}$ together with the density of N(0, 1). What do you conclude from the graph?
Plot the histogram of the 2000 normalized sample means $(\bar{X}_{100000} - 20)/\sqrt{25/100000}$ together with the density of N(0, 1). What do you conclude from the graph?
Change the parameters of the underlying distribution N(20, 25) to whatever you like and follow the same procedure again. Can you draw the same conclusion?
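The repetition experiment in (a) can be sketched as below. This is an illustration in Python (the submission itself must be in R), with no plotting. Two values are assumptions chosen so the sketch runs quickly: the pool holds 200,000 draws rather than 10,000,000, and n = 1000 stands in for the large-sample case n = 100000. The function names are hypothetical.

```python
import math
import random
import statistics

MU, VAR = 20.0, 25.0          # N(20, 25): mean 20, variance 25, so sd = 5
REPS = 2000                   # number of repetitions, as in the assignment
POOL_SIZE = 200_000           # assumption: reduced from 10,000,000 for speed
SIZES = (10, 1000)            # assumption: 1000 stands in for n = 100000

rng = random.Random(2)
pool = [rng.gauss(MU, math.sqrt(VAR)) for _ in range(POOL_SIZE)]

def sample_means(pool, n, reps, rng):
    """Draw `reps` subsamples of size n from the pool; return their means."""
    return [statistics.fmean(rng.sample(pool, n)) for _ in range(reps)]

def standardize(xbars, mu, var, n):
    """Map each sample mean to (xbar - mu) / sqrt(var / n)."""
    scale = math.sqrt(var / n)
    return [(x - mu) / scale for x in xbars]

z_by_n = {}
for n in SIZES:
    xbars = sample_means(pool, n, REPS, rng)
    z = standardize(xbars, MU, VAR, n)
    z_by_n[n] = z
    print(f"n = {n}: first 10 sample means = {[round(x, 3) for x in xbars[:10]]}")
    print(f"  mean of sample means = {statistics.fmean(xbars):.4f} (theory {MU}), "
          f"variance = {statistics.variance(xbars):.5f} (theory {VAR / n:.5f})")
    print(f"  standardized: mean = {statistics.fmean(z):.3f}, "
          f"sd = {statistics.stdev(z):.3f}")
```

The variance of the sample means should track 25/n, and the standardized means should have mean near 0 and standard deviation near 1 for both sample sizes. The histograms requested above are drawn from `xbars` and `z`.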
(b) Repeat everything in question (a) with the Binomial distribution B(40, 0.2) as the underlying population distribution.
(c) Repeat everything in question (a) with the t-distribution t(10) as the underlying population distribution. When you choose your own parameters, as the last step of (a) asks, make sure the degrees of freedom are greater than 2; otherwise the sample mean may not converge in distribution to a Normal distribution. Think about why.
(d) Repeat everything in question (a) with the F-distribution F(8, 6) as the underlying population distribution. When you choose your own parameters, as the last step of (a) asks, make sure the second degrees-of-freedom parameter df2 is greater than 4; otherwise the sample mean may not converge in distribution to a Normal distribution. Think about why.
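For a non-Normal population the same experiment applies once µ and σ² are replaced by that population's mean and variance. The sketch below (Python for illustration only; the assignment requires R) does this for B(40, 0.2) from part (b). Instead of a histogram, it compares the empirical CDF of the standardized sample means with the N(0, 1) CDF at a few points. The pool size, the large-sample stand-in n = 200, the grid, and the helper names are all assumptions of this sketch.

```python
import math
import random
import statistics

TRIALS, P = 40, 0.2
MU = TRIALS * P                 # population mean n*p = 8
VAR = TRIALS * P * (1 - P)      # population variance n*p*(1-p) = 6.4
REPS, POOL_SIZE = 2000, 50_000  # assumption: pool reduced from 10,000,000

rng = random.Random(3)
# Each pool entry is one B(40, 0.2) draw, built as a sum of Bernoulli trials.
pool = [sum(rng.random() < P for _ in range(TRIALS)) for _ in range(POOL_SIZE)]
std_normal = statistics.NormalDist(0.0, 1.0)

def standardized_means(n):
    """REPS standardized sample means (xbar - MU) / sqrt(VAR / n)."""
    scale = math.sqrt(VAR / n)
    return [(statistics.fmean(rng.sample(pool, n)) - MU) / scale
            for _ in range(REPS)]

def empirical_cdf(values, x):
    """Fraction of values that are <= x."""
    return sum(v <= x for v in values) / len(values)

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
max_gap = {}
for n in (10, 200):             # assumption: 200 stands in for n = 100000
    z = standardized_means(n)
    gaps = [abs(empirical_cdf(z, x) - std_normal.cdf(x)) for x in grid]
    max_gap[n] = max(gaps)
    print(f"n = {n}: largest CDF gap on grid = {max_gap[n]:.3f}")
```

A small gap at every grid point indicates that the standardized sample means are already close to N(0, 1). The same pattern applies to t(10) and F(8, 6) once their population mean and variance are substituted for `MU` and `VAR`.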
Part 3: Extra Credit
Find a new distribution, draw samples from it, and follow the same procedure as in the previous two parts. See whether the LLN and CLT still hold.
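One way to see whether the LLN "still holds" is to check whether the spread of the sample mean shrinks as n grows. The sketch below is an illustration in Python (not the R the assignment requires). It contrasts two distributions chosen here purely as examples, not prescribed by the assignment: the exponential distribution with rate 1, which has a finite mean and variance, and the standard Cauchy distribution, which has no finite mean. The helper names and the number of repetitions are assumptions.

```python
import math
import random
import statistics

rng = random.Random(4)

def draw_exponential():
    """Exponential(rate=1): mean 1, variance 1."""
    return rng.expovariate(1.0)

def draw_cauchy():
    """Standard Cauchy via the inverse CDF; it has no finite mean."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_sample_means(draw, n, reps=500):
    """Interquartile range of `reps` sample means, each over n fresh draws."""
    xbars = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(xbars, n=4)
    return q3 - q1

iqr = {}
for name, draw in (("exponential", draw_exponential), ("cauchy", draw_cauchy)):
    for n in (10, 1000):
        iqr[name, n] = iqr_of_sample_means(draw, n)
        print(f"{name:>11s}, n = {n:>4d}: IQR of sample means = {iqr[name, n]:.3f}")
```

For the exponential case the interquartile range should shrink markedly between n = 10 and n = 1000, while for the Cauchy case it should stay of the same order. The interquartile range is used instead of the variance because the Cauchy sample variance is itself unstable.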
Requirement: Please submit your R code and final report to Sakai. The final report must be converted to PDF format. You are encouraged to collaborate with each other, but you must write your own code and report.