Construct and plot a confidence ellipse

Reference no: EM131234845

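1. The vector of random variables $(X_1, X_2, X_3)^T$ follows a trivariate normal distribution with mean vector and covariance matrix

$$\mu = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 3 & 1 & -2 \\ 1 & 2 & -1 \\ -2 & -1 & 1.5 \end{pmatrix}.$$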

(a) Find the joint distribution of (X1, X3).
(b) Find the joint conditional distribution of (X1, X3)|X2 = 1.

(c) Find the joint distribution of

$$\begin{pmatrix} 2X_1 - X_3 \\ X_2 + 4X_3 + 1 \end{pmatrix}.$$
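Below is a minimal R sketch of the standard results behind (a) to (c). It assumes $\mu = (1, -2, 0)^T$ and the $\Sigma$ entries given in Question 1. A marginal keeps the matching sub-vector and sub-matrix. A conditional uses $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. A linear map gives $AX + b \sim N(A\mu + b, A\Sigma A^T)$. The variable names, including the illustrative `A` and `b`, are not from the assignment.

```r
# Parameters of Question 1
mu <- c(1, -2, 0)
Sigma <- matrix(c( 3,  1, -2,
                   1,  2, -1,
                  -2, -1, 1.5), nrow = 3, byrow = TRUE)

# (a) Marginal of (X1, X3): matching sub-vector and sub-matrix
idx <- c(1, 3)
mu_a <- mu[idx]
Sigma_a <- Sigma[idx, idx]

# (b) Conditional of (X1, X3) given X2 = 1 (partitioned-normal formulas)
x2 <- 1
S12 <- Sigma[idx, 2, drop = FALSE]  # Cov((X1, X3), X2), a 2 x 1 matrix
S22 <- Sigma[2, 2]
mu_b <- mu_a + as.vector(S12 / S22 * (x2 - mu[2]))
Sigma_b <- Sigma_a - S12 %*% t(S12) / S22

# (c) Linear transformation A X + b
A <- matrix(c(2, 0, -1,
              0, 1,  4), nrow = 2, byrow = TRUE)
b <- c(0, 1)
mu_c <- as.vector(A %*% mu + b)
Sigma_c <- A %*% Sigma %*% t(A)
```

Printing `mu_b` and `Sigma_b` gives the parameters of the bivariate normal in (b). The same sub-block pattern works for any partition of the vector.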

2. Let $X \sim N_p(\mu, \Sigma)$. Show, via the moment generating function, that the quadratic form below follows a central chi-square distribution with p degrees of freedom.

$$(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p$$

Recall that the moment generating function of a chi-square distribution with p degrees of freedom is $M(t) = (1 - 2t)^{-p/2}$ for $t < 1/2$. A helpful property here is that, for independent random variables $Y_1, \ldots, Y_n$,

$$M_{Y_1 + \cdots + Y_n}(t) = E\left(e^{t \sum_{i=1}^n Y_i}\right) = \prod_{i=1}^n E\left(e^{t Y_i}\right).$$
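The key step is to standardize. $Z = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$, so the quadratic form equals $Z^T Z = \sum_{i=1}^p Z_i^2$, a sum of p independent squared standard normals. Each term has MGF $(1 - 2t)^{-1/2}$, so the product gives $(1 - 2t)^{-p/2}$.

The R sketch below is a Monte Carlo sanity check, not a proof. It reuses the $\mu$ and $\Sigma$ of Question 1 purely as an example. It then compares the empirical MGF of the quadratic form with $(1 - 2t)^{-p/2}$ at a few $t < 1/2$.

```r
set.seed(2024)
mu <- c(1, -2, 0)
Sigma <- matrix(c( 3,  1, -2,
                   1,  2, -1,
                  -2, -1, 1.5), nrow = 3, byrow = TRUE)
p <- length(mu)
n_sim <- 20000

# Draw X ~ N_p(mu, Sigma) with base R only: chol() returns upper-triangular R
# with t(R) %*% R = Sigma, so each row z %*% R has covariance Sigma
R <- chol(Sigma)
Z <- matrix(rnorm(n_sim * p), ncol = p)
X <- sweep(Z %*% R, 2, mu, "+")

# Quadratic form (X - mu)^T Sigma^{-1} (X - mu) for every draw
Q <- mahalanobis(X, center = mu, cov = Sigma)

# Empirical MGF versus the chi-square_p MGF (1 - 2t)^(-p/2)
t_grid <- c(-0.5, 0.1, 0.2)
mgf_emp <- sapply(t_grid, function(t) mean(exp(t * Q)))
mgf_chisq <- (1 - 2 * t_grid)^(-p / 2)
```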

3. Consider the regression model $Y = X\beta + R$ (conditional on $X$), in which $R \sim N_n(0, \sigma^2 I)$, $X$ is an $n \times p$ design matrix, $\beta$ is the $p \times 1$ parameter vector, and $Y$ is the $n \times 1$ response vector.

Show that

(a) $\hat{\beta}_{MLE} = (X^T X)^{-1} X^T Y$

(b) $\hat{\sigma}^2_{MLE} = \frac{1}{n}(Y - X\hat{\beta}_{MLE})^T (Y - X\hat{\beta}_{MLE})$
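A short R check of both formulas on simulated data. The design, true coefficients, and noise level are hypothetical choices for illustration. Note that the MLE of $\sigma^2$ divides by n, whereas `lm()` reports the unbiased estimate that divides by $n - p$.

```r
set.seed(42)
n <- 200
p <- 3
# Hypothetical design: intercept plus two standard-normal covariates
X <- cbind(1, matrix(rnorm(n * (p - 1)), ncol = p - 1))
beta_true <- c(2, -1, 0.5)
Y <- as.vector(X %*% beta_true + rnorm(n, sd = 1.5))

# (a) beta_hat = (X^T X)^{-1} X^T Y, solved as a linear system instead of inverting
beta_hat <- as.vector(solve(crossprod(X), crossprod(X, Y)))

# (b) MLE of sigma^2: residual sum of squares divided by n
resid <- Y - as.vector(X %*% beta_hat)
sigma2_mle <- sum(resid^2) / n

# Comparison with lm(): same coefficients; summary(fit)$sigma^2 is RSS / (n - p)
fit <- lm(Y ~ X - 1)
sigma2_unbiased <- summary(fit)$sigma^2
```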

4. We often say that the sample size n must be much larger than the dimension p of each observation for the Central Limit Theorem to give an accurate approximation, particularly when the data do not come from a normal distribution.

Recall that for the univariate t-distribution, the smaller the degrees of freedom, the larger the kurtosis. Similarly, in the multivariate case, the fewer the degrees of freedom, the further the distribution deviates from normality (particularly in kurtosis). The following code simulates data from a p-variate t distribution with 6 degrees of freedom. Its scale matrix $\Sigma$ is drawn from a Wishart distribution with p degrees of freedom:

$$\Sigma \sim \mathrm{Wishart}(p, I_p)$$

$$X_1, \ldots, X_n \overset{iid}{\sim} t_p(\Sigma, \mathrm{df} = 6)$$

Use the code below with at least three values of p, including one low, one medium, and one high value (e.g. 2, 5, 20). For each value of p, assess the normality of the sample means using n = 10, 100, and 1000. Report the QQ-plots and formal test results for the normality of the sample means. Feel free to test more values of p and n, but you do not need to show QQ-plots and normality tests for the extra results. Provide a written summary of your findings.

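library(mvtnorm)  # provides rmvt(); there is no package called "mvt"

p <- p0    # dimension; set p0 to e.g. 2, 5 or 20
N <- 5000  # number of simulated sample means
means <- matrix(0, ncol = p, nrow = N)

# One Wishart(p, I_p) draw, reshaped from a p x p x 1 array to a p x p matrix
Sigma <- matrix(rWishart(1, df = p, Sigma = diag(p)), ncol = p)

## Keep the same Sigma for fixed p and varying n

n <- n0    # sample size; set n0 to 10, 100 or 1000

for (i in 1:N) {
  # n draws from a p-variate t with 6 df; in rmvt, sigma is the scale matrix
  # (the covariance is 6 / (6 - 2) * Sigma)
  x <- rmvt(n, sigma = Sigma, df = 6)
  means[i, ] <- colMeans(x)  # sample mean vector of replicate i
}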
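Once `means` has been filled in by the code above, its rows can be checked for multivariate normality. The assignment does not name a specific test. This minimal base-R sketch assumes that a chi-square Q-Q plot of squared Mahalanobis distances, plus Mardia's multivariate kurtosis test, is acceptable. Kurtosis is the feature the heavy-tailed t should distort. Packages such as MVN offer fuller test batteries. The helper name `assess_mvn` is hypothetical.

```r
# Chi-square Q-Q plot and Mardia's kurtosis test for the rows of a matrix
# (hypothetical helper; base R only)
assess_mvn <- function(means, main = "") {
  N <- nrow(means)
  p <- ncol(means)
  xbar <- colMeans(means)
  S <- crossprod(sweep(means, 2, xbar)) / N          # ML covariance estimate
  d2 <- mahalanobis(means, center = xbar, cov = S)   # squared Mahalanobis distances

  # Under multivariate normality, d2 is roughly chi-square with p df
  qqplot(qchisq(ppoints(N), df = p), d2, main = main,
         xlab = "Chi-square quantiles", ylab = "Ordered squared distances")
  abline(0, 1, col = "red")

  # Mardia's kurtosis: b2p is approximately N(p(p + 2), 8p(p + 2)/N) under normality
  b2p <- mean(d2^2)
  z <- (b2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / N)
  list(b2p = b2p, z = z, p.value = 2 * pnorm(-abs(z)))
}
```

For each (p, n) combination, a call such as `assess_mvn(means, main = paste0("p = ", p, ", n = ", n))` produces one plot and one test result.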

5. Stiffness and bending strength are two variables of interest in the quality of lumber. A sample of 30 pieces of a particular type of wood is provided in the file lumber.txt.

(a) Construct and plot a 95% confidence ellipse for the pair µ = (µ1, µ2), where µ1 = E(Stiffness) and µ2 = E(Bending Strength).

(b) Suppose high-quality lumber has $\mu = (2000, 10000)^T$. Given the result in part (a), do the data in lumber.txt represent a sample of high-quality lumber? Explain.
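One way to approach (a) and (b) is the Hotelling $T^2$ region $\{\mu : n(\bar{x} - \mu)^T S^{-1}(\bar{x} - \mu) \le \frac{(n-1)p}{n-p} F_{p,n-p}(0.05)\}$. A hypothesized $\mu_0$ is consistent with the data at the 5% level exactly when it falls inside this ellipse.

The R sketch below assumes the data form an $n \times 2$ numeric matrix or data frame. The helper names are hypothetical. The `read.table` call in the usage comments is an assumption about how lumber.txt is laid out (header row, two numeric columns).

```r
# Boundary points of the 100 * level % Hotelling T^2 confidence ellipse for a mean
conf_ellipse <- function(x, level = 0.95, npts = 200) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)                  # p = 2 for an ellipse
  xbar <- colMeans(x)
  S <- cov(x)                                 # unbiased sample covariance
  c2 <- (n - 1) * p / (n * (n - p)) * qf(level, p, n - p)
  e <- eigen(S, symmetric = TRUE)
  theta <- seq(0, 2 * pi, length.out = npts)
  circle <- cbind(cos(theta), sin(theta))
  # Map the unit circle onto {mu : (xbar - mu)^T S^{-1} (xbar - mu) = c2}
  pts <- circle %*% diag(sqrt(c2 * e$values)) %*% t(e$vectors)
  sweep(pts, 2, xbar, "+")
}

# TRUE if mu0 lies inside the region, i.e. H0: mu = mu0 is not rejected
in_region <- function(x, mu0, level = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  T2 <- n * mahalanobis(mu0, center = colMeans(x), cov = cov(x))
  T2 <= (n - 1) * p / (n - p) * qf(level, p, n - p)
}

# Hypothetical usage (column layout of lumber.txt is assumed):
# lumber <- as.matrix(read.table("lumber.txt", header = TRUE))
# ell <- conf_ellipse(lumber)
# plot(ell, type = "l", xlab = "Stiffness", ylab = "Bending strength")
# points(t(colMeans(lumber)), pch = 19)
# in_region(lumber, c(2000, 10000))
```

Building the ellipse from the eigen-decomposition of $S$ gives its axes directly. If (2000, 10000) lies outside the plotted range, widen `xlim`/`ylim` so the point is visible next to the ellipse.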

(c) Given the data, do you think a bivariate normal distribution is a good model for the data? Use a QQ-plot, as well as a formal test, to answer this question.
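For (c), a common graphical check is the chi-square plot of squared generalized distances, shown alongside normal Q-Q plots of each variable. If the data are bivariate normal, roughly half of the distances should fall below the chi-square median. The sketch below pairs these plots with Shapiro-Wilk tests on each margin. Marginal normality does not imply joint normality, so a multivariate test such as Mardia's could complement it. `check_bvn` is a hypothetical helper name.

```r
# Bivariate normality checks for an n x p numeric matrix or data frame
# (hypothetical helper; base R only)
check_bvn <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))  # squared generalized distances
  q <- qchisq(ppoints(n), df = p)                            # chi-square_p plotting positions

  op <- par(mfrow = c(1, p + 1))
  on.exit(par(op))
  for (j in seq_len(p)) {                                    # marginal normal Q-Q plots
    qqnorm(x[, j], main = paste("Variable", j))
    qqline(x[, j])
  }
  plot(q, sort(d2), main = "Chi-square plot",
       xlab = "Chi-square quantiles", ylab = "Ordered squared distances")
  abline(0, 1)

  list(
    shapiro_p = apply(x, 2, function(v) shapiro.test(v)$p.value),  # per-variable tests
    qq_cor = cor(q, sort(d2)),                             # near 1 is consistent with normality
    frac_within_median = mean(d2 <= qchisq(0.5, df = p))   # roughly 0.5 expected
  )
}
```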

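6. Consider the random vector X where

$$X \sim N_3\left(\begin{pmatrix} 3 \\ -2 \\ 1 \end{pmatrix}, \begin{pmatrix} 10 & 5 & 4 \\ 5 & 18 & 7 \\ 4 & 7 & 9 \end{pmatrix}\right).$$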

Below are 5 simulated observations from this distribution.

6.171516   4.605047   5.8303953
7.595643   1.754275   1.8826819
4.047683   1.791576   0.7613451
1.672295   3.434457   2.1768536
2.904052   3.906055   4.6161726

Of course, the choice of data is arbitrary. Here is how I generated the 5 observations above. Feel free to generate more observations, change the mean, covariance, etc.

library(mvtnorm)
mu <- c(3, -2, 1)
Sigma <- matrix(c(10, 5, 4, 5, 18, 7, 4, 7, 9), nrow = 3)
X <- rmvnorm(5, mu, Sigma)

Now suppose two entries of the data set above are missing at random: the one in the first row and first column, and the one in the third row and third column. The data set with the missing components is shown below.

NA         4.605047   5.8303953
7.595643   1.754275   1.8826819
4.047683   1.791576   NA
1.672295   3.434457   2.1768536
2.904052   3.906055   4.6161726

Use the EM algorithm described in your textbook to estimate the missing data, the MLE of the mean vector, and the MLE of the covariance matrix. Be sure to run the algorithm long enough to reach convergence, say within 1e-5.

Also consider a second algorithm. In it, we only update the missing components $\tilde{x}_j^{(1)}$ for each subject/observation $j = 1, \ldots, n$ and then recompute the MLEs directly from the updated dataset. In other words, we skip (5-39) and update $\tilde{\Sigma}$ from the entire completed dataset, instead of separately estimating each $\widetilde{x_j^{(1)} x_j^{(1)T}}$. (The estimates of $\widetilde{x_j^{(1)} x_j^{(2)T}}$ are the same under both algorithms.)

Discuss the implications of the two EM methods. Do you prefer one over the other? Discuss any theoretical benefits or drawbacks that you see.
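A compact R sketch of both variants, assuming the standard EM for multivariate normal data with entries missing at random. The prediction step fills each missing block with its conditional mean given the observed block. The full version also adds the conditional covariance $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ to the cross-product sum, which is the role (5-39) plays. `correct = FALSE` gives the impute-only alternative.

Several choices here are assumptions, not taken from the textbook: the function name, the mean-imputation starting values, and the stopping rule (maximum absolute change in $\tilde{\mu}$ and $\tilde{\Sigma}$ below `tol`). The sketch also assumes every row has at least one observed value.

```r
# EM for a multivariate normal sample with entries missing at random
# correct = TRUE : full EM (adds the conditional covariance correction)
# correct = FALSE: impute-only variant (conditional means, then plain MLEs)
em_mvn <- function(X, tol = 1e-5, max_iter = 10000, correct = TRUE) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  miss <- is.na(X)
  mu <- colMeans(X, na.rm = TRUE)            # start: available-case means
  Xf <- X
  Xf[miss] <- mu[col(X)[miss]]               # start: mean imputation
  Sigma <- crossprod(sweep(Xf, 2, mu)) / n
  for (iter in seq_len(max_iter)) {
    C <- matrix(0, p, p)                     # summed conditional-covariance corrections
    for (j in which(rowSums(miss) > 0)) {
      m <- miss[j, ]; o <- !m
      S_mo <- Sigma[m, o, drop = FALSE]
      S_oo_inv <- solve(Sigma[o, o, drop = FALSE])
      # Prediction step: conditional mean of the missing part given the observed part
      Xf[j, m] <- mu[m] + S_mo %*% S_oo_inv %*% (X[j, o] - mu[o])
      if (correct)
        C[m, m] <- C[m, m] + Sigma[m, m, drop = FALSE] - S_mo %*% S_oo_inv %*% t(S_mo)
    }
    # Estimation step: MLEs from the completed data (plus corrections for full EM)
    mu_new <- colMeans(Xf)
    Sigma_new <- (crossprod(sweep(Xf, 2, mu_new)) + C) / n
    delta <- max(abs(mu_new - mu), abs(Sigma_new - Sigma))
    mu <- mu_new; Sigma <- Sigma_new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = Sigma, X = Xf, iterations = iter)
}
```

Running `em_mvn(X)` and `em_mvn(X, correct = FALSE)` on the data above allows the two variants to be compared directly. With only n = 5 rows in p = 3 dimensions, the fitted covariance may come out close to singular, so inspecting `eigen(fit$Sigma)$values` is worthwhile.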

7. The bootstrap is a useful method for calculating the p-value of a test when the theoretical distribution of the test statistic is not available, or when the sample size is too small for asymptotic approximations. The data file Test.txt includes 30 observations of 3 variables. Interest lies in testing the null hypothesis


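$$H_0: \mu = \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix} \quad \text{vs.} \quad H_a: \mu \neq \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix}$$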

To calculate the bootstrap p-value, generate 10,000 samples, each of size 30, by sampling with replacement from the original sample. For each sample, compute the test statistic:

$$W = -2\log\left(\frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)}\right).$$

Let $W_{obs}$ be the statistic computed on the originally observed dataset. Estimate the p-value $\Pr(W > W_{obs})$ using the bootstrap samples. Compare your answer to the p-value calculated from the asymptotic distribution of the test statistic (Result 5.2 in the book). Provide a plot of your choice to compare the asymptotic distribution of W to its empirical distribution estimated from the bootstrap samples.
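A minimal R sketch, assuming a multivariate normal likelihood. Under that model the ratio simplifies to $W = n \log(|\hat{\Sigma}_0| / |\hat{\Sigma}|)$, where $\hat{\Sigma}_0 = \frac{1}{n}\sum_j (x_j - \mu_0)(x_j - \mu_0)^T$ and $\hat{\Sigma}$ is the usual MLE. The large-sample reference distribution is $\chi^2_p$.

A bootstrap p-value should approximate the null distribution of W. For that reason the default here resamples from data shifted to have mean $\mu_0$, i.e. $x_j - \bar{x} + \mu_0$, which is a common choice. Setting `center_null = FALSE` instead resamples the raw data, as the wording above literally describes. The function names and the file-reading line are assumptions.

```r
# -2 log likelihood ratio for H0: mu = mu0 under a multivariate normal model,
# using W = n * log(det(Sigma0_hat) / det(Sigma_hat)) with divide-by-n MLEs
lrt_W <- function(x, mu0) {
  x <- as.matrix(x)
  n <- nrow(x)
  Sig_hat <- crossprod(sweep(x, 2, colMeans(x))) / n  # unrestricted MLE of Sigma
  Sig0_hat <- crossprod(sweep(x, 2, mu0)) / n         # MLE of Sigma with mu fixed at mu0
  n * as.numeric(determinant(Sig0_hat)$modulus - determinant(Sig_hat)$modulus)
}

# Bootstrap and asymptotic (chi-square_p) p-values for the statistic above
boot_lrt <- function(x, mu0, B = 10000, center_null = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)
  W_obs <- lrt_W(x, mu0)
  # Optionally shift the data so that H0 holds: x_j - xbar + mu0
  xb <- if (center_null) sweep(x, 2, colMeans(x) - mu0) else x
  W_boot <- replicate(B, lrt_W(xb[sample.int(n, n, replace = TRUE), , drop = FALSE], mu0))
  list(W_obs = W_obs, W_boot = W_boot,
       p_boot = mean(W_boot > W_obs),
       p_asym = pchisq(W_obs, df = ncol(x), lower.tail = FALSE))
}

# Hypothetical usage (Test.txt assumed to hold 30 rows, 3 numeric columns, no header):
# dat <- as.matrix(read.table("Test.txt"))
# res <- boot_lrt(dat, mu0 = c(4, 8, -2))
# hist(res$W_boot, breaks = 50, freq = FALSE, main = "Bootstrap W vs chi-square(3)")
# curve(dchisq(x, df = 3), add = TRUE, col = "red")
# abline(v = res$W_obs, lty = 2)
```

As a cross-check, W equals $n \log(1 + T^2/(n-1))$, where $T^2$ is Hotelling's statistic. This makes it easy to verify `lrt_W` against a direct $T^2$ computation.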

Attachment: Assignment.rar



Reviews

inf1234845

10/10/2016 8:04:52 AM

Some statistical concepts, like the one in this task, are not used frequently, and they require time and study materials. The trivariate normal distribution in this task is a rare concept, so I need your help finding study materials to work on it.

len1234845

10/8/2016 1:38:30 AM

I need both the solutions (typed or handwritten) and the R code (R file). Provide a plot of your choice to compare the asymptotic distribution of W to its empirical distribution estimated from the bootstrap samples.


All rights reserved. Copyright © 2019-2020 ExpertsMind IT Educational Pvt Ltd