Reference no: EM132277417
Question 1 - The Negative Binomial distribution, NegBin(r, θ), describes the distribution of the number of failures (denoted by Y) before the r-th success in an experiment that consists of a sequence of independent and identically distributed Bernoulli trials, where each trial has a probability θ of success. We collect a sample of n independent observations Yi, i = 1, . . . , n, with Yi|θ ∼ NegBin(r, θ), where θ is unknown and the value of r is known.
(a) Show that the Negative Binomial distribution of Yi|θ belongs to the one-parameter exponential family.
(b) Using the results from part (a), derive the conjugate prior for θ and show that it can be written as a Beta distribution.
(c) Derive the Jeffreys prior for θ and show that it can be written as a Beta distribution.
(d) Suppose that the Beta prior from part (c) was considered reasonable for θ. Derive the posterior distribution p(θ|y) for θ, where y = (y1, . . . , yn) denotes the values of the n observations. Show that the posterior distribution can be written as a Beta distribution.
(e) Suppose that we want to predict the value of a new observation (denoted by ỹ). Write down the predictive distribution p(ỹ|y).
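For reference, under the parameterization described in Question 1 (Y counts the failures before the r-th success, and θ is the per-trial success probability), the probability mass function takes the standard form below. It is stated here only to fix notation for parts (a) to (e); the symbol y_i for an observed value of Yi is a notational assumption.

```latex
p(y_i \mid \theta) = \binom{y_i + r - 1}{y_i}\, \theta^{r} (1 - \theta)^{y_i},
\qquad y_i = 0, 1, 2, \ldots, \quad 0 < \theta < 1 .
```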
Question 2 - A scientist is interested in the proportion of female horseshoe crabs that have at least one male crab (called a satellite) residing nearby. The scientist investigates whether the proportion is affected by two factors: the female crab's colour (denoted by L, with L = 1 if the colour is light and L = 0 otherwise) and the female crab's width (denoted by W, with W = 1 if the width is larger than 25 cm and W = 0 otherwise). The data collected by the scientist are arranged in the 12 cells of the table below, according to a 2-by-2 factorial layout, where yi and ni are the number of female crabs which have satellites and the total number of female crabs, respectively, in the ith cell, i = 1, . . . , 12, from top to bottom and then left to right.
|          W = 0          |          W = 1          |
|   L = 0    |   L = 1    |   L = 0    |   L = 1    |
|  i  yi  ni |  i  yi  ni |  i  yi  ni |  i  yi  ni |
|  1   3  10 |  4  20  42 |  7   4  11 | 10  86 109 |
|  2  11  39 |  5  35  69 |  8   9  28 | 11 152 196 |
|  3   8  31 |  6  31  65 |  9  11  32 | 12 133 158 |
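As a convenience for anyone working with these data numerically, the table can be encoded as arrays. This is a minimal sketch: the variable names (`y`, `n`, `L`, `W`, `X`, `prop`) are illustrative choices, and the design matrix with columns intercept, L, W and L×W is only one possible way to set up the explanatory variables.

```python
import numpy as np

# Cells i = 1..12, read top to bottom and then left to right in the table.
y = np.array([3, 11, 8, 20, 35, 31, 4, 9, 11, 86, 152, 133])        # crabs with satellites
n = np.array([10, 39, 31, 42, 69, 65, 11, 28, 32, 109, 196, 158])   # total crabs per cell

# Factor levels per cell: cells 1-3 (W=0, L=0), 4-6 (W=0, L=1),
# 7-9 (W=1, L=0), 10-12 (W=1, L=1).
W = np.repeat([0, 0, 1, 1], 3)
L = np.repeat([0, 1, 0, 1], 3)

# Illustrative design matrix: intercept, colour, width, interaction.
X = np.column_stack([np.ones(12), L, W, L * W])

# Observed proportion of crabs with satellites in each cell.
prop = y / n
```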
The scientist is interested in modelling and learning about the probability of having satellites (denoted by θi for the ith cell) in each cell. Your task is to design an appropriate random effects logistic regression model for the scientist. The model should use the colour L, width W and their interaction as explanatory variables, and also allow for over-dispersion.
(a) Write down all the probability distributions for the model that you have designed.
(b) Suppose that, using a Gibbs sampler, you have obtained a sample (denoted by (θ6^(M+1), . . . , θ6^(N))) from the posterior distribution for θ6. Explain how to use this sample to estimate the 5% highest posterior density (HPD) interval of θ6.
(c) Explain how you could use your model and a Gibbs sampler to obtain samples from the predictive distribution of Ỹ, the number of female crabs having satellites out of a total of ñ = 100 female crabs in a new cell with the colour L = 1 and the width W = 1.
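The ideas behind parts (b) and (c) can be sketched in code. This is a hedged illustration, not a model answer. It assumes one possible model, logit(θi) = β0 + β1·L + β2·W + β3·L·W + bi with bi ∼ N(0, σ²), and it assumes the Gibbs output is stored as arrays of post-burn-in draws. The function names `hpd_interval` and `predictive_draws`, and the array layout of `beta` and `sigma`, are hypothetical. The HPD estimate is taken as the shortest interval containing the required fraction of sorted draws, which is appropriate when the posterior is unimodal.

```python
import numpy as np

def hpd_interval(samples, mass):
    """Shortest interval containing a fraction `mass` of the draws.

    Assumes a unimodal posterior, so the HPD region is a single interval.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n_draws = x.size
    k = int(np.ceil(mass * n_draws))      # number of draws the interval must cover
    if k < 2 or k > n_draws:
        raise ValueError("mass is too small or too large for this sample size")
    # Width of every window of k consecutive sorted draws.
    widths = x[k - 1:] - x[:n_draws - k + 1]
    j = int(np.argmin(widths))            # start of the narrowest window
    return x[j], x[j + k - 1]

def predictive_draws(beta, sigma, n_new=100, seed=0):
    """One predictive draw of the new count per posterior draw.

    beta  : (S, 4) draws of (intercept, L, W, L*W) coefficients (assumed layout)
    sigma : (S,)  draws of the random-effect standard deviation (assumed)
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x_new = np.array([1.0, 1.0, 1.0, 1.0])      # new cell with L = 1, W = 1
    b_new = rng.normal(0.0, sigma)              # fresh random effect for the new cell
    eta = beta @ x_new + b_new                  # linear predictor on the logit scale
    theta_new = 1.0 / (1.0 + np.exp(-eta))      # inverse logit
    return rng.binomial(n_new, theta_new)       # new count out of n_new crabs
```

For example, `hpd_interval(theta6_draws, 0.05)` would give the interval requested in part (b), where `theta6_draws` is a hypothetical array holding the retained draws of θ6. Drawing a fresh random effect for each posterior draw is what carries the over-dispersion through to the predictive distribution.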