Reference no: EM1380385
The mock data file consists of 3 columns, each containing 1000 numbers:
1. a flag identifying the data row
2. the sampled x value (1 - 1000)
3. the corresponding sampled y value (1 - 1000)
The challenge is essentially to determine the linear relationship between x and y using these 1000 data pairs. It divides into three steps of increasing complexity.
Step 1: Use ordinary least squares to fit the linear model y = a + bx to the mock data:
(a) compute the LS estimators of a and b,
(b) estimate the variance of the (assumed Gaussian) noise which has been added to the mock y values,
(c) estimate the errors on a_LS and b_LS, and their covariance.
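As a rough illustration of Step 1, the sketch below uses the standard closed-form least squares results for a straight-line fit with equal, unknown Gaussian noise on y. It is written in Python rather than MATLAB. The function name ols_fit and the keys of the returned dictionary are illustrative choices, not part of the challenge, and reading the three-column data file is left out because its name and format are not given here.

```python
import math

def ols_fit(x, y):
    """Least squares fit of y = a + b*x, assuming equal, unknown noise variance on y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    # (a) LS estimators of the slope and intercept
    b = sxy / sxx
    a = y_mean - b * x_mean

    # (b) unbiased noise variance: residual sum of squares over (n - 2),
    #     because two parameters were fitted
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    # (c) variances and covariance of the estimators
    var_a = sigma2 * (1.0 / n + x_mean ** 2 / sxx)
    var_b = sigma2 / sxx
    cov_ab = -sigma2 * x_mean / sxx

    return {
        "a": a,
        "b": b,
        "sigma2": sigma2,
        "err_a": math.sqrt(var_a),
        "err_b": math.sqrt(var_b),
        "cov_ab": cov_ab,
    }
```

Calling ols_fit(x, y) with the second and third columns of the data file as two lists of floats would return all the quantities asked for in (a) to (c). Note that the covariance is negative whenever the mean of x is positive, so the errors on a_LS and b_LS should not be treated as independent.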
Step 2: Cast the data analysis challenge not as a least squares problem but as a maximum likelihood problem, and form an appropriate likelihood function for the mock data which depends on the parameters (a, b).
Then compute the log likelihood on a rectangular grid of values of a and b (you need to think carefully about the range of a and b values you should consider, and the spacing between them), and from it the value of chi-squared for each (a, b) pair on your grid. Find the minimum value of chi-squared, and turn your grid of values into a rectangular array of Delta chi-squared values. Finally, using the information in the table in Section 6, compute and plot Bayesian credible regions for the parameters at e.g. 68.3%, 95.4% and 99.73%. (Carrying out the calculations and making a contour plot from the results is straightforward in e.g. MATLAB, although you are welcome to use any programming language you wish.)
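The grid calculation in Step 2 might be sketched as follows, again in Python. This assumes independent Gaussian noise with a known standard deviation sigma (for instance the estimate from Step 1(b)), so that the log likelihood equals -chi-squared/2 plus a constant. The helper names are hypothetical. The contour levels are computed from the two-parameter relation Delta chi-squared = -2 ln(1 - p) and should be checked against the table in Section 6, which is not reproduced here.

```python
import math

def chi_squared(a, b, x, y, sigma):
    """Chi-squared of the line y = a + b*x for independent Gaussian noise of known sigma."""
    return sum(((yi - a - b * xi) / sigma) ** 2 for xi, yi in zip(x, y))

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def delta_chi2_grid(x, y, sigma, a_values, b_values):
    """Return (delta, chi2_min); delta[i][j] belongs to (a_values[i], b_values[j])."""
    grid = [[chi_squared(a, b, x, y, sigma) for b in b_values] for a in a_values]
    chi2_min = min(min(row) for row in grid)
    delta = [[value - chi2_min for value in row] for row in grid]
    return delta, chi2_min

def delta_chi2_level(p):
    """Delta chi-squared enclosing probability p for two jointly estimated parameters."""
    return -2.0 * math.log(1.0 - p)
```

For p = 0.683, 0.954 and 0.9973, delta_chi2_level gives roughly 2.30, 6.16 and 11.83. Passing the delta array and these three levels to a contour-plotting routine then draws the credible regions. A sensible first choice for the grid is a few standard errors either side of the Step 1 estimates, fine enough that the 68.3% contour spans many grid cells.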
Step 3: Finally, using the Metropolis algorithm, and assuming a Gaussian likelihood function for the model parameters a and b, write an MCMC code to generate a sample from the likelihood function, thinking carefully about your choices of proposal density and prior range for a and b. Use this sample to estimate the mean values, errors and covariance of the parameters a and b from their sampled marginal distributions. Devise a method for estimating and plotting Bayesian credible regions for the parameters, using your MCMC sample.
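One possible shape for the Step 3 code is sketched below in Python. It is a minimal sketch under stated assumptions, not a prescribed solution: the proposal is an uncorrelated Gaussian random walk, the prior is uniform inside a rectangle, the noise sigma is taken as known, and all function and argument names are hypothetical. The credible_subset helper shows just one way to approach the credible regions, by keeping the highest-likelihood fraction of the sample; other methods are equally valid.

```python
import math
import random

def log_likelihood(a, b, x, y, sigma):
    """Gaussian log likelihood of y = a + b*x, up to an additive constant."""
    return -0.5 * sum(((yi - a - b * xi) / sigma) ** 2 for xi, yi in zip(x, y))

def metropolis(x, y, sigma, start, step, prior, n_steps, seed=0):
    """Random-walk Metropolis sampler; returns (chain, acceptance rate).

    start: (a, b) inside the prior box; step: proposal widths (sd_a, sd_b);
    prior: ((a_lo, a_hi), (b_lo, b_hi)), uniform inside and zero outside.
    """
    rng = random.Random(seed)
    (a_lo, a_hi), (b_lo, b_hi) = prior
    a, b = start
    logl = log_likelihood(a, b, x, y, sigma)
    chain, accepted = [], 0
    for _ in range(n_steps):
        a_new = a + rng.gauss(0.0, step[0])
        b_new = b + rng.gauss(0.0, step[1])
        # Proposals outside the prior box have zero probability and are rejected.
        if a_lo <= a_new <= a_hi and b_lo <= b_new <= b_hi:
            logl_new = log_likelihood(a_new, b_new, x, y, sigma)
            # Metropolis rule: always accept uphill moves, accept downhill
            # moves with probability equal to the likelihood ratio.
            if logl_new >= logl or rng.random() < math.exp(logl_new - logl):
                a, b, logl = a_new, b_new, logl_new
                accepted += 1
        chain.append((a, b, logl))  # a rejected step repeats the current point
    return chain, accepted / n_steps

def summarise(chain, burn_in):
    """Sample means, standard deviations and covariance after discarding burn-in."""
    kept = chain[burn_in:]
    n = len(kept)
    mean_a = sum(s[0] for s in kept) / n
    mean_b = sum(s[1] for s in kept) / n
    var_a = sum((s[0] - mean_a) ** 2 for s in kept) / (n - 1)
    var_b = sum((s[1] - mean_b) ** 2 for s in kept) / (n - 1)
    cov_ab = sum((s[0] - mean_a) * (s[1] - mean_b) for s in kept) / (n - 1)
    return {"a": mean_a, "b": mean_b, "err_a": math.sqrt(var_a),
            "err_b": math.sqrt(var_b), "cov_ab": cov_ab}

def credible_subset(chain, burn_in, p):
    """The fraction p of post-burn-in samples with the highest likelihood."""
    kept = sorted(chain[burn_in:], key=lambda s: s[2], reverse=True)
    return kept[:math.ceil(p * len(kept))]
```

A scatter plot of the (a, b) pairs returned by credible_subset for p = 0.683, 0.954 and 0.9973 outlines approximate credible regions that can be compared with the Step 2 contours. The acceptance rate returned by metropolis is a useful diagnostic when tuning the proposal widths: values very close to 0 or 1 usually indicate steps that are too large or too small.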
Steps 2 and 3 both involve more sophisticated methods and will require you to write some simple computer code (e.g. in MATLAB).