7089CEM Introduction to Statistical Methods for Data Science


Reference no: EM133154875

7089CEM Introduction to Statistical Methods for Data Science - Coventry University

Coursework - Modelling brain signals using nonlinear regression

Learning Outcome 1: Demonstrate knowledge of underlying concepts in probability and statistics used in Data Science.

Learning Outcome 2: Select and apply appropriate statistical methods or techniques to solve problems or analyse data sets.

Learning Outcome 3: Use modern software to solve real world problems and analyse large data sets.

Learning Outcome 4: Interpret the results of their analyses and communicate those results accurately.

Task and Mark distribution:

Coursework Description:

The aim of this assignment is to select, from a candidate set of nonlinear regression models, the model that best describes the brain response to a sound signal. The simulated data are assumed to have been collected during a neuromarketing experiment in which a participant listens to advertisements while their brain response is recorded using magnetoencephalography (MEG). MEG is a widely used non-invasive method for recording brain activity. Specifically, the MEG signal is recorded from the amygdala, a brain region involved in emotion processing. For the first 10 seconds the participant listens to an advertisement narrated by a neutral voice; during the next 10 seconds another advertisement, narrated by an emotional voice, is played. The regression model you are asked to identify will measure the auditory-brain interaction and the effect of the emotional narration. The researchers hypothesise that the emotional narration will evoke an increased brain response.

Data:
The simulated MEG time-series data and the sound signal are provided as separate CSV files. The file X.csv contains the input sound signal x₁ and the categorical variable x₂ denoting which audio category is being played (x₂ = 0 when the neutral audio is played, x₂ = 1 when the emotional audio is played). The file y.csv contains the output MEG signal y. The file time.csv contains the sampling time of both signals in seconds. The output MEG signal is subject to additive noise, assumed to be independent and identically distributed (i.i.d.) Gaussian with zero mean and unknown variance, caused by distortions during recording.

Task 1: Preliminary data analysis

You should first perform an initial exploratory data analysis, by investigating:

• Time series plots (of input audio and output MEG signal).
• Distribution for each (input & output) signal.
• Correlation and scatter plots (between the audio input and output brain signal) to examine their dependencies.
• Boxplots of the output brain signal to examine the effect of the sound categories.
• You can perform the above preliminary data analysis separately for each type of input sound signal (i.e. when x₂ = 0 and when x₂ = 1).
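
As a rough illustration of the numerical side of these checks, the sketch below computes the input-output correlation and the five-number summaries that a boxplot would display, overall and per audio category. It runs on synthetic stand-in signals because the real X.csv, y.csv and time.csv are not reproduced here; the sample size, coefficients and noise level are all assumptions made only so the example runs.

```python
import numpy as np

# Synthetic stand-ins for time.csv, X.csv and y.csv (all values are assumptions).
rng = np.random.default_rng(0)
n = 200
time = np.linspace(0.0, 20.0, n)           # 20 s of samples
x1 = rng.normal(size=n)                    # stand-in audio signal
x2 = (time >= 10.0).astype(int)            # 0 = neutral, 1 = emotional
y = 0.5 * x1 + 1.0 * x2 + rng.normal(scale=0.3, size=n)  # stand-in MEG signal

# Pearson correlation between audio input and MEG output, overall and per category.
r_all = np.corrcoef(x1, y)[0, 1]
r_by_cat = {c: np.corrcoef(x1[x2 == c], y[x2 == c])[0, 1] for c in (0, 1)}

def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max (what a boxplot shows)."""
    return np.percentile(values, [0, 25, 50, 75, 100])

# One summary of the output signal per audio category.
summary = {c: five_number_summary(y[x2 == c]) for c in (0, 1)}

print("overall correlation:", round(r_all, 3))
for c in (0, 1):
    print("category", c, "summary:", np.round(summary[c], 3))
```

With the real data, the same quantities would be computed after loading the three files, and the time-series, histogram, scatter and box plots would be drawn with whichever plotting library you prefer.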

Task 2: Regression - modelling the brain response (MEG) to a sound signal

We would like to determine a suitable mathematical model that explains the relationship between the input audio signal and the output brain signal, and how this relationship changes with the content of the input audio (i.e. neutral versus emotional), assuming such a relationship can be described by a polynomial regression model. Below are 5 candidate nonlinear polynomial regression models, and only one of them can 'truly' describe such a relationship. The objective is to identify this 'true' model from the candidates by following Tasks 2.1 - 2.6.

Task 2.1:

Estimate the model parameters $\theta = \{\theta_1, \theta_2, \ldots, \theta_{\text{bias}}\}^T$ for every candidate model using least squares, $\hat{\theta} = (x^T x)^{-1} x^T y$, with the provided sound input and output MEG datasets (use all the data for training).
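
The least squares estimate can be sketched as follows. The candidate models are not reproduced in this text, so the polynomial used here (terms x₁, x₁², x₂ and a bias) is a purely hypothetical stand-in, and the data are synthetic. The variable `X` in the code plays the role of the input data matrix x in the formula, with one column per model term.

```python
import numpy as np

# Synthetic data under a HYPOTHETICAL model (not one of the five candidates):
#   y = theta1*x1 + theta2*x1^2 + theta3*x2 + theta_bias + noise
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = (np.arange(n) >= n // 2).astype(float)
true_theta = np.array([0.8, -0.3, 1.5, 0.2])   # assumed values for the demo

# Design matrix: one column per model term, last column of ones for the bias.
X = np.column_stack([x1, x1**2, x2, np.ones(n)])
y = X @ true_theta + rng.normal(scale=0.1, size=n)

# Normal equations: solve (X^T X) theta = X^T y rather than forming the inverse.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically more robust alternative that gives the same estimate here.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("theta_hat:", np.round(theta_hat, 3))
```

Solving the linear system avoids explicitly inverting $x^T x$, which tends to be more accurate when the columns are nearly collinear; for a well-conditioned design matrix the two routes agree.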

Task 2.2:

Based on the estimated model parameters, compute the residual sum of squares (RSS) for every candidate model.

$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - x_i \hat{\theta})^2$

Here $x_i$ denotes the $i$th row (the $i$th data sample) of the input data matrix $x$, and $\hat{\theta}$ is a column vector.
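
A minimal sketch of the RSS computation on a tiny hand-checkable example follows. The three data points and the straight-line model are assumptions chosen so the result can be verified by hand; the function name `rss` is illustrative.

```python
import numpy as np

def rss(X, y, theta_hat):
    """Residual sum of squares: sum over samples of (y_i - x_i theta_hat)^2."""
    residuals = y - X @ theta_hat
    return float(residuals @ residuals)

# Toy data: fit a straight line (slope + bias) through three points.
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
y = np.array([1.0, 2.0, 4.0])

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # slope 1.5, bias -2/3
value = rss(X, y, theta_hat)
print(value)   # 1/6, approximately 0.1667
```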

Task 2.3:

Compute the log-likelihood function for every candidate model:

$\ln p(D \mid \hat{\theta}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\,\mathrm{RSS}$

Here $\sigma^2$ is the variance of the model's residual (prediction error) distribution, estimated as $\sigma^2 = \mathrm{RSS}/(n - 1)$, where $n$ is the number of data samples.
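
The log-likelihood can be written as a small function of RSS and n, as sketched below. The function name and the example inputs are illustrative assumptions; the variance estimate follows the RSS/(n - 1) definition given above.

```python
import math

def log_likelihood(rss, n):
    """Gaussian log-likelihood of a fitted model, with sigma^2 = RSS / (n - 1)."""
    sigma2 = rss / (n - 1)
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - rss / (2 * sigma2))

# Illustrative inputs only: RSS = 9 with n = 10 gives sigma^2 = 1,
# so the middle term vanishes and the last term equals (n - 1) / 2 = 4.5.
example = log_likelihood(rss=9.0, n=10)
print(example)
```

Because the variance is estimated from the same RSS, the last term always equals (n - 1)/2, so differences in log-likelihood between candidate models come entirely from the $\ln(\sigma^2)$ term.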

Task 2.4:

Compute the Akaike information criterion (AIC) and Bayesian information criterion (BIC) for every candidate model:

$\mathrm{AIC} = 2k - 2 \ln p(D \mid \hat{\theta})$
$\mathrm{BIC} = k \ln(n) - 2 \ln p(D \mid \hat{\theta})$

Here $\ln p(D \mid \hat{\theta})$ is the log-likelihood obtained in Task 2.3 for each model, $k$ is the number of estimated parameters in that model, and $n$ is the number of data samples.
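
The two criteria translate directly into code. In the sketch below the model names, parameter counts and log-likelihood values are placeholders invented to show the comparison step; they are not results for the actual candidate models.

```python
import math

def aic(k, loglik):
    """Akaike information criterion: 2k - 2 ln p(D | theta_hat)."""
    return 2 * k - 2 * loglik

def bic(k, n, loglik):
    """Bayesian information criterion: k ln(n) - 2 ln p(D | theta_hat)."""
    return k * math.log(n) - 2 * loglik

# PLACEHOLDER inputs (k, log-likelihood) for two hypothetical models.
n = 200
candidates = {"model_A": (3, -120.0), "model_B": (5, -118.5)}

scores = {name: (aic(k, ll), bic(k, n, ll)) for name, (k, ll) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores, best_by_aic, best_by_bic)
```

Lower values are better for both criteria. BIC penalises extra parameters more heavily than AIC whenever ln(n) exceeds 2, which is the case for any n of 8 or more.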

Task 2.5:
Check the distribution of the model prediction errors (residuals) for each candidate model. Plot the error distributions and evaluate whether they are close to Normal/Gaussian (as the output MEG signal has additive Gaussian noise), e.g. by using a Q-Q plot.
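
One way to prepare a Q-Q plot by hand is sketched below: sort the standardised residuals and pair them with standard Normal quantiles at the plotting positions (i - 0.5)/n. The residuals here are synthetic, the plotting-position rule is one common convention among several, and the helper name `qq_points` is illustrative.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical Normal quantiles, sorted standardised residuals)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    probs = (np.arange(1, n + 1) - 0.5) / n            # plotting positions
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    standardised = (r - r.mean()) / r.std(ddof=1)
    return theoretical, standardised

# Synthetic residuals standing in for a fitted model's prediction errors.
rng = np.random.default_rng(2)
residuals = rng.normal(scale=0.5, size=500)

theoretical, sample = qq_points(residuals)

# For Gaussian residuals the points lie close to the line y = x,
# so the correlation between the two sets of quantiles is near 1.
qq_corr = np.corrcoef(theoretical, sample)[0, 1]
print(round(qq_corr, 4))
```

Plotting `sample` against `theoretical` gives the Q-Q plot; systematic curvature away from the diagonal suggests skewed or heavy-tailed residuals.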

Task 2.6:
Select the 'best' regression model from the 5 candidate models according to the AIC, BIC and the distribution of model residuals, and explain why you chose this specific model.

Task 2.7:
Split the input (sound) and output (MEG) datasets (x and y) into two parts: one used to train the model, the other used for testing (e.g. 70% for training, 30% for testing). For the selected 'best' model: 1) estimate the model parameters using the training dataset; 2) compute the model's output/prediction on the testing data; and 3) compute the 95% confidence intervals for the model prediction and plot them (with error bars) together with the model prediction and the testing data samples.
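
The three steps can be sketched as below, again on synthetic data with a hypothetical polynomial model (x₁, x₁², x₂, bias) standing in for the selected one. Several choices here are assumptions: the split is a random 70/30 permutation, the residual variance uses the RSS/(n - 1) definition from Task 2.3, and the interval uses the Normal quantile 1.96 together with the standard linear-regression variance of the mean prediction, $\sigma^2 x_i (x^T x)^{-1} x_i^T$.

```python
import numpy as np

# Synthetic data under a HYPOTHETICAL model (stand-in for the selected one).
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = (np.arange(n) >= n // 2).astype(float)
X = np.column_stack([x1, x1**2, x2, np.ones(n)])
y = X @ np.array([0.8, -0.3, 1.5, 0.2]) + rng.normal(scale=0.2, size=n)

# 70/30 random split of the sample indices.
idx = rng.permutation(n)
n_train = (7 * n) // 10
train, test = idx[:n_train], idx[n_train:]

# 1) Least squares fit on the training part only.
XtX_inv = np.linalg.inv(X[train].T @ X[train])
theta_hat = XtX_inv @ X[train].T @ y[train]
resid = y[train] - X[train] @ theta_hat
sigma2 = resid @ resid / (n_train - 1)

# 2) Predictions on the held-out testing part.
y_pred = X[test] @ theta_hat

# 3) 95% confidence interval for each mean prediction:
#    y_pred +/- 1.96 * sqrt(sigma2 * x_i (X^T X)^{-1} x_i^T)
quad = np.einsum("ij,jk,ik->i", X[test], XtX_inv, X[test])
half_width = 1.96 * np.sqrt(sigma2 * quad)
ci_lower, ci_upper = y_pred - half_width, y_pred + half_width

rmse = np.sqrt(np.mean((y[test] - y_pred) ** 2))
print("test RMSE:", round(float(rmse), 3))
```

The arrays `y_pred` and `half_width` are what an error-bar plot needs, drawn alongside the testing samples `y[test]`. These intervals describe uncertainty in the fitted mean response, so they are narrower than intervals for individual noisy observations.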

Task 3: Approximate Bayesian Computation (ABC)
Use the rejection ABC method to compute the posterior distributions of the parameters of the regression model selected in Task 2.
1) You only need to compute the posterior distributions of 2 parameters: the 2 parameters with the largest absolute values in your least squares estimate (Task 2.1) of the selected model. Fix all other parameters in your model as constants, using the estimated values from Task 2.1.
2) Use a Uniform distribution as the prior, around the estimated values of those 2 parameters (from Task 2.1). You will need to determine the range of the prior distribution.
3) Draw samples from the above Uniform prior, and perform rejection ABC for those 2 parameters.
4) Plot the joint and marginal posterior distributions for those 2 parameters.
5) Explain your results.
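
The steps above can be sketched as follows, under several loudly stated assumptions: the data are synthetic; the model (x₁, x₁², x₂, bias) is a hypothetical stand-in for the selected one; the prior spans ±50% around each least squares estimate; the distance is the sum of squared differences between observed and simulated outputs; and the tolerance is set to the 5% quantile of the simulated distances. All of these are design choices you would need to justify for the real coursework.

```python
import numpy as np

# Synthetic data under a HYPOTHETICAL model (stand-in for the selected one).
rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = (np.arange(n) >= n // 2).astype(float)
X = np.column_stack([x1, x1**2, x2, np.ones(n)])
y = X @ np.array([0.8, -0.3, 1.5, 0.2]) + rng.normal(scale=0.2, size=n)

# Least squares fit and residual noise level (as in Tasks 2.1 - 2.3).
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
rss_hat = np.sum((y - X @ theta_hat) ** 2)
sigma_hat = np.sqrt(rss_hat / (n - 1))

# 1) The two parameters with the largest absolute estimates; the rest stay fixed.
top2 = np.argsort(np.abs(theta_hat))[-2:]

# 2) Uniform prior around each estimate (the +/-50% range is an assumption).
low = theta_hat[top2] - 0.5 * np.abs(theta_hat[top2])
high = theta_hat[top2] + 0.5 * np.abs(theta_hat[top2])

# 3) Draw from the prior, simulate noisy outputs, and keep the closest draws.
n_draws = 10_000
draws = rng.uniform(low, high, size=(n_draws, 2))
thetas = np.tile(theta_hat, (n_draws, 1))
thetas[:, top2] = draws
simulated = thetas @ X.T + rng.normal(scale=sigma_hat, size=(n_draws, n))
distance = np.sum((y[None, :] - simulated) ** 2, axis=1)
epsilon = np.quantile(distance, 0.05)          # tolerance (assumption)
accepted = draws[distance <= epsilon]

# 4) Summaries for plotting: marginal means and a 2-D histogram of the joint.
posterior_mean = accepted.mean(axis=0)
joint, edges_a, edges_b = np.histogram2d(accepted[:, 0], accepted[:, 1], bins=20)
print("accepted:", len(accepted), "posterior mean:", np.round(posterior_mean, 3))
```

Histograms of each column of `accepted` give the marginal posteriors, and `joint` (or a scatter plot of the accepted pairs) gives the joint posterior. A smaller tolerance yields a tighter approximation to the posterior at the cost of fewer accepted samples.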

Attachment:- Statistical Methods for Data Science.rar
