This homework concentrates on regression and Bayesian modelling.
1. Linear regression
The data set linreg.mat on the Blackboard site contains data vectors x and t. The data were generated according to
$t_n = w_0 + w_1 x_n + \epsilon_n$
where $\epsilon_n$ is Gaussian-distributed noise: $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$. Use linear regression to identify the coefficients $w_0$ and $w_1$. Recall that to do this you need to set up a design matrix $X$ that contains the features and the dummy feature 1 to go with the bias coefficient $w_0$; thus
$X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}$
With $X$ in hand, you can find the coefficients from:
$\mathbf{w} = X^{\dagger}\mathbf{t}$
where $\mathbf{t}$ is the vector of the targets and $X^{\dagger}$ is the pseudo-inverse of $X$. Use pinv in MATLAB or construct it yourself as $X^{\dagger} = (X^T X)^{-1} X^T$; see the lecture slides.
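As a minimal sketch (assuming, per the problem statement, that the .mat file provides column vectors x and t), the whole computation is a few lines of MATLAB:

    % Load the data and build the design matrix with a column of ones
    load linreg.mat                % provides column vectors x and t
    N = length(x);
    X = [ones(N, 1), x];           % dummy feature 1 for the bias w0, then the feature x
    w = pinv(X) * t;               % w(1) is w0 (intercept), w(2) is w1 (gradient)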
The detailed requirements of this exercise are described as follows:
(a) Plot the data and the regression line. Measure the correlation between the features and targets. How does it relate to the coefficients?
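A minimal sketch of this part, reusing X and w from the fit above (variable names are those of the sketch, not prescribed by the assignment):

    % Plot the data and the fitted regression line
    plot(x, t, 'o'); hold on
    plot(x, X * w, '-'); hold off
    % Sample correlation between feature and target; R(1,2) is the value to report
    R = corrcoef(x, t);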
(b) Estimate the variance of the noise by finding the variance of the differences between your prediction of the targets and the actual targets. Thus if $y_n = w_0 + w_1 x_n$ is the prediction of the $n$-th target, then you could estimate the variance $\sigma^2$ as:
$\sigma^2 = \frac{1}{N} \sum_{n=1}^{N} (t_n - y_n)^2$
Does your estimate of the variance make sense in terms of the average deviation of the targets from the prediction? Why?
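A minimal sketch of this estimate, again reusing X, w, and t from above:

    y = X * w;                     % predictions y_n = w0 + w1*x_n
    sigma2 = mean((t - y).^2);     % (1/N) * sum of squared residuals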
2. Robust linear regression
Standard regression analysis minimises the squared error between the regression line and the data, namely:
$E_2(\mathbf{w}) = \sum_{n=1}^{N} \left(t_n - y_n(x_n; \mathbf{w})\right)^2$
where $y_n(x_n; \mathbf{w}) = w_0 + w_1 x_n$ and $\mathbf{w} = (w_0, w_1)$. Recall that the sum-of-squares error function $E_2$ comes from the negative log likelihood and the assumption that the errors are normally distributed. A heavy-tailed distribution that is more appropriate if there are occasional large deviations from the systematic trend is the Laplacian distribution:
$p(\epsilon_n) = p(t_n \mid x_n, \mathbf{w}) \propto \exp\{-|\epsilon_n|/\sigma\}.$
The detailed requirements of this exercise are described as follows:
(a) Substitute this expression for $p(t_n \mid x_n, \mathbf{w})$ into the general expression for an error function, $E(\mathbf{w}) = -\sum_{n=1}^{N} \log p(t_n \mid x_n, \mathbf{w})$, to show that the error function that arises from this noise distribution is
$E_1(\mathbf{w}) = \sum_{n=1}^{N} |t_n - y_n(x_n; \mathbf{w})|$
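One way to see this, as a sketch: since $\log p(t_n \mid x_n, \mathbf{w}) = -|t_n - y_n(x_n; \mathbf{w})|/\sigma + \text{const}$,
$E(\mathbf{w}) = -\sum_{n=1}^{N} \log p(t_n \mid x_n, \mathbf{w}) = \frac{1}{\sigma} \sum_{n=1}^{N} |t_n - y_n(x_n; \mathbf{w})| + \text{const},$
and neither the factor $1/\sigma$ nor the additive constant changes the minimiser.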
(b) In the data set linregoutlier.mat on the Blackboard site, the data vector z is the same as t except that one element is an outlier. Plot $z_n$ versus $x_n$ and find the linear regression line for these data using $E_2$. Notice how the regression line is affected by the single outlier.
(c) Write a MATLAB routine to find a straight line fitting the data by minimising $E_1(\mathbf{w})$. Note that the pseudo-inverse will not work here. One possibility is to find the minimum error by trying a grid of combinations of $w_0$ and $w_1$. From your plot of the data you should be able to estimate appropriate ranges of $w_0$ (the intercept) and $w_1$ (the gradient) to search. If you adopt this approach, it would be nice to plot a contour or imagesc representation of $E_1(\mathbf{w})$ as a function of $w_0$ and $w_1$. The grid-search method works well when there are just two coefficients to be found, but is computationally very expensive when there are many. An alternative is to use a numerical minimiser such as the MATLAB subroutine fminsearch.m to locate the minimum; you might start the search at the solution to the $E_2$ problem. A sketch of both approaches is given below.
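A minimal sketch of both approaches, assuming column vectors x and z from the outlier data set and the $E_2$ solution w2 from part (b); the grid ranges are placeholders to be read off your data plot:

    % L1 error as a function of the coefficient vector w = [w0; w1]
    E1 = @(w) sum(abs(z - (w(1) + w(2) * x)));

    % Approach 1: simplex search, started from the least-squares (E2) solution w2
    wL1 = fminsearch(E1, w2);

    % Approach 2: brute-force grid over (w0, w1), then visualise the error surface
    w0s = linspace(-5, 5, 201);    % placeholder intercept range
    w1s = linspace(-5, 5, 201);    % placeholder gradient range
    E = zeros(numel(w1s), numel(w0s));
    for i = 1:numel(w1s)
        for j = 1:numel(w0s)
            E(i, j) = E1([w0s(j); w1s(i)]);
        end
    end
    imagesc(w0s, w1s, E); xlabel('w_0'); ylabel('w_1'); colorbar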
(d) Plot the linear regression line for these data using $E_1$. Compare your fitted line with the line derived from the squared error (all on the same graph). Discuss the extent to which each regression line is affected by the single outlier.
3. The support vector regression (SVR) with the $\epsilon$-insensitive loss ($\epsilon > 0$) is given by
$\min_{\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\xi}^*} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} (\xi_n + \xi_n^*)$
subject to $t_n - \mathbf{w}^T \mathbf{x}_n - b \le \epsilon + \xi_n$, $\mathbf{w}^T \mathbf{x}_n + b - t_n \le \epsilon + \xi_n^*$, and $\xi_n, \xi_n^* \ge 0$ for $n = 1, \dots, N$.
Derive the dual problem of the above formulation, and explain how to extend it to the nonlinear (kernelized) case.
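As a sketch of where the derivation should land: introducing Lagrange multipliers $\alpha_n, \alpha_n^* \ge 0$ for the two $\epsilon$-tube constraints and eliminating $\mathbf{w}$, $b$, and the slacks gives the standard dual
$\max_{\boldsymbol{\alpha}, \boldsymbol{\alpha}^*} \; -\frac{1}{2} \sum_{n,m} (\alpha_n - \alpha_n^*)(\alpha_m - \alpha_m^*) \, \mathbf{x}_n^T \mathbf{x}_m - \epsilon \sum_{n} (\alpha_n + \alpha_n^*) + \sum_{n} t_n (\alpha_n - \alpha_n^*)$
subject to $\sum_n (\alpha_n - \alpha_n^*) = 0$ and $0 \le \alpha_n, \alpha_n^* \le C$. The data enter only through the inner products $\mathbf{x}_n^T \mathbf{x}_m$, so the kernelized case replaces them with $k(\mathbf{x}_n, \mathbf{x}_m)$.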
4. LASSO (least absolute shrinkage and selection operator) is a regression analysis method which is given by
$\min_{\mathbf{w}} \; \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \mathbf{x}_n)^2 + \lambda \|\mathbf{w}\|_1 \qquad (1)$
(a) Explain the LASSO (1) as a MAP solution in Bayesian learning.
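(As a sketch of the intended connection, assuming Gaussian noise $t_n \sim \mathcal{N}(\mathbf{w}^T \mathbf{x}_n, \sigma^2)$ and an i.i.d. Laplace prior $p(w_j) \propto \exp(-|w_j|/b)$: the negative log posterior is $\frac{1}{2\sigma^2} \sum_n (t_n - \mathbf{w}^T \mathbf{x}_n)^2 + \frac{1}{b} \|\mathbf{w}\|_1 + \text{const}$, so the MAP estimate minimises (1) with $\lambda = \sigma^2 / b$.)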
(b) Prove that LASSO (1) is equivalent to
(c) Prove that the objective function (2) is jointly convex with respect to (w, λ).
Attachment: MTL_hw.zip