Simple Linear Regression
While correlation analysis measures the degree to which variables are related, regression analysis describes the form of that relationship.
Thus, the coefficient of correlation indicates the strength of a linear relationship, whereas here we compute the linear model that best fits the relationship. Once again, we stress the importance of using qualitative analysis to establish a cause-and-effect relationship before computing the model.
Regression analysis is based on the relationship between two or more variables. The known variable is the independent variable, and the variable we are trying to predict is the dependent variable. The relationship between the variables may be direct or inverse.
If X represents the cause and Y, the effect, we are searching for
$E(Y \mid X = x) = A + Bx$,
i.e., if X takes on the value x, we would expect Y to assume A + Bx.
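As a small illustration of this conditional-mean reading, the sketch below simulates many values of Y at one fixed x and averages them. The values A = 2.0 and B = 0.5, the normally distributed error with standard deviation 1, and the function name simulate_y are all assumptions made for the example; none of them come from the text.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
A = 2.0
B = 0.5


def simulate_y(x, rng, noise_sd=1.0):
    """Draw one Y at X = x: the line A + B*x plus a mean-zero normal error (assumed)."""
    return A + B * x + rng.gauss(0.0, noise_sd)


rng = random.Random(0)  # fixed seed so the run is repeatable
x = 4.0
draws = [simulate_y(x, rng) for _ in range(10_000)]
average_y = sum(draws) / len(draws)

# The average of many simulated Y values should sit close to A + B*x.
print("A + B*x      =", A + B * x)
print("average of Y =", average_y)
```

Individual draws scatter around the line, but their average settles near A + Bx, which is what the expectation expresses.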
Since it is (usually) impossible to obtain all possible pairs (X, Y), we need to estimate the model using a sample. The approximate model is given by
$E(Y \mid X = x) = a + bx$
In this case, a is an estimate of A and b is an estimate of B.
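One common way to obtain a and b from a sample is ordinary least squares. The text has not named an estimation method at this point, so treat the sketch below as one possible approach under that assumption. The function name fit_line and the five data points are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the sample line y = a + b*x, fitted by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = mean(xs)
    y_bar = mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # estimate of the slope B
    a = y_bar - b * x_bar  # estimate of the intercept A
    return a, b


# Hypothetical sample, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print("a =", a)
print("b =", b)
```

A different sample from the same population would generally give different values of a and b, which is why they are estimates of A and B rather than the parameters themselves.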
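We may rewrite the population regression line and the sample regression line as
$y = A + Bx + \varepsilon_x$
and
$y = a + bx + e_x$,
where $\varepsilon_x$ and $e_x$ are random variables with mean 0: $\varepsilon_x$ is the error about the population line and $e_x$ is the residual about the sample line.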
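The two error terms are easy to confuse, so here is a minimal sketch that computes both on simulated data. It assumes a population line with hypothetical parameters A_true = 1.0 and B_true = 2.0, normal errors with standard deviation 1, and a least-squares fit for the sample line. In practice A and B are unknown, so only the sample residuals can actually be computed.

```python
import random

# Hypothetical population parameters, known here only because the data are simulated.
A_true, B_true = 1.0, 2.0

rng = random.Random(1)  # fixed seed so the run is repeatable
xs = [rng.uniform(0.0, 10.0) for _ in range(50)]
ys = [A_true + B_true * x + rng.gauss(0.0, 1.0) for x in xs]

# Least-squares estimates a and b from the sample (an assumed estimation method).
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Deviations from the population line versus deviations from the sample line.
population_errors = [y - (A_true + B_true * x) for x, y in zip(xs, ys)]
sample_residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

mean_population_error = sum(population_errors) / n
mean_sample_residual = sum(sample_residuals) / n

print("mean population error:", mean_population_error)
print("mean sample residual: ", mean_sample_residual)
```

With a least-squares fit that includes an intercept, the sample residuals average to zero up to rounding. The population errors have mean 0 only in expectation, so their average in any one sample is small but generally not exactly zero.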