Regression Equation
Regression equations are also known as estimating equations. They are the algebraic expressions of the regression lines. Since there are two regression lines, there are two regression equations: the regression equation of X on Y is used to explain the variation in the values of X for given changes in Y, and the regression equation of Y on X is used to explain the variation in the values of Y for given changes in X.
Regression equation of Y on X
The regression equation of Y on X is stated as follows:
$Y = a + bX$
In this equation Y is the dependent variable: its value depends on X. X is the independent variable, so we can take any given value of X and compute the corresponding value of Y.
a is the Y-intercept: it is the value of Y at which the line cuts the vertical axis (that is, where X = 0). b is the slope of the line: it shows the change in Y for a unit change in X.
a and b are termed numerical constants because, for any given straight line, their values do not change.
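To make the roles of a and b concrete, here is a minimal Python sketch. The function name `predict` and the constants a = 2, b = 3 are hypothetical, chosen only for illustration.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the estimated Y for a given X on the line Y = a + bX."""
    return a + b * x


# Hypothetical constants, chosen only for illustration.
a, b = 2.0, 3.0

# At X = 0 the estimate equals the intercept a.
print(predict(a, b, 0.0))  # 2.0

# Raising X by one unit changes the estimate by the slope b.
print(predict(a, b, 5.0) - predict(a, b, 4.0))  # 3.0
```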
If the values of the constants a and b are obtained, the line is completely determined. The question is how to obtain these values. The answer is given by the method of least squares, which states that the line must be drawn through the plotted points in such a manner that the sum of the squares of the deviations of the actual Y values from the computed Y values ($Y_c$) is the least. In other words, to obtain the line that fits the points best, $\sum (Y - Y_c)^2$ should be a minimum. Such a line is termed the line of best fit. A straight line fitted by least squares has the following properties:
1. It gives the best fit to the data in the sense that it makes the sum of the squared deviations from the line, $\sum (Y - Y_c)^2$, smaller than it would be from any other straight line. This property accounts for the name "least squares".
2. On average, the deviations above the line equal those below it. That is, the positive and negative deviations total zero: $\sum (Y - Y_c) = 0$.
3. The straight line passes through the overall mean of the data, the point $(\bar{X}, \bar{Y})$.
4. When the data represent a sample from a large population, the least squares line is the best estimate of the population regression line.
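Properties 1 to 3 can be checked numerically. The sketch below assumes the five (X, Y) pairs from the illustration in this section and computes a and b with the deviations-from-the-means form b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)², a = Ȳ - bX̄, which is an algebraic rearrangement of the normal equations rather than a different method. The helper names `fit_least_squares` and `sse` are hypothetical. Property 1 is only illustrated against a few nearby lines, not proved; property 4 concerns sampling and is not checked.

```python
def fit_least_squares(xs, ys):
    """Fit Yc = a + bX by least squares, using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared vertical deviations, i.e. the sum of (Y - Yc)^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Data pairs taken from the illustration in this section.
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
a, b = fit_least_squares(xs, ys)

# Property 1 (illustrated, not proved): a few nearby lines all do worse.
best = sse(xs, ys, a, b)
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(xs, ys, a + da, b + db) > best

# Property 2: positive and negative deviations cancel out.
residual_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))
print(abs(residual_sum) < 1e-9)

# Property 3: the line passes through the point of means.
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(abs((a + b * x_bar) - y_bar) < 1e-9)
```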
With a little algebra and differential calculus it can be shown that the following two equations, if solved simultaneously, yield the values of the parameters a and b that fulfil the least squares requirement:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
These equations are usually termed the normal equations. In them, $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$ are totals computed from the observed pairs of values of the two variables X and Y to which the least squares estimating line is to be fitted, and N is the number of observed pairs.
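Because the normal equations are just two linear equations in a and b, they can be solved directly from the totals. The sketch below uses Cramer's rule, a standard alternative to the elimination used in the illustration, and Python's `fractions.Fraction` to keep the arithmetic exact. The function name `solve_normal_equations` is hypothetical, and the totals are assumed to be integers.

```python
from fractions import Fraction


def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  SY = Na + b*SX  and  SXY = a*SX + b*SX2  for (a, b) by Cramer's rule.

    Assumes integer totals so that Fraction gives exact results.
    """
    det = n * sum_x2 - sum_x * sum_x
    if det == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    a = Fraction(sum_y * sum_x2 - sum_x * sum_xy, det)
    b = Fraction(n * sum_xy - sum_x * sum_y, det)
    return a, b


# Totals from the illustration in this section: N=5, SX=30, SY=40, SXY=214, SX2=220.
a, b = solve_normal_equations(5, 30, 40, 214, 220)
print(float(a), float(b))  # 11.9 -0.65
```

The determinant N·ΣX² - (ΣX)² is zero only when every X value is the same, which is the one case where no unique line exists.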
Regression equation of X on Y
The regression equation of X on Y is stated as follows:
$X_c = a + bY$
To determine the values of a and b, the following two normal equations are to be solved simultaneously:
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$
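The X on Y equations are the Y on X equations with the roles of the variables exchanged: ΣY and ΣY² take the places of ΣX and ΣX². A minimal sketch, assuming the five data pairs from the illustration in this section; `regression_from_totals` is a hypothetical helper. It also shows that the two lines are genuinely different: solving X = a + bY for Y gives a slope of 1/b, which equals the Y on X slope only when the points lie exactly on a straight line.

```python
def regression_from_totals(n, s_ind, s_dep, s_cross, s_ind_sq):
    """Solve the normal equations for dep = a + b * ind, given the totals."""
    b = (n * s_cross - s_ind * s_dep) / (n * s_ind_sq - s_ind ** 2)
    a = (s_dep - b * s_ind) / n
    return a, b


# Data pairs taken from the illustration in this section.
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Y on X: X is the independent variable, so SX and SX2 appear.
a_yx, b_yx = regression_from_totals(n, sum_x, sum_y, sum_xy, sum_x2)
# X on Y: the roles swap, so SY and SY2 take their places.
a_xy, b_xy = regression_from_totals(n, sum_y, sum_x, sum_xy, sum_y2)

print(f"Y on X: Yc = {a_yx:.2f} + ({b_yx:.2f})X")
print(f"X on Y: Xc = {a_xy:.2f} + ({b_xy:.2f})Y")
# Solving the X on Y line for Y gives slope 1/b_xy, not b_yx.
print(f"slope of the X on Y line when solved for Y: {1 / b_xy:.4f}")
```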
Illustration:
From the following data obtain the two regression equations.
X | 6 | 2 | 10 | 4 | 8
Y | 9 | 11 | 5 | 8 | 7
Solution:
Obtaining the regression equations
X | Y | XY | $X^2$ | $Y^2$
6 | 9 | 54 | 36 | 81
2 | 11 | 22 | 4 | 121
10 | 5 | 50 | 100 | 25
4 | 8 | 32 | 16 | 64
8 | 7 | 56 | 64 | 49
$\sum X = 30$ | $\sum Y = 40$ | $\sum XY = 214$ | $\sum X^2 = 220$ | $\sum Y^2 = 340$
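The worksheet columns and their totals are easy to reproduce, which is a useful check on the hand arithmetic. A minimal Python sketch using the five pairs given above:

```python
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

# One worksheet row per observed pair: X, Y, XY, X^2, Y^2.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
for row in rows:
    print(" | ".join(str(v) for v in row))

# Column totals: SX, SY, SXY, SX2, SY2.
totals = [sum(column) for column in zip(*rows)]
print(totals)  # [30, 40, 214, 220, 340]
```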
Regression equation of Y on X: $Y_c = a + bX$
To determine the values of a and b, the following two normal equations are to be solved. They follow from the least squares condition:
$\sum (Y - Y_c)^2$ should be a minimum, that is, $\sum (Y - a - bX)^2$ should be a minimum (since $Y_c = a + bX$).
Let $S = \sum (Y - a - bX)^2$.
Differentiating partially with respect to a and b and setting each derivative to zero:
$\partial S / \partial a = -2 \sum (Y - a - bX) = 0$ and $\partial S / \partial b = -2 \sum X(Y - a - bX) = 0$
Dividing by -2 and expanding the sums gives the two normal equations:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
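The derivation says that at the least squares solution both partial derivatives of S vanish. A quick exact check with Python's `fractions.Fraction`, assuming the data pairs from the table and the constants a = 11.9 and b = -0.65 (the exact solution of -26 = 40b); the helper name `partials` is hypothetical.

```python
from fractions import Fraction

# Data pairs taken from the illustration in this section.
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]


def partials(a, b):
    """Return (dS/da, dS/db) for S = sum of (Y - a - bX)^2."""
    ds_da = sum(-2 * (y - a - b * x) for x, y in zip(xs, ys))
    ds_db = sum(-2 * x * (y - a - b * x) for x, y in zip(xs, ys))
    return ds_da, ds_db


# Exact values of the constants (11.9 and -0.65) as fractions.
a, b = Fraction(119, 10), Fraction(-13, 20)
print(partials(a, b))  # both derivatives are exactly zero

# For comparison, moving a away from 11.9 makes dS/da non-zero.
print(partials(Fraction(12), Fraction(-13, 20)))
```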
Substituting the values from the table:
$40 = 5a + 30b$ ... (i)
$214 = 30a + 220b$ ... (ii)
Multiplying equation (i) by 6: $240 = 30a + 180b$ ... (iii)
Rewriting equation (ii): $214 = 30a + 220b$ ... (iv)
Subtracting equation (iv) from (iii): $26 = -40b$, or $b = -0.65$
Substituting the value of b in equation (i):
$40 = 5a + 30(-0.65)$, so $5a = 40 + 19.5 = 59.5$, or $a = 11.9$
Putting the values of a and b in the equation, the regression equation of Y on X is
$Y_c = 11.9 - 0.65X$
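The elimination steps can be mirrored exactly with Python's `fractions.Fraction`, which avoids rounding in the intermediate steps. Each equation is stored as a (coefficient of a, coefficient of b, constant) tuple; this representation is an assumption of the sketch, which handles only this two-equation system.

```python
from fractions import Fraction

# Each equation is stored as (coefficient of a, coefficient of b, constant).
eq1 = (Fraction(5), Fraction(30), Fraction(40))     # (i)  40 = 5a + 30b
eq2 = (Fraction(30), Fraction(220), Fraction(214))  # (ii) 214 = 30a + 220b

# Scale (i) so its coefficient of a matches (ii): factor 6 gives (iii) 240 = 30a + 180b.
factor = eq2[0] / eq1[0]
eq3 = tuple(factor * c for c in eq1)

# Subtracting (ii) from (iii) eliminates a: 26 = -40b.
b = (eq3[2] - eq2[2]) / (eq3[1] - eq2[1])

# Back-substitute into (i): 5a = 40 - 30b.
a = (eq1[2] - eq1[1] * b) / eq1[0]

print(factor, a, b)        # 6 119/10 -13/20
print(float(a), float(b))  # 11.9 -0.65
```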
Regression equation of X on Y: $X_c = a + bY$
The two normal equations are
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$
Substituting the values from the table:
$30 = 5a + 40b$ ... (i)
$214 = 40a + 340b$ ... (ii)
Multiplying equation (i) by 8: $240 = 40a + 320b$ ... (iii)
Rewriting equation (ii): $214 = 40a + 340b$ ... (iv)
Subtracting equation (iv) from (iii): $26 = -20b$, or $b = -1.3$
Substituting the value of b in equation (i): $30 = 5a + 40(-1.3)$
$5a = 30 + 52 = 82$, or $a = 16.4$
Putting the values of a and b in the equation, the regression line of X on Y is $X_c = 16.4 - 1.3Y$.
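Property 3 says each least squares line passes through the point of means, so the two regression lines should intersect at (X̄, Ȳ) = (6, 8). A small exact check, assuming the constants a = 11.9, b = -0.65 for Y on X and a = 16.4, b = -1.3 for X on Y, written as fractions:

```python
from fractions import Fraction

# Data pairs taken from the illustration in this section.
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

# Constants of the two regression equations, written as exact fractions.
a_yx, b_yx = Fraction(119, 10), Fraction(-13, 20)  # Yc = 11.9 - 0.65X
a_xy, b_xy = Fraction(82, 5), Fraction(-13, 10)    # Xc = 16.4 - 1.3Y

# Substitute Y = a_yx + b_yx*X into X = a_xy + b_xy*Y and solve for X.
x_meet = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)
y_meet = a_yx + b_yx * x_meet

# Point of means.
x_bar = Fraction(sum(xs), len(xs))
y_bar = Fraction(sum(ys), len(ys))

print(x_meet, y_meet)                      # 6 8
print((x_meet, y_meet) == (x_bar, y_bar))  # True
```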