Linear regression

The aim of this post is to explain how linear regression is calculated with a very simple example. Assume we want to approximate a cloud of points with the line y = ax, as illustrated below:

[Figure: cloud of points approximated by the line y = ax]

We want to minimize the error E, given by the sum of the squared differences between the points and the line y = ax:

 E=(y_1 - a x_1)^2 + (y_2 - a x_2)^2 + ... + (y_n - a x_n)^2
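
To make the objective concrete, here is a minimal Python sketch that evaluates this error for a candidate slope a. The data points are illustrative values chosen for the demo, not from the post:

    def squared_error(a, xs, ys):
        # Sum of squared vertical distances between the points and the line y = a*x.
        return sum((y - a * x) ** 2 for x, y in zip(xs, ys))

    # Illustrative data: points scattered around the line y = 2x.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]

    print(squared_error(2.0, xs, ys))  # small error near the underlying slope
    print(squared_error(0.5, xs, ys))  # much larger error for a poor slope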

Expanding each term, the equation becomes:

 E=y_1^2 + a^2 x_1^2 - 2a y_1 x_1 + y_2^2 + a^2 x_2^2 - 2a y_2 x_2 + ... + y_n^2 + a^2 x_n^2 - 2a y_n x_n
 E=y_1^2 + y_2^2 + ... + y_n^2 + a^2(x_1^2 + x_2^2 + ... + x_n^2) - 2a(y_1 x_1 + y_2 x_2 + ... + y_n x_n)

Since we want to minimize the error, the value of a we are looking for is necessarily a zero of the derivative of the error: E is a quadratic in a with a positive leading coefficient, so its single critical point is the minimum. Let's calculate the derivative:

 \frac{\partial E}{\partial a} = 2a(x_1^2 + x_2^2 + ... + x_n^2) - 2(y_1 x_1 + y_2 x_2 + ... + y_n x_n)
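
As a quick sanity check, this analytic derivative can be compared against a centered finite-difference approximation. The sketch below reuses squared_error and the illustrative data from the snippet above:

    def dE_da(a, xs, ys):
        # Analytic derivative: 2a * sum(x_i^2) - 2 * sum(x_i * y_i).
        return 2 * a * sum(x * x for x in xs) - 2 * sum(x * y for x, y in zip(xs, ys))

    a, h = 1.5, 1e-6
    numeric = (squared_error(a + h, xs, ys) - squared_error(a - h, xs, ys)) / (2 * h)
    print(dE_da(a, xs, ys), numeric)  # the two values agree closely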

The error is minimized when \frac{\partial E}{\partial a} = 0:

 2a(x_1^2 + x_2^2 + ... + x_n^2) - 2(y_1 x_1 + y_2 x_2 + ... + y_n x_n) = 0

It is now easy to solve for a:

 a=\frac{2(y_1 x_1 + y_2 x_2 + ... + y_n x_n)}{2(x_1^2 + x_2^2 + ... + x_n^2)}
 a=\frac{y_1 x_1 + y_2 x_2 + ... + y_n x_n}{x_1^2 + x_2^2 + ... + x_n^2}
 a=\frac{\sum\limits_{i=1}^n x_i y_i}{\sum\limits_{i=1}^n x_i^2}
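
The closed-form slope is then a one-liner. Continuing the sketch above, this computes a and checks that slopes slightly away from it give a larger error:

    def fit_slope(xs, ys):
        # Least-squares slope of a line through the origin: sum(x*y) / sum(x^2).
        return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

    a_best = fit_slope(xs, ys)
    print(a_best)  # close to 2 for the illustrative data above
    print(squared_error(a_best, xs, ys) < squared_error(a_best + 0.1, xs, ys))  # True
    print(squared_error(a_best, xs, ys) < squared_error(a_best - 0.1, xs, ys))  # True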

