# Perceptron update rule demonstration

## Perceptron

Let's consider a simple perceptron with $N$ inputs $x_1, x_2, \dots, x_N$, each weighted by a coefficient $w_i$.

## Transfer function

The transfer function is given by:

 y = f(w_1.x_1 + w_2.x_2 + \dots + w_N.x_N) = f(\sum\limits_{i=1}^N w_i.x_i)

Let's define the sum $S(w_i,x_i)$:

 S(w_i,x_i) = \sum\limits_{i=1}^N w_i.x_i

Let's rewrite $y$:

 y= f(\sum\limits_{i=1}^N w_i.x_i)=f(S(w_i,x_i))
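
As a minimal sketch of this transfer function in Python, assuming $f$ is the sigmoid (one common choice; the text above leaves $f$ generic):

```python
import math

def f(s):
    """Activation function: a sigmoid, chosen here as an example."""
    return 1.0 / (1.0 + math.exp(-s))

def S(w, x):
    """Weighted sum S(w_i, x_i) = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def y(w, x):
    """Perceptron output y = f(S(w, x))."""
    return f(S(w, x))
```

For example, `y([0.5, -0.3], [1.0, 2.0])` computes $f(0.5 \times 1.0 - 0.3 \times 2.0) = f(-0.1)$.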

## Error (or loss)

In artificial neural networks, the error we want to minimize is:

 E=(y'-y)^2

with:

• $E$ the error
• $y'$ the expected output (from the training data set)
• $y$ the actual output of the network

In practice, this error is divided by two to simplify the maths (the factor of $2$ cancels when differentiating):

 E=\frac{1}{2}(y'-y)^2
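
For example, with an expected output $y'=1$ and a network output $y=0.8$ (arbitrary values):

 E=\frac{1}{2}(1-0.8)^2=\frac{1}{2}\times 0.04=0.02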

The algorithm used to train the network (i.e. to update the weights) is gradient descent (a toy example follows the list below):

 w_i'=w_i-\eta.\frac{dE}{dw_i}

where:

• $w_i$ the weight before update
• $w_i'$ the weight after update
• $\eta$ the learning rate
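
As a toy illustration of this rule, before deriving $\frac{dE}{dw_i}$ for the perceptron, here is a sketch of gradient descent on a one-dimensional error $E(w)=(w-3)^2$, whose derivative $\frac{dE}{dw}=2(w-3)$ is known in closed form; the learning rate and starting weight are arbitrary choices:

```python
eta = 0.1                    # learning rate (arbitrary choice)
w = 0.0                      # arbitrary starting weight

for step in range(50):
    dE_dw = 2.0 * (w - 3.0)  # closed-form derivative of E(w) = (w - 3)^2
    w = w - eta * dE_dw      # update rule: w' = w - eta * dE/dw

print(w)                     # approaches the minimum at w = 3
```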

## Differentiating the error

Let's differentiate the error:

 \frac{dE}{dw_i} = \frac{1}{2}\frac{d}{dw_i}(y'-y)^2

Thanks to the chain rule [ $(f \circ g)'=(f' \circ g).g'$ ], and since the expected output $y'$ does not depend on $w_i$, the previous equation can be rewritten:

 \frac{dE}{dw_i} = \frac{2}{2}(y'-y)\frac{d}{dw_i} (y'-y) = -(y'-y)\frac{dy}{dw_i}

Let's calculate the derivative of $y$:

 \frac{dy}{dw_i} = \frac{df(S(w_i,x_i))}{dw_i}

Thanks to the chain rule [ $(f \circ g)'=(f' \circ g).g'$ ], and since $\frac{dS}{dw_i} = x_i$, the previous equation can be rewritten:

 \frac{df(S)}{dw_i} = \frac{df(S)}{dS}\frac{dS}{dw_i} = x_i\frac{df(S)}{dS}

The derivative of the error becomes:

 \frac{dE}{dw_i} = -x_i(y'-y)\frac{df(S)}{dS}
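
As a sanity check, this analytic gradient can be compared against a finite-difference approximation of $\frac{dE}{dw_i}$. The sketch below again assumes a sigmoid $f$, for which $\frac{df(S)}{dS}=f(S)(1-f(S))$; the weights, inputs, and target are arbitrary values:

```python
import math

def f(s):
    return 1.0 / (1.0 + math.exp(-s))

def df_dS(s):
    # Sigmoid derivative: f'(S) = f(S) * (1 - f(S))
    return f(s) * (1.0 - f(s))

x = [1.0, 2.0]      # arbitrary inputs
w = [0.5, -0.3]     # arbitrary weights
y_target = 1.0      # y', the expected output

def error(weights):
    s = sum(wi * xi for wi, xi in zip(weights, x))
    return 0.5 * (y_target - f(s)) ** 2

# Analytic gradient: dE/dw_i = -x_i * (y' - y) * df(S)/dS
s = sum(wi * xi for wi, xi in zip(w, x))
analytic = [-xi * (y_target - f(s)) * df_dS(s) for xi in x]

# Central finite differences for comparison
eps = 1e-6
numeric = []
for i in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric.append((error(w_plus) - error(w_minus)) / (2 * eps))

print(analytic)
print(numeric)  # should agree with the analytic gradient to several decimals
```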

## Updating the weights

The weights can be updated with the following formula:

 w_i'=w_i-\eta.\frac{dE}{dw_i} = w_i + \eta.x_i.(y'-y).\frac{df(S)}{dS}

In conclusion:

 w_i'= w_i + \eta.x_i.(y'-y).\frac{df(S)}{dS}
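
Putting everything together, here is a sketch of this update rule in a small training loop, again assuming a sigmoid $f$ so that $\frac{df(S)}{dS}=f(S)(1-f(S))$. The AND-gate data set, the learning rate, the number of epochs, and the bias term (treated as an extra weight on a constant input of $1$, not part of the derivation above but needed for this data set) are all arbitrary choices for the example:

```python
import math

def f(s):
    return 1.0 / (1.0 + math.exp(-s))

# Toy training set for an AND gate: (inputs x, expected output y')
data = [([0.0, 0.0], 0.0),
        ([0.0, 1.0], 0.0),
        ([1.0, 0.0], 0.0),
        ([1.0, 1.0], 1.0)]

w = [0.0, 0.0]
b = 0.0      # bias: a weight on a constant input of 1 (not in the derivation above)
eta = 0.5    # learning rate (arbitrary choice)

for epoch in range(5000):
    for x, y_target in data:
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        y_out = f(s)
        dfdS = y_out * (1.0 - y_out)  # sigmoid derivative df(S)/dS
        # Update rule: w_i' = w_i + eta * x_i * (y' - y) * df(S)/dS
        w = [wi + eta * xi * (y_target - y_out) * dfdS for wi, xi in zip(w, x)]
        b += eta * 1.0 * (y_target - y_out) * dfdS

for x, y_target in data:
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    print(x, "->", round(f(s), 2), "expected", y_target)
```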