**Updated version** of this page on neural-networks.io.

## Perceptron

Let’s consider a simple perceptron: $n$ inputs $x_1, \dots, x_n$, each weighted by $w_i$, feeding a single output $y$.

## Transfer function

The transfer function is given by:

$$y = \sigma\left(\sum_{i=1}^{n} w_i x_i\right)$$

Let’s define the sum:

$$s = \sum_{i=1}^{n} w_i x_i$$

Let’s rewrite:

$$y = \sigma(s)$$
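As a minimal sketch of this forward pass (assuming a sigmoid transfer function; the function names are illustrative, not from the original page):

```python
import math

def sigmoid(s):
    # Sigmoid transfer function: sigma(s) = 1 / (1 + e^(-s))
    return 1.0 / (1.0 + math.exp(-s))

def forward(weights, inputs):
    # Weighted sum s = sum(w_i * x_i), then y = sigma(s)
    s = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(s)
```

With all weights at zero, the weighted sum is 0 and the output is sigmoid(0) = 0.5.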

## Error (or loss)

In artificial neural networks, the error we want to minimize is:

$$E = (\hat{y} - y)^2$$

with:

- $E$ the error
- $\hat{y}$ the expected output (from the training data set)
- $y$ the actual output of the network

In practice, and to simplify the maths, this error is divided by two (the factor $\frac{1}{2}$ cancels the 2 that appears when differentiating):

$$E = \frac{1}{2}\,(\hat{y} - y)^2$$
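This halved squared error can be sketched in one line (the name `loss` is an arbitrary choice):

```python
def loss(expected, actual):
    # Halved squared error: E = 1/2 * (y_hat - y)^2
    return 0.5 * (expected - actual) ** 2
```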

## Gradient descent

The algorithm (gradient descent) used to train the network (i.e. to update the weights) is given by:

$$w_i' = w_i - \alpha\,\frac{\partial E}{\partial w_i}$$

where:

- $w_i$ the weight before the update
- $w_i'$ the weight after the update
- $\alpha$ the learning rate
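A single gradient-descent step can be sketched as follows (hypothetical names; `alpha` is the learning rate, `grad` the derivative of the error with respect to the weight):

```python
def update_weight(w, grad, alpha=0.1):
    # Gradient descent step: w' = w - alpha * dE/dw
    return w - alpha * grad
```

A positive gradient decreases the weight, a negative gradient increases it, moving the error downhill.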

## Differentiating the error

Let’s differentiate the error with respect to a weight $w_i$:

$$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\left[\frac{1}{2}\,(\hat{y} - y)^2\right]$$

Thanks to the chain rule, the previous equation can be rewritten:

$$\frac{\partial E}{\partial w_i} = -(\hat{y} - y)\,\frac{\partial y}{\partial w_i}$$

Let’s calculate the derivative of $y = \sigma(s)$:

$$\frac{\partial y}{\partial w_i} = \frac{\partial \sigma(s)}{\partial w_i}$$

Thanks to the chain rule, the previous equation can be rewritten (since $s = \sum_j w_j x_j$, we have $\frac{\partial s}{\partial w_i} = x_i$):

$$\frac{\partial y}{\partial w_i} = \sigma'(s)\,\frac{\partial s}{\partial w_i} = \sigma'(s)\,x_i$$

The derivative of the error becomes:

$$\frac{\partial E}{\partial w_i} = -(\hat{y} - y)\,\sigma'(s)\,x_i$$

## Updating the weights

The weights can be updated with the following formula:

$$w_i' = w_i - \alpha\,\frac{\partial E}{\partial w_i} = w_i + \alpha\,(\hat{y} - y)\,\sigma'(s)\,x_i$$

In conclusion:

$$w_i' = w_i + \alpha\,(\hat{y} - y)\,\sigma'(s)\,x_i$$
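Putting the whole derivation together, here is a sketch of training a single sigmoid perceptron with this update rule (the AND data set, the bias-as-extra-input trick, and all names are illustrative assumptions, not from the original page; for the sigmoid, $\sigma'(s) = y(1-y)$):

```python
import math

def sigmoid(s):
    # Sigmoid transfer function
    return 1.0 / (1.0 + math.exp(-s))

def train(samples, n_inputs, alpha=0.5, epochs=5000):
    # samples: list of (inputs, expected_output) pairs
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, expected in samples:
            s = sum(w * x for w, x in zip(weights, inputs))
            y = sigmoid(s)
            # w_i' = w_i + alpha * (y_hat - y) * sigma'(s) * x_i
            # with sigma'(s) = y * (1 - y) for the sigmoid
            delta = alpha * (expected - y) * y * (1.0 - y)
            weights = [w + delta * x for w, x in zip(weights, inputs)]
    return weights

# Illustrative example: learn logical AND, with a third input fixed to 1
# acting as a bias.
data = [([0, 0, 1], 0.0), ([0, 1, 1], 0.0),
        ([1, 0, 1], 0.0), ([1, 1, 1], 1.0)]
w = train(data, 3)
```

After training, the output for `[1, 1, 1]` is above 0.5 and the outputs for the other patterns are below it.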