Multi-Layer Perceptrons
Backpropagation
To update the weights for a multi-layer perceptron we need to complete the following steps:
- Feed Forward
- Backwards Pass
The general formula to update the weights of a multi-layer perceptron is:
\[\mathbf w_{(n+1)}=\mathbf w_n-\eta\nabla_\mathbf wE\]
where:
- $\nabla_\mathbf wE$ is the gradient of the error $E$ with respect to the weights $\mathbf w$.
- $\eta$ is the learning rate.
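A minimal sketch of this update rule, assuming NumPy arrays for the weights and a precomputed gradient (the function name and default learning rate are illustrative):

```python
import numpy as np

def gradient_descent_step(w, grad_E, eta=0.1):
    """One update w_(n+1) = w_n - eta * grad_w E.

    w      : current weight vector w_n (NumPy array)
    grad_E : gradient of the error E with respect to the weights, evaluated at w_n
    eta    : learning rate (illustrative default)
    """
    return w - eta * grad_E
```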
We also have the following functions available:
- Output error for the $j^{\text{th}}$ neuron:
\[e_j(n)=d_j(n)-y_j(n)\]
- Output network error:
\[E(n)=\frac12\sum_{j\in\text{output}}e^2_j(n)\]
- Average network error:
\[E_\text{avg}=\frac1N\sum^N_{n=1}E(n)\]
- Induced local field (ILF), the weighted sum of the neuron's inputs:
\[v_j(n)=\sum_i w_{ji}(n)y_i(n)\]
- Neuron output:
\[y_j(n)=\phi(v_j(n))\]
- Local gradient (for one node):
\[\delta_j(n)=e_j(n)\times\underbrace{\phi'(v_j(n))}_\text{activation function gradient}\]
- Error gradient with respect to the weight $w_{ji}$:
\[\frac{\partial E(n)}{\partial w_{ji}(n)}=-\delta_j(n)y_i(n)\]
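A minimal sketch of how these quantities fit together for a single output layer, assuming a sigmoid activation for $\phi$, NumPy arrays, no bias terms, and illustrative function names:

```python
import numpy as np

def sigmoid(v):
    # phi(v): logistic activation (an assumed choice of phi)
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(v):
    # phi'(v) for the logistic activation
    s = sigmoid(v)
    return s * (1.0 - s)

def output_layer_quantities(W, y_prev, d):
    """Compute v_j, y_j, e_j, E and delta_j for an output layer.

    W      : weight matrix with W[j, i] = w_ji
    y_prev : outputs y_i of the previous layer (the inputs to this layer)
    d      : desired responses d_j
    """
    v = W @ y_prev                 # induced local fields v_j(n)
    y = sigmoid(v)                 # neuron outputs y_j(n) = phi(v_j(n))
    e = d - y                      # output errors e_j(n) = d_j(n) - y_j(n)
    E = 0.5 * np.sum(e ** 2)       # instantaneous network error E(n)
    delta = e * sigmoid_prime(v)   # local gradients delta_j(n)
    return v, y, e, E, delta
```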
Backwards Pass
Consider the following key equations:
A (output neuron):
\[\delta_j(n)=e_j(n)\phi'(v_j(n))\]
B (hidden neuron):
\[\delta_j(n)=\phi'_j\left(v_j\left(n\right)\right)\sum_k\left(\delta_k(n)w_{kj}(n)\right)\]
1 (weight correction):
\[\begin{aligned} \Delta w_{ji}(n)&=-\eta\frac{\partial E(n)}{\partial w_{ji}(n)}\\ &=\eta\delta_j(n)y_i(n) \end{aligned}\]
and follow the steps:
- Calculate the local gradients by using A and B for the output and hidden layers.
- Use equation 1 to update the weights connecting the $i^\text{th}$ and $j^\text{th}$ neurons, as in the sketch below.
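Putting A, B and 1 together, a minimal sketch of one backpropagation step for a single-hidden-layer network (again assuming a sigmoid activation, NumPy arrays, no bias terms, and illustrative names):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(v):
    s = sigmoid(v)
    return s * (1.0 - s)

def backprop_step(W_hid, W_out, x, d, eta=0.1):
    """One feed-forward and backwards pass; rows of each weight matrix index neuron j."""
    # Feed forward
    v_hid = W_hid @ x
    y_hid = sigmoid(v_hid)
    v_out = W_out @ y_hid
    y_out = sigmoid(v_out)

    # A: local gradients for the output neurons, delta_j = e_j * phi'(v_j)
    e = d - y_out
    delta_out = e * sigmoid_prime(v_out)

    # B: local gradients for the hidden neurons,
    #    delta_j = phi'(v_j) * sum_k delta_k * w_kj
    delta_hid = sigmoid_prime(v_hid) * (W_out.T @ delta_out)

    # 1: weight corrections, Delta w_ji = eta * delta_j * y_i
    W_out = W_out + eta * np.outer(delta_out, y_hid)
    W_hid = W_hid + eta * np.outer(delta_hid, x)
    return W_hid, W_out
```

In this sketch `backprop_step` would be called once per training pattern, with the returned matrices fed back in on the next call; averaging over $N$ patterns gives $E_\text{avg}$ as defined above.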