We'll try to talk through the idea in simple terms, to get the overall gist, and then later in a second post we'll consider some of the details.
Backpropagation Overview
The main idea is this, and it really is this simple:
- We refine (train) a neural network by using example data (training data) where for each example we know the question (input) and the answer (output).
- For each example we use the error - the difference between the right answer and what the neural network outputs - to determine how much we refine the internals of the neural network.
- We keep doing this for other examples, until we're happy that the neural network gets enough answers right, or close enough.
In slightly more detail:
- We start with an untrained network, and a training set of examples to learn from - which consists of inputs and the desired outputs.
- To train the network we apply the input of each example to the untrained network. For our project, that input could be the handwritten characters we're interested in.
- The untrained network will, of course, produce an output. But because it is untrained, the output is very likely incorrect.
- We know what the output should be, because we're using a training example set. We can compare the desired output with the actual but incorrect output - that's the error.
- Rather than disregard this error - we use it because it gives us a clue as to how to refine the neural network to improve the output. This refinement is done by adjusting the "weights" in the network, which we saw in an earlier post.
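To make that loop concrete, here's a minimal sketch in Python. It assumes a toy "network" with a single weight, where the output is just weight times input - the training data, the learning rate and the simple update rule are illustrative assumptions, not the network we'll actually build. The point is only that each example's error drives a small adjustment to a weight.

```python
# A minimal sketch of error-driven training, assuming a toy one-weight
# "network": output = weight * input. The data, learning rate and update
# rule are illustrative assumptions only.

training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, desired output) pairs
weight = 0.5          # start with an untrained (arbitrary) weight
learning_rate = 0.1   # only take a small step towards the answer each time

for x, target in training_data:
    output = weight * x                    # what the untrained network says
    error = target - output                # desired output minus actual output
    weight += learning_rate * error * x    # use the error to refine the weight
    print(f"input={x}  target={target}  output={output:.3f}  "
          f"error={error:.3f}  new weight={weight:.3f}")
```

Run it and you'll see the error shrink as the weight creeps towards the right value - that's the whole idea, just with one weight instead of many.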
Look at the following diagram showing an input layer of nodes (A, B), an internal so-called hidden layer (C, E) and an output node (F). The diagram also shows the weights between the nodes, for example WAC for the weight between nodes A and C. Also feeding into C is node B with weight WBC.
Now the question that took me ages to find the answer to is this: given the error in the output of the network, how do you use this error to update the internal weights? I understood you needed to, of course, but not exactly how. Do you only update one weight? Do you treat the error at the internal nodes as being the same error that came out of the entire neural network?
There are probably other ways to refine the network weights, but one way - the backpropagation algorithm - splits the error in proportion to the weights. So if WCF is twice as large as WEF, then the link with weight WCF should take twice as much of the error. If the error was, say, 6, then WCF should be updated as if the error from C was 4, and WEF updated as if the error from E was 2 - because 4 is twice 2.
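The arithmetic of that split is worth seeing once. The snippet below is just the example from the paragraph above written out in Python, with hypothetical weight values chosen so that WCF is twice WEF.

```python
# Splitting an output error between the two links feeding node F,
# in proportion to their weights. The weight values are hypothetical,
# chosen so that w_CF is twice w_EF, as in the text.

w_CF = 2.0          # weight of the link C -> F
w_EF = 1.0          # weight of the link E -> F
output_error = 6.0  # the error at the output node F

# each link takes a share of the error proportional to its weight
error_C = output_error * w_CF / (w_CF + w_EF)   # 6 * 2/3 = 4.0
error_E = output_error * w_EF / (w_CF + w_EF)   # 6 * 1/3 = 2.0

print(error_C, error_E)   # 4.0 2.0
```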
We'll look at how we actually update a weight itself in part 2 of this blog. For now, we just want to understand the overall approach.
The diagrams above and below show the same idea applied again to nodes further back from the final output and its error.
Once the error has been propagated from the output layer back through the middle (known as "hidden", silly name) layers, and on to the weights from the input layer, we have completed a training session. Of course, training needs to happen with lots of examples, which has the effect of slowly but surely refining the weights of the neural network to the point where it gets better and better at predicting the right output.
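Here's a rough sketch of what that second step back looks like in code, continuing from the split above. The error assigned to hidden node C (4, from the previous snippet) is itself shared across the links feeding C, in proportion to their weights. The values of WAC and WBC are made up for illustration, and this only shows the error bookkeeping, not the weight updates themselves.

```python
# Continuing the sketch: the error at hidden node C is split back across
# the links A -> C and B -> C, again in proportion to their weights.
# The weight values here are hypothetical.

error_C = 4.0   # the share of the output error assigned to node C earlier

w_AC = 3.0      # weight of the link A -> C (hypothetical value)
w_BC = 1.0      # weight of the link B -> C (hypothetical value)

error_AC = error_C * w_AC / (w_AC + w_BC)   # 4 * 3/4 = 3.0
error_BC = error_C * w_BC / (w_AC + w_BC)   # 4 * 1/4 = 1.0

# node E's error (2) would be shared back across its incoming links
# in exactly the same way
print(error_AC, error_BC)   # 3.0 1.0
```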
Next time we'll look at the detail of using the error to update a weight. That involves some mathematics, but we'll introduce it gently.