I've been having real trouble debugging my back propagation algorithm, and also understanding the error in my own derivation of the weight change, so I took a step back and tried a different approach altogether.
Hill climbing is a fancy term, but all we're doing is taking an untrained neural network and making a small change to one of the weights to see if it improves the overall result. If it does, keep that change. If it doesn't, discard it and revert.
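To make that loop concrete, here is a minimal sketch in Python. It is not my actual network: the single sigmoid neuron, the logical OR training data, the step size and the names `DATA`, `predict`, `total_error` and `hill_climb` are all assumptions picked to keep the example tiny.

```python
import math
import random

# Toy training data (an assumption for illustration): logical OR.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, inputs):
    """A single sigmoid neuron; weights = [w1, w2, bias]."""
    w1, w2, b = weights
    return sigmoid(w1 * inputs[0] + w2 * inputs[1] + b)

def total_error(weights):
    """Sum of squared errors over the whole training set."""
    return sum((target - predict(weights, x)) ** 2 for x, target in DATA)

def hill_climb(weights, steps=5000, step_size=0.1, seed=0):
    rng = random.Random(seed)          # seeded so runs are repeatable
    weights = list(weights)
    best = total_error(weights)
    history = [best]                   # best error seen after each step
    for _ in range(steps):
        i = rng.randrange(len(weights))                        # pick one weight
        old = weights[i]
        weights[i] = old + rng.uniform(-step_size, step_size)  # small change
        err = total_error(weights)
        if err < best:
            best = err                 # it improved things: keep the change
        else:
            weights[i] = old           # it didn't: discard and revert
        history.append(best)
    return weights, history

if __name__ == "__main__":
    trained, history = hill_climb([0.0, 0.0, 0.0])
    print("error before:", history[0], "error after:", history[-1])
```

Because a change is only kept when the error drops, the recorded error can never go up from one step to the next, which is the whole appeal of the method.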
That's an intuitive explanation - much easier to understand than back propagation at the weight change level.
Why is it called hill climbing? Imagine the landscape of weights. Two dimensions is easy to imagine, that is, w1 and w2. We stand at a starting position and we want to find the best w1 and w2. If we take a small step in a random direction, we can then see whether it improves the overall output (the sum of squared errors) of the neural network - or not. If it does, we've moved to a better w1, w2. Keep doing this and the error keeps falling until no small step improves it. The "hill" is a measure of solution quality, the error function for example. Since a lower error means a better solution, climbing the quality hill is the same as walking downhill on the error landscape.
Again - a great graph showing a neural network improving its accuracy over training epochs:
Do keep in mind that this method, just like back propagation, can end up with a locally - not globally - optimal solution. Thinking of that landscape above, we end up in a ditch but not the deepest ditch!
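The ditch problem is easy to see on a made-up landscape with a single weight. The sketch below is only an illustration: the `error` function is invented to have a shallow ditch near w = 1.1 and a deeper one near w = -1.3, and `hill_climb_1d` is a hypothetical helper, not code from my network.

```python
import random

def error(w):
    # Invented 1-D "landscape": shallow ditch near w = 1.1, deeper ditch near w = -1.3.
    return w**4 - 3 * w**2 + w

def hill_climb_1d(w, steps=2000, step_size=0.05, seed=0):
    rng = random.Random(seed)          # seeded so runs are repeatable
    best = error(w)
    for _ in range(steps):
        candidate = w + rng.uniform(-step_size, step_size)  # small random step
        err = error(candidate)
        if err < best:                 # only ever accept improvements
            w, best = candidate, err
    return w, best

if __name__ == "__main__":
    print("start at  2.0:", hill_climb_1d(2.0))
    print("start at -2.0:", hill_climb_1d(-2.0))
```

Starting from w = 2.0 the walk settles in the shallow ditch and stays there, because every small step away from it makes the error worse. Starting from w = -2.0 it finds the deeper one. Which ditch you end up in depends on where you start.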