One of the weights should have been 3.0 not 4.0, which then affects the rest of the calculations.
The corrected section is below. The corrected error is highlighted, and it flows through to the rest of the calculations.
The books will be updated, and you can ask Amazon for a free ebook update if you have that version.
Weight Update Worked Example
Let’s work through a couple of examples with numbers, just to see this weight update method working.
The following network is the one we worked with before, but this time we’ve added example output values from the first hidden node $o_{j=1}$ and the second hidden node $o_{j=2}$. These are just made-up numbers to illustrate the method and aren’t worked out properly by feeding forward signals from the input layer.
We want to update the weight $w_{11}$ between the hidden and output layers, which currently has the value 2.0.
Let’s write out the error slope again.
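The expression itself was shown as an image. Reconstructed from the three parts described below, with the same symbols, it should read:

$$
\frac{\partial E}{\partial w_{jk}} = -(t_k - o_k) \cdot \text{sigmoid}\Big(\sum_j w_{jk}\, o_j\Big) \cdot \Big(1 - \text{sigmoid}\Big(\sum_j w_{jk}\, o_j\Big)\Big) \cdot o_j
$$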
Let’s do this bit by bit:

● The first bit $(t_k - o_k)$ is the error $e_1 = 1.5$, just as we saw before.

● The sum inside the sigmoid functions $\sum_j w_{jk} o_j$ is (2.0 * 0.4) + (3.0 * 0.5) = 2.3.

● The sigmoid $1/(1 + e^{-2.3})$ is then 0.909. That middle expression is then 0.909 * (1 - 0.909) = 0.083.
● The last part is simply $o_j$, which is $o_{j=1}$ because we’re interested in the weight $w_{11}$ where $j = 1$. Here it is simply 0.4.
Multiplying these three bits together, and not forgetting the minus sign at the start, gives us -0.04969. (This uses the unrounded values; multiplying the rounded figures gives -0.0498.)
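The three bits and their product can be checked with a few lines of Python, the language of the book’s notebooks. This is a minimal sketch reusing the made-up values above; the name `w_21` for the corrected weight of 3.0 from the second hidden node is an illustrative assumption.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

e_1 = 1.5               # made-up error (t_k - o_k) from the post
w_11, w_21 = 2.0, 3.0   # weights into output node 1 (w_21 is an illustrative name)
o_1, o_2 = 0.4, 0.5     # made-up hidden-node outputs

net = w_11 * o_1 + w_21 * o_2      # sum inside the sigmoid, 2.3
s = sigmoid(net)                   # about 0.909
middle = s * (1.0 - s)             # about 0.083
slope = -(e_1 * middle * o_1)      # about -0.04969, using unrounded values

print(round(net, 4), round(s, 3), round(middle, 3), round(slope, 5))
```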
If we have a learning rate of 0.1, that gives us a change of -(0.1 * -0.04969) = +0.005. So the new $w_{11}$ is the original 2.0 plus 0.005 = 2.005.
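The update step as a hedged Python sketch: the new weight is the old one minus the learning rate times the slope, so a negative slope nudges the weight up. The values are the post’s made-up numbers, and `w_21` is again an illustrative name.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

learning_rate = 0.1
w_11, w_21 = 2.0, 3.0   # weights into output node 1 (w_21 is an illustrative name)
o_1, o_2 = 0.4, 0.5     # made-up hidden-node outputs
e_1 = 1.5               # made-up error from the post

s = sigmoid(w_11 * o_1 + w_21 * o_2)
slope = -e_1 * s * (1.0 - s) * o_1      # dE/dw_11, about -0.04969

# Gradient descent: move the weight against the slope.
new_w_11 = w_11 - learning_rate * slope
print(f"change: {new_w_11 - w_11:+.3f}, new w_11: {new_w_11:.3f}")  # change: +0.005, new w_11: 2.005
```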
This is quite a small change, but over many hundreds or thousands of iterations the weights will eventually settle into a configuration where the well-trained neural network produces outputs that reflect the training examples.
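To see the "many iterations" idea in action, the sketch below repeats the same update for both weights into the output node. It assumes the hidden outputs stay fixed at 0.4 and 0.5 and uses a hypothetical target of 0.8, so the numbers are illustrative only.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def train(target: float, steps: int, lr: float = 0.1) -> list[float]:
    """Repeat the weight update for both weights into output node 1.

    Returns the error (target - output) recorded before every step.
    """
    w = [2.0, 3.0]          # starting weights from the worked example
    o_hidden = [0.4, 0.5]   # hidden outputs, held fixed (assumption)
    errors = []
    for _ in range(steps):
        o_k = sigmoid(sum(wj * oj for wj, oj in zip(w, o_hidden)))
        e_k = target - o_k
        errors.append(e_k)
        for j, oj in enumerate(o_hidden):
            slope = -e_k * o_k * (1.0 - o_k) * oj   # dE/dw_jk
            w[j] -= lr * slope                      # step against the slope
    return errors

errs = train(target=0.8, steps=5000)
print(f"error before: {errs[0]:.4f}, after {len(errs)} steps: {errs[-1]:.4f}")
```

Each step is tiny, but the error shrinks steadily as the steps accumulate.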
Haha, thank you, I was going crazy at that part! I admire your work so much :) Thanks again, man.
The best book, thanks.
I am wondering: is there any blog/post about RNNs?
I don't, but I will look at RNNs for my next book on text analytics. Keep an eye on http://makeyourowntextminingtoolkit.blogspot.co.uk
If you like the book, please do add a review on Amazon. It helps me invest time in my next book.
Are the calcs of the chain rule on page 209 correct?
There was an error: I used y = x^2 + x instead of y = x^3 + x.
The book will be updated, and if you have an ebook you can get a free update.
Thanks for pointing this out.
Are the calcs of the chain rule on page 209 correct?
I think so. Can you be more specific, and I'll check?
I don't understand at all how you can realistically have an output error of 1.5 as shown in this example. Whether you are using $t_k - o_k$ or $(t_k - o_k)^2$, if the output of the node is between 0 and 1 due to the sigmoid transformation, how can the error be 1.5? Unless the target is an unachievable output in the first place.
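The commenter's reasoning can be checked numerically. This sketch assumes the standard logistic sigmoid and targets between 0 and 1; it samples node inputs from -20 to 20 and checks that every output stays strictly inside (0, 1), so for those samples the error never reaches 1 in magnitude.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

# Sample node inputs from -20.0 to 20.0 in steps of 0.1.
inputs = [i / 10.0 for i in range(-200, 201)]
outputs = [sigmoid(x) for x in inputs]

# Every sampled output lies strictly between 0 and 1 ...
outputs_inside = all(0.0 < o < 1.0 for o in outputs)

# ... so with targets in [0, 1], the error t - o stays below 1 in magnitude.
targets = [0.0, 0.01, 0.5, 0.99, 1.0]
largest_error = max(abs(t - o) for t in targets for o in outputs)
print(outputs_inside, f"{largest_error:.9f}")
```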
What would be very helpful is a full example of, say, a 3x3 net with backpropagation results and the changed weights in all the cells, so I can confirm my maths is correct. You've only worked the results for one cell, and it's kind of hard to replicate since you don't show the inputs or the target output (plus the problem of the error being 1.5 referred to above).
This is a fair point.
You are right, the output can't be 1.5 because the sigmoid squashes it to between 0 and 1 (not including 0 and 1).
My focus when I developed that example was to explain the calculations, and if I could do it again, I would not have used a value of 1.5.
On your second point about a full example, do you think the simpler and fewer calculations aren't enough to show what happens in a neural network as it is trained? I didn't do a full 3x3 example because the length would be off-putting and the calculations would be the same in nature.
But thanks for getting back to me, and maybe an updated second edition would avoid the use of "1.5".
The next update to the book (in a few days' time) will have this issue fixed to avoid any confusion. Thanks for raising this very good point.
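For readers who want to check their own working against a full 3x3 case, here is a hedged sketch of one training step in plain Python. It applies the post's update rule to every weight and splits the output errors back using the transposed hidden-to-output weights, mirroring the notebook line `numpy.dot(self.who.T, output_errors)`. The inputs, targets, and starting weights are made-up values, and all function and variable names are hypothetical.

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def train_step(w_ih, w_ho, inputs, targets, lr):
    """One forward and backward pass for a 3-layer network.

    w_ih[j][i] links input i to hidden node j; w_ho[k][j] links hidden j to output k.
    Returns the updated weight matrices and the output errors before the update.
    """
    hidden = [sigmoid(x) for x in mat_vec(w_ih, inputs)]
    outputs = [sigmoid(x) for x in mat_vec(w_ho, hidden)]
    out_err = [t - o for t, o in zip(targets, outputs)]
    # Hidden errors: output errors split back by the link weights (no normalising).
    hid_err = mat_vec(transpose(w_ho), out_err)

    # Each weight moves by lr * error * node output * (1 - node output) * incoming signal.
    new_w_ho = [[w_ho[k][j] + lr * out_err[k] * outputs[k] * (1 - outputs[k]) * hidden[j]
                 for j in range(len(hidden))] for k in range(len(outputs))]
    new_w_ih = [[w_ih[j][i] + lr * hid_err[j] * hidden[j] * (1 - hidden[j]) * inputs[i]
                 for i in range(len(inputs))] for j in range(len(hidden))]
    return new_w_ih, new_w_ho, out_err

# Made-up example values (hypothetical, chosen for illustration).
inputs = [0.9, 0.1, 0.8]
targets = [0.01, 0.99, 0.5]
w_ih = [[0.9, 0.3, 0.4], [0.2, 0.8, 0.2], [0.1, 0.5, 0.6]]
w_ho = [[0.3, 0.7, 0.5], [0.6, 0.5, 0.2], [0.8, 0.1, 0.9]]

new_w_ih, new_w_ho, out_err = train_step(w_ih, w_ho, inputs, targets, lr=0.1)
for name, m in (("input->hidden", new_w_ih), ("hidden->output", new_w_ho)):
    print(name)
    for row in m:
        print("  ", [round(w, 4) for w in row])
```

Because the targets lie between 0 and 1, every output error here stays below 1 in magnitude.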
On page 82 you write:
"This is great, but did we do the right thing cutting out that normalising factor? ... If our simpler way works really well, we'll keep it."
Then on page 100 you show the diagram using the normalising factor.
e1 and e2 are split in proportion to the link weights.
Though I understand that this heuristic works similarly to the simpler one, it confuses me a little bit. I was expecting the diagram to show the simpler heuristic.
Edgardo, thanks for taking the time to get in touch.
The simpler heuristic is to split the errors in proportion to the link weights. The diagram shows this. What is avoided is the additional normalising factor.
I hope this helps. Get back to me if not.
Yes, let me check my understanding.
"If our simpler way works really well, we'll keep it."
Which is our simple way?
1) Split the output error in proportion to the link weights OR
2) Output error simply multiplied by the link weights
When you write in your code (part2_neural_network_mnist_data.ipynb):
hidden_errors = numpy.dot(self.who.T, output_errors)
are we using option 1 or 2?
I was expecting the diagram on page 100 (Kindle book) to show the simpler heuristic (option #2). Or maybe I'm confused/wrong...
Hi Edgardo, good question.
If we multiply by the link weights and then normalise, we are in effect splitting in proportion to the weights.
If we don't normalise, it still works, especially as many weights will be less than one and so have a reducing effect.
Does that help?
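The two options in Edgardo's question can be compared side by side. This is a small sketch with two hidden and two output nodes; the weights and output errors are made-up values and the function names are hypothetical. Option 2 is what `numpy.dot(self.who.T, output_errors)` computes; option 1 adds the normalising division.

```python
def hidden_errors_normalised(w_ho, out_err):
    """Option 1: split each output error in proportion to the link weights."""
    hid_err = [0.0] * len(w_ho[0])
    for k, e_k in enumerate(out_err):
        total = sum(w_ho[k])          # sum of the weights feeding output node k
        for j, w in enumerate(w_ho[k]):
            hid_err[j] += e_k * w / total
    return hid_err

def hidden_errors_unnormalised(w_ho, out_err):
    """Option 2: multiply by the link weights only, like numpy.dot(who.T, output_errors)."""
    return [sum(w_ho[k][j] * out_err[k] for k in range(len(out_err)))
            for j in range(len(w_ho[0]))]

# w_ho[k][j] is the weight from hidden node j to output node k (made-up values).
w_ho = [[2.0, 3.0],
        [1.0, 2.0]]
out_err = [0.8, 0.5]

option_1 = hidden_errors_normalised(w_ho, out_err)
option_2 = hidden_errors_unnormalised(w_ho, out_err)
print([round(e, 3) for e in option_1])  # [0.487, 0.813]
print([round(e, 3) for e in option_2])  # [2.1, 3.4]
```

With normalising, the hidden errors add up to the total output error (1.3 here). Without it they are scaled by the weights themselves; these made-up weights are above one, so the unnormalised errors come out larger.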
Hello, thank you for this amazing book.
On page 140 of the print version, you show the equation for updating the weight for the link between nodes j and k. You show the sigmoid function applied to the outputs of nodes k. Should this actually show the sigmoid function applied to the inputs to nodes k? Or am I just confused?
Thanks again!
The expression shows the sigmoid applied to both Ok and Oj; see the image in the post above.
Does that help?
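On the page 140 question: in the worked example above, the sigmoid in the middle factor takes the summed input $\sum_j w_{jk} o_j$ arriving at node k (2.3 in the example). Because $o_k$ is that sigmoid's value, the factor can equally be written $o_k(1 - o_k)$. A minimal Python check, reusing the example's made-up numbers plus a finite-difference estimate of the sigmoid's slope added for illustration:

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

w_jk = [2.0, 3.0]   # weights into output node k from the worked example
o_j = [0.4, 0.5]    # made-up hidden-node outputs

net_k = sum(w * o for w, o in zip(w_jk, o_j))  # summed input to node k (2.3)
o_k = sigmoid(net_k)                            # output of node k

from_input = sigmoid(net_k) * (1.0 - sigmoid(net_k))   # sigmoid applied to the input sum
from_output = o_k * (1.0 - o_k)                         # same factor written with o_k

# Central finite difference: the slope of the sigmoid at net_k.
h = 1e-5
numeric = (sigmoid(net_k + h) - sigmoid(net_k - h)) / (2 * h)
print(round(from_input, 6), round(from_output, 6), round(numeric, 6))
```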