Friday, 15 April 2016

Error #1

Michael B found an error on page 32 of the book. That's the section where the idea of moderating the learning with a learning rate is introduced. The second training example uses a target value of 0.9. That's wrong - it should have said 2.9. The calculations which then update the slope A are therefore also wrong.

Below is the updated section; the diagram has also been updated.




Let’s press on to the second training data example at x = 1.0. Using A = 0.3083 we have y = 0.3083 * 1.0 = 0.3083. The desired value was 2.9 so the error is (2.9 - 0.3083) = 2.5917.  The ΔA = L (E / x) = 0.5 * 2.5917 / 1.0 = 1.2958. The even newer A is now 0.3083 + 1.2958 = 1.6042.


Let’s visualise again the initial, improved and final lines to see if moderating the updates leads to a better dividing line between the ladybird and caterpillar regions.

[Figure: part1_classifier_refinements_moderated.png - the initial, improved and final dividing lines]

This is really good!

Even with these two simple training examples, and a relatively simple update method using a moderating learning rate, we have very rapidly arrived at a good dividing line y = Ax where A is 1.6042.
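
For readers who want to check the corrected arithmetic, here is a minimal Python sketch of the moderated refinement. It assumes the book's first training example (x = 3.0, desired value 1.1) and the initial slope A = 0.25, which together reproduce the A = 0.3083 used above.

    # moderated refinement of the slope A for the linear classifier y = Ax
    # each update is moderated by the learning rate L: ΔA = L * (E / x)

    A = 0.25   # initial slope, as in the book's example
    L = 0.5    # moderating learning rate

    # (x, desired value) pairs: the ladybird example, then the caterpillar example
    training_data = [(3.0, 1.1), (1.0, 2.9)]

    for x, target in training_data:
        y = A * x              # current prediction
        E = target - y         # error against the desired value
        A = A + L * (E / x)    # moderated update of the slope
        print("x =", x, " error =", round(E, 4), " new A =", round(A, 4))

    # the final A printed is 1.6042, matching the corrected calculation above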




The ebook has been updated and you should receive the update automatically; if it is slow to reach you, ask Amazon to trigger it. The print book has also been updated.

3 comments:

  1. Hi,

    I am reading your book. I noticed you derived the error correction amount as
    learning rate * Error / X. This is different from the error correction amount commonly derived via the delta rule: http://www.cs.stir.ac.uk/courses/CSCU9YF/lectures/ANN/3-DeltaRule.pdf

    Can you please comment on that? What am I missing here?

    Thanks,
    casbby

    Replies
    1. good question.

      the delta rule is for nodes which have an activation function operating on weighted inputs - just like our neural networks.

      the delta rule is actually more general than our special case of neural networks with fully connected nodes and a sigmoid activation function. see more here about how the derivation is more general - https://en.wikipedia.org/wiki/Delta_rule

      the error correction you see at the beginning of the book is for much simpler predictors - linear functions y=Ax+b. that error correction is "exact", not a gradient descent. these simple predictors are much, much simpler than the more sophisticated neurons needed for problems more interesting than the kilometres-to-miles task set for the linear predictor.
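
      to make the difference concrete, here's a rough sketch (illustrative only, not code from the book) of the two update rules side by side for a simple predictor y = Ax:

      # the book's early-chapter update: solve exactly for the ΔA that would
      # remove the error, then moderate it with a learning rate L
      def exact_update(A, x, target, L=0.5):
          E = target - A * x
          return A + L * (E / x)

      # the delta rule: gradient descent on the squared error, which for a
      # linear (identity) activation gives ΔA = L * E * x
      def delta_rule_update(A, x, target, L=0.5):
          E = target - A * x
          return A + L * E * x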

      hope that helps.

    2. the book derives the update rule for the more sophisticated neurons later on, and that derivation follows the same ideas as the delta rule.
