Hi - currently it isn't possible. The two options are:

1. Kindle Textbook, which is a format similar to PDF, suited to books with many diagrams and illustrations.

2. Paper print book.

If you don't use a Kindle please get in touch at makeyourownneuralnetwork at gmail dot com and we'll find a way.

I've really enjoyed the book. Is there a minor formula error on page 100 (Kindle version)? In the "Weight Update Worked Example" section, I believe one of the weights should be 3.0, not 4.0, making the sum inside the sigmoid function 2.3 instead of 2.8.

Brian - I think you're right! Thanks .. I'll update the books so that future versions carry it, and the ebooks can be updated at no additional cost if you email Amazon and ask them. I'll do a blog post once it's done.
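For anyone following along, the arithmetic behind the correction can be checked with a few lines of Python. This is only a sketch: the thread gives the two sums (2.8 and 2.3) and the changed weight (4.0 vs 3.0), but the other weight (2.0) and the two incoming signals (0.4 and 0.5) are assumed values chosen here to reproduce those sums, and all names are made up for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def weighted_sum(weights, signals):
    """Sum of weight * signal over the incoming links of one node."""
    return sum(w * s for w, s in zip(weights, signals))

# Assumed values (not stated in the thread), picked to match the quoted sums.
signals = [0.4, 0.5]
weights_as_printed = [2.0, 4.0]   # contains the suspected typo
weights_corrected = [2.0, 3.0]    # with the weight changed to 3.0

sum_printed = weighted_sum(weights_as_printed, signals)    # approximately 2.8
sum_corrected = weighted_sum(weights_corrected, signals)   # approximately 2.3

print(f"as printed: sum = {sum_printed:.1f}, sigmoid = {sigmoid(sum_printed):.4f}")
print(f"corrected:  sum = {sum_corrected:.1f}, sigmoid = {sigmoid(sum_corrected):.4f}")
```

The point is just that changing one weight from 4.0 to 3.0 moves the value fed into the sigmoid from 2.8 to 2.3, which in turn changes the node output used in the weight update.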

Glad to help. It really is an excellent treatment of the ANN topic. I've found nothing else to be this accessible. In the interests of hoping to pay it forward, I have 3 other thoughts that could help to make this critical part of the text a little clearer (apologies if this post isn't the place...): the 3x3 diagrams are awesome, but the duplicate labeling of the weights is a little confusing (e.g. W12 shows up on multiple paths in a few spots); addressing that labeling would help clear up the extent of the joint optimization (across hidden and output layers) in the mathematical example (pg 98); and I'd like to see a depiction of the treatment of multiple observations for a single output node (i.e. the training data has n observations for the input and output nodes). Thank you again for such an excellent treatment of a fascinating topic!

Hey Brian - yes you found something I myself struggled with .. the duplicate labeling. I didn't want to have complicated labeling like W112 and W212 .. I'll think of something! But you are right - we need to focus on accessible ways of explaining stuff, not overloading explanations to scare away readers.

I'm not sure I understand the second point about multiple observations - could you explain more, or point me to a website that discusses it?

Thanks!

If I'm reading it right, the worked example shows how to calculate the first output layer for a single observation (one element of training set). But a training dataset will have "n" observations, all of which have an influence on the parameters. The text does not describe how to incorporate the potential multitude of observations into the optimization. In maximum likelihood we take the product over all observations of the individual likelihoods and optimize that product. What is the corresponding aggregation mechanism in ANN?

Ahh, I see the question now.

The aggregation is done in 2 ways.

1. The process of calculating the weight updates is done per-example. So there is a loop, working through all the examples, and for each one, the input is fed forward, and the difference between actual and desired (error) is used to update the weights.

2. Batched inputs - where a batch of inputs (say 10 or 20) is fed forward, the errors are summed, and the weights are updated once per batch.

In my deliberately simple guide, I only focused on the conceptually simple one-example-at-a-time idea .. which often works well.
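To make the two options concrete, here is a minimal sketch of both update schemes for a single sigmoid node. It is not the code from the book: the error function (half the squared difference), the toy OR-style data, the learning rate and every function name are assumptions made for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output(w, x):
    """Feed one example forward through a single sigmoid node."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def gradient(w, x, target):
    """Gradient of E = 0.5 * (target - output)^2 with respect to w, for one example."""
    o = output(w, x)
    return [-(target - o) * o * (1.0 - o) * xi for xi in x]

def train_online(w, data, lr):
    """Option 1: update the weights after every single example."""
    w = list(w)
    for x, target in data:
        g = gradient(w, x, target)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def train_batched(w, data, lr, batch_size):
    """Option 2: sum the per-example gradients over a batch, then update once."""
    w = list(w)
    for start in range(0, len(data), batch_size):
        summed = [0.0] * len(w)
        for x, target in data[start:start + batch_size]:
            g = gradient(w, x, target)
            summed = [s + gi for s, gi in zip(summed, g)]
        w = [wi - lr * si for wi, si in zip(w, summed)]
    return w

def total_error(w, data):
    """Error summed over every example in the data set."""
    return sum(0.5 * (t - output(w, x)) ** 2 for x, t in data)

# Hypothetical toy data: first input is a constant bias of 1.0, target is logical OR.
data = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
        ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]

w_online = [0.0, 0.0, 0.0]
w_batched = [0.0, 0.0, 0.0]
for _ in range(200):  # 200 passes over the data
    w_online = train_online(w_online, data, lr=0.5)
    w_batched = train_batched(w_batched, data, lr=0.5, batch_size=2)

print("online :", total_error(w_online, data))
print("batched:", total_error(w_batched, data))
```

With a batch size of 1 the batched loop reduces to the per-example loop, so the two options are really the same mechanism with a different number of examples contributing to each update.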

Some online discussion comparing "batch" vs "online" training:

https://visualstudiomagazine.com/articles/2014/08/01/batch-training.aspx

https://www.researchgate.net/post/Which_one_is_better_between_online_and_offline_trained_neural_network

http://stats.stackexchange.com/questions/70761/what-is-the-difference-between-online-and-batch-learning

Ok, I get it now. Different than I'm used to in max likelihood. Thank you for clarifying!
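For readers coming from maximum likelihood, the two views can be written side by side. This is a sketch in notation assumed for this note (theta for model parameters, w for network weights, E_i for the error on observation i, eta for the learning rate, B for a batch); none of these symbols come from the book.

```latex
% Maximum likelihood: aggregate n independent observations by a product,
% or equivalently by a sum of log-likelihoods.
\hat{\theta}
  = \arg\max_{\theta} \prod_{i=1}^{n} p(y_i \mid x_i, \theta)
  = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta)

% Neural network: aggregate by summing per-observation errors.
E(w) = \sum_{i=1}^{n} E_i(w),
\qquad
\frac{\partial E}{\partial w} = \sum_{i=1}^{n} \frac{\partial E_i}{\partial w}

% Per-example (online) update uses one term at a time;
% a batch update uses the sum over a batch B of observations.
w \leftarrow w - \eta \, \frac{\partial E_i}{\partial w}
\qquad \text{vs.} \qquad
w \leftarrow w - \eta \sum_{i \in B} \frac{\partial E_i}{\partial w}
```

Seen this way, the sum of per-example errors plays the role that the log-likelihood sum plays in maximum likelihood; the main difference is that the network is usually updated after each example or batch rather than after the whole data set.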
