He assumed each training example was fed forward through the network many times, each time reducing the error, and wanted to know when to stop and move on to the next training example. That is:
Training Example 1: FW, BP, FW, BP, FW, BP, ....
Training Example 2: FW, BP, FW, BP, FW, BP, ....
Training Example 3: FW, BP, FW, BP, FW, BP, ....
Training Example 4: FW, BP, FW, BP, FW, BP, ....
( FW=Feed Forward, BP=Back Propagate )
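Hamid's scheme could be sketched like this - a minimal illustration using a hypothetical single-weight linear "neuron", not a real network class, with a made-up error threshold as the stopping criterion:

```python
# Hamid's scheme: repeatedly feed the SAME example forward and
# back-propagate until its error falls below a threshold.
# The "network" is a single weight w with output = w * x, purely
# for illustration; lr, tol and max_iters are made-up parameters.

def train_until_small_error(w, x, target, lr=0.1, tol=1e-4, max_iters=1000):
    """Repeat FW, BP on one (x, target) pair until the error is tiny."""
    for _ in range(max_iters):
        output = w * x                 # FW: feed forward
        error = target - output
        if abs(error) < tol:           # stopping criterion: error small enough
            break
        w += lr * error * x            # BP: nudge the weight to reduce error
    return w

w = train_until_small_error(0.5, x=2.0, target=3.0)
```

After this loop the output for that one example is almost perfect - which hints at the danger of the scheme: the network is fitted very closely to each example in turn.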
---
My immediate reply was that this isn't how it is normally done. Instead, each training example is used just once, in turn. Some call this on-line learning. That is:
Training Example 1: FW, BP
Training Example 2: FW, BP
Training Example 3: FW, BP
Training Example 4: FW, BP
...
And I said that it is often a good idea to repeat this whole pass over the data several times - that is, to train for several epochs.
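The on-line scheme with epochs could be sketched like this - again using a toy single-weight linear "neuron" and made-up parameters, just to show the loop structure:

```python
# On-line learning: one FW, BP per example, with the whole data set
# repeated for several epochs. The "network" is a single weight w
# with output = w * x, purely for illustration.

def train_online(w, data, lr=0.1, epochs=5):
    for _ in range(epochs):            # repeat the data set: one epoch each
        for x, target in data:         # each example used once, in turn
            output = w * x             # FW
            error = target - output
            w += lr * error * x        # BP straight away, then move on
    return w

data = [(1.0, 1.5), (2.0, 3.0), (3.0, 4.5)]   # all consistent with w = 1.5
w = train_online(0.0, data)
```

Here each example nudges the weight a little, and the repeated epochs let those small nudges accumulate.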
---
Some will in fact batch together a few training examples and sum up the error to be used for back propagation - sometimes called mini-batch learning. That is:
Training Example 1: FW (accumulate error)
Training Example 2: FW (accumulate error)
Training Example 3: FW (accumulate error)
Training Example 4: FW, BP accumulated error
Training Example 5: FW (accumulate error)
Training Example 6: FW (accumulate error)
Training Example 7: FW (accumulate error)
Training Example 8: FW, BP accumulated error
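The batched scheme could be sketched like this - once more with a toy single-weight linear "neuron" and made-up parameters; the point is that the error is accumulated across the batch and applied in one update:

```python
# Mini-batch scheme: feed several examples forward, accumulate the
# error-derived update, then back-propagate once per batch. The
# "network" is a single weight w with output = w * x, for illustration.

def train_minibatch(w, data, batch_size=4, lr=0.1, epochs=20):
    for _ in range(epochs):
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            accumulated = 0.0
            for x, target in batch:     # FW only, accumulate the error
                output = w * x
                error = target - output
                accumulated += error * x
            # BP the accumulated error once per batch (averaged here)
            w += lr * accumulated / len(batch)
    return w

data = [(1.0, 1.5), (2.0, 3.0), (3.0, 4.5), (4.0, 6.0),
        (1.5, 2.25), (2.5, 3.75), (0.5, 0.75), (3.5, 5.25)]
w = train_minibatch(0.0, data)
```

Averaging over the batch smooths out the update, at the cost of adjusting the weights less often.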
---
Then I thought about it more and concluded that Hamid's approach wasn't wrong at all - just different. He was asking what the stopping criterion should be for applying the same training example many times. The real answer is ... I don't know - but I would experiment to find out.
---
Hamid's question is a good one, because it is not often made very clear which order or scheme is used. It is too easy for authors and teachers to assume new readers will know which scheme is being considered, or even which ones are a good idea.
That's why I love feedback from readers - they ask the best questions!
Thanks Hamid!