I've been puzzled by my neural network code not working. It struck me that backpropagation suffers from some well-known issues, like getting stuck in local minima, or not being able to learn from inputs that are all the same, or all zero.
I think I was suffering from network saturation - where the activation functions were all operating at points where the gradient is almost zero. No gradient means no weight change, which means no learning!
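To convince myself of this, here's a minimal sketch (not my actual network code) showing how the gradient of the logistic sigmoid collapses towards zero as its input moves away from the origin - this is exactly the saturation effect:

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_gradient(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at 0.25 when x = 0, and shrinks rapidly
# as |x| grows - a saturated neuron barely updates its weights.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  gradient = {sigmoid_gradient(x):.6f}")
```

At x = 10 the gradient is already down around 0.00005, so any weight update proportional to it is effectively nothing.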
The simple problem I was trying to get working was the famous XOR problem. Normalising the inputs from the range [0.0, 1.0] to [0.1, 0.9] seems to work much better.
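The rescaling itself is just a linear map. A sketch of how I'm squashing the XOR inputs into [0.1, 0.9] (the exact helper name is mine, not from my network code):

```python
def rescale(x, lo=0.1, hi=0.9):
    """Linearly map a value from [0.0, 1.0] into [lo, hi]."""
    return lo + (hi - lo) * x

# The four XOR input patterns, before and after rescaling.
xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
scaled = [tuple(rescale(v) for v in pair) for pair in xor_inputs]

for raw, adj in zip(xor_inputs, scaled):
    print(raw, "->", adj)
```

The point is that 0.0 becomes 0.1 and 1.0 becomes 0.9, so the targets sit inside the sigmoid's output range rather than at its asymptotes, which the network can never actually reach.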
The python source code I have so far is at:
The resultant sum-squared-error for XOR learning is:
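For reference, the sum-squared-error I'm plotting is just the sum over output nodes of (target - actual) squared. A one-liner sketch of that measure (assumed, not lifted from my code):

```python
def sum_squared_error(targets, outputs):
    """Sum of squared differences between target and actual outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Example: a network answering 0.9 and 0.1 against targets 1 and 0.
err = sum_squared_error([1.0, 0.0], [0.9, 0.1])
print(err)
```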
I will read with interest the fantastic paper "Efficient Backprop": http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
I tried my early code with OR and AND, not just XOR, and it all worked!
I'm still cautious so I'll think about this a bit more and read that paper in more detail.