What do we mean by this?
Imagine I had a list of bank balances and I wanted to multiply them by 1.1 to add 10% interest. I could do it by taking each one individually and adding the 10%. Something like:
for balance in bank:
    newbalance = balance * 1.10
However, many computers, and the programming languages running on them, allow the same job to be done on whole chunks of data items at once. This is vastly more efficient (and quicker) than looping over each data item individually. It would look something like this:
newbalance = balance * 1.10
The underlining shows that balance is a matrix, a whole array of values rather than a single one. This last expression allows the computer to operate on all of the balance data items at the same time, parallelising the calculation. It's as if the computer gives the work to a team of workers, instead of getting one worker to work through the list sequentially. In reality, there is a limit to how much of this parallelisation happens, but in most cases it is vastly more efficient.
This is variously called parallelisation, array programming, or vectorisation. Wikipedia has a good entry on it: http://en.wikipedia.org/wiki/Array_programming
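To make this concrete, here is a minimal sketch of the two approaches in Python, assuming the numpy library (an assumed choice; any array library would illustrate the same idea):

import numpy as np

# the loop approach: one balance at a time
balances = [100.0, 250.0, 40.0]
new_balances = [balance * 1.10 for balance in balances]

# the vectorised approach: numpy applies the multiplication
# to the whole array in one operation
balances = np.array([100.0, 250.0, 40.0])
new_balances = balances * 1.10

On large arrays the second form is typically much faster, because the work is handed to optimised low-level routines rather than being stepped through by the interpreter one item at a time.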
Now... it is worth exploring whether this is applicable to our neural network calculations. In fact, calculating the outputs of each neural network layer translates easily into this vectorised approach.
Consider the calculation of outputs from the output layer, using the outputs from the hidden layer and, of course, taking into account the weights.
output_1 = activation_function ( hiddenoutput_1 * weight_11 + hiddenoutput_2 * weight_21 + hiddenoutput_3 * weight_31 )
output_2 = activation_function ( hiddenoutput_1 * weight_12 + hiddenoutput_2 * weight_22 + hiddenoutput_3 * weight_32 )
...
Can you see the pattern? Look at the drawing connecting the hidden layer to the output layer through the weights. Let's use shorter names to see it more clearly:
o_1 = af ( ho_1 * w_11 + ho_2 * w_21 + ho_3 * w_31 )
o_2 = af ( ho_1 * w_12 + ho_2 * w_22 + ho_3 * w_32 )
...
These patterns, where you can see essentially the same expression with just one array index changing, are usually amenable to vectorisation. It doesn't take long to see that:

o = af( W · ho )

where o is the vector of outputs, ho is the vector of hidden layer outputs, and W is the matrix of weights, with the weight w_ji sitting in row i, column j.
Neat!
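As a sketch, here is that same calculation in Python with numpy. The weights and hidden outputs are made-up numbers, and a sigmoid stands in for the activation function, purely as illustrative assumptions:

import numpy as np

# a sigmoid as a stand-in activation function (illustrative assumption)
def af(x):
    return 1.0 / (1.0 + np.exp(-x))

# hidden layer outputs (made-up numbers)
ho = np.array([0.4, 0.6, 0.8])

# weight matrix: w_ji sits in row i, column j, so each row holds
# the weights feeding one output node
W = np.array([[0.9, 0.2, 0.1],   # w_11, w_21, w_31 -> o_1
              [0.3, 0.8, 0.5]])  # w_12, w_22, w_32 -> o_2

# element by element, as in the expressions above
o_1 = af(ho[0]*W[0, 0] + ho[1]*W[0, 1] + ho[2]*W[0, 2])

# vectorised: one matrix multiplication covers every output node at once
o = af(np.dot(W, ho))

The vectorised line works unchanged however many nodes each layer has, which is exactly why this form is so convenient.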
That's the forward flow vectorised... but I'm still struggling to parallelise the backpropagation that we talked about in the last post.