Friday, 10 April 2015

Matrices - Doing Lots of Work in One Go

One of the things that programmers and algorithm designers try to do is to take advantage of the fact that computers are quite good at doing lots of the same thing very quickly.

What do we mean by this?

Imagine I had a list of bank balances and I wanted to multiply each one by 1.1 to add 10% interest. I could do it by taking each balance individually and adding the 10%. Something like:

for balance in bank:
    newbalance = balance * 1.10

However, many computers, and the programming languages that run on them, allow the same job to be done on whole chunks of data items at once. This is vastly more efficient (and quicker) than looping over each data item. It would look something like this:
 
newbalance = balance * 1.10

Here balance stands for the whole matrix (array) of balances, not a single value. This last expression allows the computer to operate on all of the balance data items at the same time, parallelising the calculation. It's as if the computer hands the work out to a team of workers, instead of getting one worker to plod through the list sequentially. In reality, there is a limit to how much of this parallelisation actually happens, but in most cases it is vastly more efficient.

This is variously called parallelisation, array programming, or vectorisation. Wikipedia has a good entry on it: http://en.wikipedia.org/wiki/Array_programming
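
To make this concrete, here's a minimal sketch in Python using numpy (a library for exactly this kind of array programming); the balances are made-up values:

import numpy as np

# some made-up account balances
bank = np.array([100.0, 250.0, 75.0])

# the loop way: update each balance one at a time
new_balances_loop = [balance * 1.10 for balance in bank]

# the vectorised way: one expression updates the whole array at once
new_balances = bank * 1.10

print(new_balances)  # roughly [110.  275.   82.5]

Both give the same answers, but the vectorised line hands the whole array to numpy's fast internals in one go.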


Now... it is worth exploring whether this is applicable to our neural network calculations.


In fact, calculating the outputs of each neural network layer translates easily into this vectorised approach.

Consider the calculation of the outputs from the output layer, using the outputs from the hidden layer and, of course, taking into account the weights.

output_1 = activation_function (  hiddenoutput_1 * weight_11 + hiddenoutput_2 * weight_21 + hiddenoutput_3 * weight_31 )

output_2 = activation_function (  hiddenoutput_1 * weight_12 + hiddenoutput_2 * weight_22 + hiddenoutput_3 * weight_32 )

...

Can you see the pattern? Look at the drawing connecting the hidden layer to the output layer through the weights. Let's use shorter names to see it more clearly:

o_1 = af (  ho_1 * w_11 + ho_2 * w_21 + ho_3 * w_31 )

o_2 = af (  ho_1 * w_12 + ho_2 * w_22 + ho_3 * w_32 )

...

These patterns, where you can see essentially the same expression with just one array index changing, are usually amenable to vectorisation. It doesn't take long to see that:

o = af( W . ho )

where ho is the column of hidden layer outputs, and W is the weight matrix arranged so that row j holds w_1j, w_2j, w_3j, the weights feeding output node j. The first row of W times ho is exactly ho_1 * w_11 + ho_2 * w_21 + ho_3 * w_31, the sum inside af() for o_1 above, and the second row gives the sum for o_2. One matrix multiplication does all the weighted sums for every output node in one go, and the activation function is then applied to each element of the result.

So when we're implementing this in Python we can use the dot product function to multiply the weights matrix by the hidden outputs array. The same logic applies to arrive at the hidden outputs using the inputs and their associated weights.
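
Here's a minimal sketch of that in Python with numpy, for the three-hidden-node, two-output-node expressions above. The weights and hidden outputs are made-up values, and the sigmoid is assumed as the activation function (any activation function slots in the same way):

import numpy as np

# activation function: the sigmoid (an assumed choice for this sketch)
def af(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up hidden layer outputs ho_1, ho_2, ho_3
ho = np.array([0.4, 0.6, 0.9])

# weight matrix W: row j holds w_1j, w_2j, w_3j, the weights feeding output node j
W = np.array([[0.9, 0.2, 0.1],   # weights into output node 1
              [0.3, 0.8, 0.5]])  # weights into output node 2

# one dot product computes every output node's weighted sum in one go,
# then af() is applied to each element of the result
o = af(np.dot(W, ho))

print(o)  # the two output values o_1 and o_2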

Neat!

That's the forward flow vectorised... but I'm still struggling to parallelise the backpropagation that we talked about in the last post.