Tuesday, 23 May 2017

Learning MNIST with GPU Acceleration - A Step by Step PyTorch Tutorial

I'm often asked why I don't talk about neural network frameworks like Tensorflow, Caffe, or Theano.

Reasons for Not Using Frameworks

I avoided these frameworks because the main thing I wanted to do was to learn how neural networks actually work. That includes learning about the core concepts and the maths too. By creating our own neural networks code, from scratch, we can really start to understand them, and the issues that emerge when trying to apply them to real problems.

We don't get that learning and experience if we only learned how to use someone else's library.

Reasons for Using Frameworks - GPU Acceleration

But there are some good reasons for using such frameworks, after you've learned about how neural networks actually work.

One reason is that you want to take advantage of the special hardware in some computers, called a GPU, to accelerate the core calculations done by a neural network. The GPU - graphics processing unit - was traditionally used to accelerate calculations to support rich and intricate graphics, but recently that same special hardware has been used to accelerate machine learning.

The normal brain of a computer, the CPU, is good at doing all kinds of tasks. But if your tasks are matrix multiplications, and lots of them in parallel, for example, then a GPU can do that kind of work much faster. That's because GPUs have lots and lots of computing cores, and very fast access to locally stored data. Nvidia has a page explaining the advantage, with a fun video too - link. But remember, GPUs are not good for general purpose work, they're just really fast at a few specific kinds of jobs.

The following illustrates a key difference between general purpose CPUs and GPUs with many, more task-specific, compute cores:

GPUs have hundreds of cores, compared to a CPU's 2, 4 or maybe 8.

Writing code to directly take advantage of GPUs is not fun, currently. In fact, it is extremely complex and painful. And very, very unlike the joy of easy coding with Python.

This is where the neural network frameworks can help - they allow you to imagine a much simpler world - and write code in that world, which is then translated into the complex, detailed, and low-level nuts-n-bolts code that the GPUs need.

There are quite a few neural network frameworks out there .. but comparing them can be confusing. There are a few good comparisons and discussions on the web like this one - link.


I'm going to use PyTorch for three main reasons:
  • It's largely vendor independent. Tensorflow has a lot of momentum and interest, but is very much a Google product. 
  • It's designed to be Python - not an ugly and ill-fitting Python wrap around something that really isn't Python. Debugging is also massively easier if what you're debugging is Python itself.
  • It's simple and light - preferring simplicity in design, working naturally with things like the ubiquitous numpy arrays, and avoiding hiding too much stuff as magic, something I really don't like.

Some more discussion of PyTorch can be found here - link.

Working With PyTorch

To use PyTorch, we have to understand how it wants to be worked with. This will be a little different to the normal Python and numpy world we're used to.

The main ideas are:
  • build up your network architecture using the building blocks provided by PyTorch - these are things like layers of nodes and activation functions.
  • you let PyTorch automatically work out how to back propagate the error - it can do this for any of the building blocks it provides, which is really convenient.
  • we train the network in the normal way, and measure accuracy as usual, but pytorch provides functions for doing this.
  • to make use of the GPU, we configure a setting and push the neural network weight matrices to the GPU, and work on them there.

We shouldn't try to replicate what we did with our pure Python (and numpy) neural network code - we should work with PyTorch in the way it was designed to be used.

A key part of this is auto differentiation. Let's look at that next.

Auto Differentiation

A powerful and central part of PyTorch is the ability to create neural networks, chaining together different elements - like activation functions,  convolutions, and error functions - and for PyTorch to work out the error gradients for the various parameters we want to improve.

That's quite cool if it works!

Let's see it working. Imagine a simple parameter $y$ which depends on another input variable $x$. Imagine that

$$  y = x^2 + 5x + 2 $$

Let's encode this in PyTorch:

import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([2.0]), requires_grad=True)

y = (x**2) + (5*x) + 2

Let's look at that more slowly.  First we import torch, and also the Variable from torch.autograd, the auto differentiation library. Variable is important because we need to wrap normal Python variables with it, so that PyTorch can do the differentiation. It can't do it with normal Python variables like a = 10, or b = 5*a. Variables include links to where the variables came from - so that if one depends on another, PyTorch can do the correct differentiation.

We then create x as a Variable. You can see that it is a simple tensor of trivial size, just a single number, 2.0. We also signal that it requires a gradient to be calculated.

A tensor? Think of it as just a fancy name for multi-dimensional matrices. A 2-dimensional tensor is a matrix that we're all familiar with, like numpy arrays. A 1-dimensional tensor is like a list. A 0-dimensional one is just a single number. When we create a torch.Tensor([2.0]) we're just creating a tensor holding a single number.

We then create the next Variable called y. That looks like a normal Python variable by the way we've created it .. but it isn't, because it is made from x, which is a PyTorch Variable. Remember, the magic that Variable brings is that when we define y in terms of x, the definition of y remembers this, so we can do proper differentiation on it with respect to x.
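To make that "remembering" a bit less magical, here is a minimal pure Python sketch of the idea. It is not how PyTorch is actually implemented, and the Var class and everything inside it are made up purely for illustration. Each Var keeps links to the things it was made from, together with the local slope along each link, and backward() follows those links, applying the chain rule as it goes.

```python
class Var:
    """A toy stand-in for a PyTorch Variable (hypothetical, illustration only)."""

    def __init__(self, value, parents=()):
        self.value = value
        # each parent is a pair: (the Var we came from, local slope along that link)
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, n):
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self, upstream=1.0):
        # add the gradient arriving along this path, then pass it further back
        self.grad += upstream
        for parent, local_slope in self.parents:
            parent.backward(upstream * local_slope)


x = Var(2.0)
y = (x ** 2) + (5 * x) + 2

y.backward()
print(y.value, x.grad)
```

Because y was built out of x, walking the links back from y is enough to fill in x.grad. Real frameworks do this far more efficiently, but the bookkeeping is the same in spirit.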

So let's do the differentiation!

y.backward()

That's it. That's all that is required to ask PyTorch to use what it knows about y and all the Variables it depends on to work out how to differentiate it.

Let's see if it did it correctly. Remember that $x=2$ so we're asking for

$$ \frac{dy}{dx}\Big|_{x=2} = (2x + 5)\Big|_{x=2} = 9 $$

This is how we ask for that to be done.

x.grad

Let's see how all that works out:

It works! You can also see how y is shown as type Variable, not just x.
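If you'd like an independent check on that value of 9 that doesn't rely on PyTorch at all, a quick numerical estimate of the slope does the job. This is only a sketch: the function names and the step size h are my own choices, not anything from PyTorch.

```python
def y(x):
    # the same expression we gave to PyTorch
    return x**2 + 5*x + 2


def numerical_slope(f, x, h=1e-5):
    # central difference: the average slope over a tiny interval around x
    return (f(x + h) - f(x - h)) / (2 * h)


slope = numerical_slope(y, 2.0)
print(round(slope, 6))
```

The estimate agrees with 2x + 5 at x = 2 to many decimal places, which is a handy trick for checking any gradient you're not sure about.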

So that's cool. And that's how we define our neural network, using elements that PyTorch provides us, so it can automatically work out error gradients.

Let's Describe Our Simple Neural Network

Let's look at some super-simple skeleton code which is a common starting point for many, if not all, PyTorch neural networks.

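import torch
import torch.nn as nn

class NeuralNetwork(torch.nn.Module):

    def __init__(self):
        # initialisation code will go here
        pass

    def forward(self, inputs):
        # the description of the network will go here
        outputs = inputs
        return outputs

net = NeuralNetwork()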

The neural network class is derived from torch.nn.Module which brings with it the machinery of a neural network including the training and querying functions - see here for the documentation.

There is a tiny bit of boilerplate code we have to add to our initialisation function __init__() .. and that's calling the initialisation of the class it was derived from. That should be the __init__() belonging to torch.nn.Module. The clean way to do this is to use super():

    def __init__(self):
        # call the base class's initialisation too
        super().__init__()

We're not finished yet. When we create an object from the NeuralNetwork class, we need to tell it at that time what shape it will be. We're sticking with a simple 3-layer design .. so we need to specify how many nodes there are at the input, hidden and output layers. Just like our pure Python example, we pass this information to the __init__() function. We might as well create these layers during the initialisation. Our __init__() now looks like this:

    def __init__(self, inodes, hnodes, onodes):
        # call the base class's initialisation too
        super().__init__()
        # define the layers and their sizes, turn off bias
        # (nn here is torch.nn, imported with: import torch.nn as nn)
        self.linear_ih = nn.Linear(inodes, hnodes, bias=False)
        self.linear_ho = nn.Linear(hnodes, onodes, bias=False)
        # define activation function
        self.activation = nn.Sigmoid()

The nn.Linear() module is the thing that creates the relationship between one layer and another and combines the network signals in a linear way .. which is what we did in our pure Python code. Because this is PyTorch, that nn.Linear() creates a parameter that can be adjusted .. the link weights that we're familiar with. You can read more about nn.Linear() here.

We also create the activation function we want to use, in this case the logistic sigmoid function. Note, we're using the one provided by torch.nn, not making our own.

Note that we're not using these PyTorch elements yet, we're just defining them because we have the information about the number of input, hidden and output nodes.

We have to over-ride the forward() function in our neural network class. Remember, that backward() is provided automatically, but can only work if PyTorch knows how we've designed our neural network - how many layers, what those layers are doing with activation functions, what the error function is, etc.

So let's create a simple forward() function which is the description of the network architecture. Our example will be really simple, just like the one we created with pure Python to learn the MNIST dataset.

    def forward(self, inputs_list):
        # convert list to a tensor, and wrap it as a Variable
        inputs = Variable(torch.FloatTensor(inputs_list))
        # combine input layer signals into hidden layer
        hidden_inputs = self.linear_ih(inputs)
        # apply sigmoid activation function
        hidden_outputs = self.activation(hidden_inputs)
        # combine hidden layer signals into output layer
        final_inputs = self.linear_ho(hidden_outputs)
        # apply sigmoid activation function
        final_outputs = self.activation(final_inputs)
        return final_outputs

You can see the first thing we do is convert the list of numbers, a Python list, into a PyTorch Variable.  We must do this, otherwise PyTorch won't be able to calculate the error gradient later.

The next section is very familiar, the combination of signals at each node, in each layer, followed immediately by the activation function. Here we're using the nn.Linear() elements we defined above, and the activation function we defined earlier, using the torch.nn.Sigmoid() provided by PyTorch.
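To see exactly what those two steps compute, here is the same forward pass written out in plain Python with no framework at all. It's only a sketch: the helper names, the tiny 3-2-1 network shape and the weight values are all made up for illustration, and the real network has far more nodes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(weights, signals):
    # each node in the next layer sums its weighted inputs (no bias term)
    return [sum(w * s for w, s in zip(row, signals)) for row in weights]


def forward(w_ih, w_ho, inputs):
    # combine input signals into the hidden layer, then apply the activation
    hidden_outputs = [sigmoid(z) for z in linear(w_ih, inputs)]
    # combine hidden signals into the output layer, then apply the activation
    final_outputs = [sigmoid(z) for z in linear(w_ho, hidden_outputs)]
    return final_outputs


# made-up weights for a tiny 3-2-1 network
w_ih = [[0.1, -0.2, 0.3],
        [0.4, 0.5, -0.6]]
w_ho = [[0.7, -0.8]]

outputs = forward(w_ih, w_ho, [1.0, 0.5, 0.0])
print(outputs)
```

Each row of a weight list plays the role of the link weights into one node, which is the job nn.Linear() does for us with bias switched off.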

Error Function

Now that we've defined the network, we need to define the error function. This is an important bit of information because it defines how we judge the correctness of the neural network, and that wrong-ness is used to update the internal parameters during training.

There are many error functions that people use, some better for some kinds of problems than others. We'll use the really simple one we developed for the pure Python network, the squared error function.  It looks like the following.

error_function = torch.nn.MSELoss(size_average=False)

We've set the size_average parameter to False to avoid the error function dividing by the size of the output and target vectors.
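The difference that setting makes is easy to see in plain Python. The function below is my own sketch of the behaviour described here, not PyTorch's code, and the numbers are made up.

```python
def squared_error(outputs, targets, size_average=False):
    # sum of the squared differences between each output and its target
    total = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    # optionally divide by the number of elements to get the mean instead
    return total / len(outputs) if size_average else total


outputs = [1.0, 3.0]
targets = [0.0, 1.0]

summed = squared_error(outputs, targets)
averaged = squared_error(outputs, targets, size_average=True)
print(summed, averaged)
```

With averaging switched off we get the plain sum of squared errors, which is what our pure Python network used.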


We're almost there. We've just defined the error function, which means we know how far wrong the neural network is during training. We know that PyTorch can calculate the error gradients for each parameter.

When we created our simple neural network, we didn't think too much about different ways of improving the parameters based on the error function and error gradients. We simply descended down the gradients a small bit. And that is simple, and powerful.

Actually there are many refined and sophisticated approaches to doing this step. Some are designed to avoid false minimum traps, others designed to converge as quickly as possible, etc. We'll stick to the simple approach we took, and the closest in the PyTorch toolset is the stochastic gradient descent:

optimiser = torch.optim.SGD(net.parameters(), lr=0.1)

We feed this optimiser the adjustable parameters of our neural network, and we also specify the familiar learning rate as lr.
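Stripped of everything else, plain gradient descent is just "new parameter = old parameter - learning rate x gradient", repeated. Here's a tiny sketch of that rule, reusing the $y = x^2 + 5x + 2$ example from earlier as a stand-in for an error function. The starting point and the number of steps are arbitrary choices of mine.

```python
def gradient(x):
    # slope of y = x^2 + 5x + 2
    return 2 * x + 5


lr = 0.1   # the learning rate
x = 2.0    # arbitrary starting point

for step in range(100):
    # take a small step down the slope
    x = x - lr * gradient(x)

print(x)
```

The value of x settles very close to -2.5, which is where the slope 2x + 5 is zero and y is at its minimum. The optimiser does this same kind of step for every weight in the network.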

Finally, Doing the Update

Finally, we can talk about doing the update - that is, updating the neural network parameters in response to the error seen with each training example.

Here's how we do that for each training example:

  • calculate the output for a training data example
  • use the error function to calculate the difference (the loss, as people call it)
  • zero gradients of the optimiser which might be hanging around from a previous iteration
  • perform automatic differentiation to calculate new gradients
  • use the optimiser to update parameters based on these new gradients

In code this will look like:

for inputs, target in training_set:

    output = net(inputs)

    # Compute loss
    loss = error_function(output, target)

    # Zero gradients, perform a backward pass, and update the weights.
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
It is a common error not to zero the gradients during each iteration, so keep an eye out for that. I'm not really sure why the default is not to clear them ...
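The reason it matters is that PyTorch adds each newly calculated gradient onto whatever is already stored, rather than replacing it. Here's a plain Python sketch of the effect, using the slope 2x + 5 from the earlier example as the "gradient". The variable names are mine, for illustration only.

```python
def gradient(x):
    return 2 * x + 5


x = 2.0

# forgetting to zero: each pass adds onto the stored gradient
stored_grad = 0.0
for _ in range(3):
    stored_grad += gradient(x)
not_zeroed = stored_grad

# zeroing before each pass: the stored gradient is always the fresh one
for _ in range(3):
    stored_grad = 0.0
    stored_grad += gradient(x)
zeroed = stored_grad

print(not_zeroed, zeroed)
```

After three passes the un-zeroed gradient is three times too big, so the weight updates would be three times too big as well, and it only gets worse with every training example.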

The Final Code 

Now that we have all the elements developed and understood, we can rewrite the pure python neural network we developed in the course of Make Your Own Neural Network and throughout this blog.

You can find the code as a notebook on GitHub:

The only unusual thing I had to work out was that during the evaluation of performance, we keep a scorecard list, and append a 1 to it if the network's answer matches the known correct answer from the test data set. This comparison needs the actual number to be extracted from the PyTorch tensor via numpy, as follows. We couldn't just say label == correct_label.

if (label.data[0][0] == correct_label):

The results seem to match our pure python code for performance - no major difference, and we expected that because we've tried to architect the network to be the same.

Performance Comparison On a Laptop

Let's compare performance between our simple pure python (with numpy) code and the PyTorch version. As a reminder, here are the details of the architecture and data:

  • MNIST training data with 60,000 examples of 28x28 images
  • neural network with 3 layers: 784 nodes in input layer, 200 in hidden layer, 10 in output layer
  • learning rate of 0.1
  • stochastic gradient descent with mean squared error
  • 5 training epochs (that is, repeat training data 5 times)
  • no batching of training data

The timing was done with the following python notebook magic command in the cell that contains only the code to train the network. The options ensure only one run of the code, and the -c option ensures unix user time is used to account for other tasks taking CPU time on the same machine.

%%timeit -n1 -r1 -c

The results from doing this twice on a MacBook Pro 13 (early 2015), which has no GPU for accelerating the tensor calculations, are:

  • home-made simple pure python - 440 seconds, 458 seconds
  • simple PyTorch version - 841 seconds, 834 seconds

Amazing! Our own home-made code is about 1.9 times faster .. roughly twice as fast!
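For anyone who wants to check that ratio, here is the arithmetic as a few lines of Python, using the four timings listed above. Averaging the two runs of each version is my own choice of how to summarise them.

```python
home_made = [440, 458]   # seconds, pure python runs
pytorch = [841, 834]     # seconds, PyTorch runs on the same laptop


def mean(values):
    return sum(values) / len(values)


ratio = mean(pytorch) / mean(home_made)
print(mean(home_made), mean(pytorch), round(ratio, 2))
```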

GPU Accelerated Performance

One of the key reasons we chose to invest time learning a framework like PyTorch is that it makes it easy to take advantage of GPU acceleration. So let's try it.

I don't have a laptop with a CUDA GPU so I fired up a Google Cloud Compute Instance.  The specs for mine are:

  • n1-highmem-2 (2 vCPUs, 13 GB memory)
  • Intel Sandy Bridge
  • 1 x NVIDIA Tesla K80 GPU

So we can compare GPU results with CPU results, I ran the above code but this time not as a notebook but a command line script, using the unix time command. This will give us the time to complete the whole program, including the training and testing stages. The results are:

real    8m14.387s
user    7m31.223s
sys     8m39.810s

The interpretation of these numbers needs some sophistication, especially if our code has multiple threads, so we'll just stick to the simple real wall-clock time of 8m14s or 494 seconds.
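Converting those timings into seconds is simple enough to do by hand, but here is a small sketch in Python. The to_seconds() helper is made up for this post, and it only handles the minutes-and-seconds format shown above.

```python
def to_seconds(stamp):
    # stamp looks like "8m14.387s"
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


timings = {"real": "8m14.387s", "user": "7m31.223s", "sys": "8m39.810s"}
in_seconds = {name: to_seconds(stamp) for name, stamp in timings.items()}

print(round(in_seconds["real"]))
print(in_seconds["user"] + in_seconds["sys"] > in_seconds["real"])
```

Notice that user and sys added together come to more than the real time. That can only happen when the work is spread over more than one CPU core at once, which is why these numbers need careful interpretation.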

Now we need to change the code to run on the GPU. First check that CUDA - NVIDIA's GPU acceleration framework - is available to Python and PyTorch:

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

So CUDA is available. This gave a False on my own home laptop.

The overall approach to shifting work from the CPU to the GPU is to shift the tensors there. Here is the current (but immature) PyTorch guidance on working with the GPU. To create a Tensor on a GPU we use torch.cuda:

>>> x = torch.cuda.FloatTensor([1.0, 2.0])
>>> x

[torch.cuda.FloatTensor of size 2 (GPU 0)]

You can see that this new tensor x is created on the GPU, it is shown as GPU 0, as there can be more. If we perform a calculation on x, it is actually carried out on the same GPU 0, and if the results are assigned to a new variable, they are also stored on the same GPU.

>>> y = x**x
>>> y

[torch.cuda.FloatTensor of size 2 (GPU 0)]

This may not seem like much but is incredibly powerful - yet easy to use, as you've just seen.

The changes to the code are minimal:

  • we move the neural network class to the GPU once we've created it using n.cuda()
  • the inputs are converted from a list to a PyTorch Tensor, we now use the CUDA variant: inputs = Variable(torch.cuda.FloatTensor(inputs_list).view(1, self.inodes))
  • similarly the target outputs are also converted using this variant: target_variable = Variable(torch.cuda.FloatTensor(targets_list).view(1, self.onodes), requires_grad=False)

That's it! Not too difficult at all .. actually that took a day to work out because the PyTorch documentation isn't yet that accessible to beginners.

The results from the GPU enabled version of the code are:

real    6m6.328s
user    5m57.443s
sys     0m13.488s

That is faster at 366 seconds, taking about 26% less time than the 494 seconds on the CPU. We're seeing some encouraging results.

Let's do more runs, just to be scientific and collate the results:


run    CPU (s)   GPU (s)
1      494       366
2      483       372
3      451       355

mean   476.0     364.3

The GPU based network is consistently faster, taking a little under 25% less time on average.
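Here's the arithmetic behind those averages as a short Python sketch, using the three CPU and three GPU timings collected above. Expressing the gain both as time saved and as a speed-up factor is my own choice of presentation.

```python
cpu = [494, 483, 451]   # seconds, CPU runs
gpu = [366, 372, 355]   # seconds, GPU runs

cpu_mean = sum(cpu) / len(cpu)
gpu_mean = sum(gpu) / len(gpu)

# fraction of the CPU time that the GPU version saves
saving = 1 - gpu_mean / cpu_mean
# how many times faster the GPU version is
speedup = cpu_mean / gpu_mean

print(round(cpu_mean, 1), round(gpu_mean, 1))
print(round(saving * 100, 1), round(speedup, 2))
```

It is worth being careful with the wording here: taking roughly a quarter less time is the same thing as running roughly 1.3 times as fast.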

Perhaps we expected the code to be much much faster? Well for such a small network, the overheads corrode the benefits. The GPU approach really shines for much larger networks and data.

Let's do a better experiment and compare the PyTorch code in CPU and GPU mode, varying the number of hidden layer nodes.  Here are the results:

nodes   CPU (s)   GPU (s)

200     463       362
1000    803       356
2000    1174      366
5000    3390      518

Visualising this ...

We can now see the benefit of PyTorch using the GPU. As the scale of the network grows (hidden layer nodes here), the time it takes for the GPU to complete training rises very slowly, compared to the CPU doing it, which rises quickly.
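To put numbers on that, here is a quick sketch that works out how many times faster the GPU run is at each network size, using the timings from the table above. The dictionary layout is just my way of holding the data.

```python
# hidden nodes: (CPU seconds, GPU seconds)
timings = {
    200: (463, 362),
    1000: (803, 356),
    2000: (1174, 366),
    5000: (3390, 518),
}

speedups = {nodes: cpu / gpu for nodes, (cpu, gpu) in timings.items()}

for nodes, factor in speedups.items():
    print(nodes, round(factor, 2))
```

The advantage grows from barely noticeable at 200 hidden nodes to more than six times faster at 5000.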

One more tweak .. the contributors at GitHub suggested setting an environment variable to control how many CPU threads the task is managed by. See here.  In my Google GPU instance I'll set this to OMP_NUM_THREADS=2. The resulting duration is 361 seconds .. so not much improved. We didn't see an improvement when we tried it on the CPU only code, earlier. I did see that fewer threads were being used, by using the top utility, but at these scales I didn't see a difference.

Friday, 7 April 2017

Neural Network in Forth

I love how people have been inspired to make their own neural networks in their own way, sometimes using the R or Julia programming languages.

I was very pleasantly surprised that Robin had decided to make neural networks in Forth.

Forth is an interesting language - you can read about it here, and here - it is a small, efficient and fast language, with applications often close to the metal.

You can follow Robin's progress here: https://rforth.wordpress.com/

Monday, 6 March 2017

Guest Post: Python to R

This is a guest post by Alex Glaser, who runs the London Kaggle meetup and organises several dojos.

Alex took on the challenge of making his own neural networks, but instead of using Python, he used R. Here he talks about that journey, things he had to overcome, and some insight into performance differences and tools for profiling too.

Python to R Translation

Having read through Make your own Neural Network (and indeed made one myself) I decided to experiment with the Python code and write a translation into R. Having been involved in statistical computing for many years I’m always interested in seeing how different languages are used and where they can be best utilised.
There were a few ground rules I set myself before starting the task:
  • All code was to be ‘base’ R (other packages could be added later)
  • The code would be as close to a ‘line-by-line’ translation (again, more R-centric code could be written later)
  • The assignment operator “<-” would be used.

As a little aside, a quick word about the assignment operator. It can be confusing for new users, or those coming from other languages, but for the majority of issues it can be used interchangeably with “=”. Having been a long time R user I quite like the assignment operator; a little history about it can be found here. It also provides a bit of continuity with the other assignment operators, notably the global assignment operator “<<-”. It also allows assignment of a variable within a function call, e.g.

Translating the code from Python to R also allowed me to start using R Studio’s notebook. Don’t get me wrong, I do like Jupyter, but there’s always room to look at what else is out there. Each cell starts with a magic-like command saying what language is going to be used in each cell e.g. ```{r} for R, ```{python} for Python, etc.

Just sticking with the code in Part2 of Tariq’s book (code available here) a simple place to start was just to replicate printing of a single MNIST image (part2_mnist_data_set.ipynb). Reading the data in was fairly simple; both R and Python have the readlines command (readLines in R), R also has some nice graphical capabilities and matrix is a commonly used object. A few ideas cropped up which might be of interest to a new user: splitting a string results in a list (another R data type) and in order to plot the image successfully we need to reverse the ordering of the rows. The latter could be done using indexes but I thought using an apply function would be quite a nice way of doing this. The apply suite of functions are an important part of R code and often provide a succinct way of coding without lots of for loops.

Okay, one notebook down, another one to go, this time the biggie (part2_neural_network_mnist_data.ipynb). One aspect of Python (and other object-oriented languages) that differs from R is the notion of a class. Classes do exist in R, but often they are used internally to ‘collect’ all output from a function, e.g.

Also, this class would be defined at the end of a function rather than at the start, e.g. you may get code like the following at the end of a function

which would return an object of class ‘quiz’.

Our initial attempt at ‘translating’ the code was supposed to be as close to a ‘line-to-line’ translation as possible, so that people could see how one line in Python would be written in R. This also meant that we had to create an artificial class using R’s function; note that it uses the dollar symbol to reference elements of this class, rather than the dot that we see in Python code. Also, we used the word ‘self’ to allow continuation with the Python code though it doesn’t often get used with R code. One final comment, it only replicates some of the functionality of a class, it isn’t a class replacement so some of the behaviour may not be the same.

Matrix multiplication in R is done by using the following command: “%*%”, e.g.

Most of the time the coding was relatively straightforward, and after a few false starts, we managed to replicate the results of the original Python code and get over 97% accuracy. However there was one big difference, the time taken. Now I’ve heard all sorts of arguments about the speed comparison of R and Python, but had assumed that since things like matrix multiplication were undertaken in C++ or Fortran these speed differences would not be considerable; however, that was not the case. The Python code on my (admittedly 5+ year old) Mac takes about 6 mins, whilst the R code took roughly double that.

There are a few nice ‘profiling’ commands in R (and the profvis package provides some nice interactivity) and when we looked at the R code in more depth it was the final matrix multiplication in the ‘train’ function that was taking about 85% of the time (we used the tcrossprod command in R to separate this multiplication from the rest). This last matrix multiplication is simply the outer product of two vectors, so it’s difficult to see why it would be too time consuming.
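For readers who haven’t met an outer product before, here is a small pure Python sketch of what that multiplication does. The outer() helper and the tiny example vectors are made up for illustration, and the sizes in the last lines assume the 784 input and 200 hidden nodes used for MNIST elsewhere on this blog.

```python
def outer(column, row):
    # every element of the first vector multiplied by every element of the second
    return [[c * r for r in row] for c in column]


errors = [0.1, -0.2]        # a made-up 2-element vector
inputs = [1.0, 0.5, 0.0]    # a made-up 3-element vector

update = outer(errors, inputs)
print(update)

# assumed sizes: a 200-element vector against a 784-element vector
multiplications = 200 * 784
print(multiplications)
```

So each call is one multiplication per link weight and nothing more, which is why it is surprising to see it dominate the run time.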

Looking at a few examples it’s not hard to see that Python’s np.dot function is far faster than R’s %*% command. Now for a few matrices this isn’t an issue (what’s a few hundredths of a second against a few thousandths?), however for the MYONN model we’ll be calling each function 300,000 times, so after a while this time differential builds up.

As mentioned earlier this difference in timings is quite surprising since the underlying code should be C++ or Fortran. It could also be that some underlying library was better optimised in Python than R. This will definitely be explored at a future R or Python coding dojo.

It’s been a fun experience, and as with all work there are more unexpected questions that come up. A brief synopsis of future work will be:
  • Try and figure out why Python’s matrix multiplication is so much quicker than R’s. Could also try some functions from Rcpp.
  • Write the code so that it is a bit more R-centric, and see if there are any libraries, such as in the tidyverse, which might be useful (though it would only really be useful if we can solve the previous problem).
  • Look at using Julia to see how that compares with R and Python

The R code is available from my GitHub page here, so feel free to download and change as you see fit. Any help with regards optimising the numerical libraries in R to match Python’s speed would be appreciated.

Sunday, 26 February 2017

Book Translations

I've been really lucky with the interest in my Make Your Own Neural Network book.

Some publishers have been interested in taking the book, but after some thinking I've resisted the temptation because:

  • I can price the books how I want .. this is important especially for the ebook which I want to be as cheap and accessible as possible. Some publishers will increase the ebook price by an order of magnitude!
  • I can update the books to fix errors, and have the updated book ready for people to buy within hours, and usually within 24 hours.
  • As an author who has spent lots of my own time and effort on this, I get a much fairer deal with Amazon than with traditional publishers.

However, I have agreed to other language translations of the book to be handled by publishers. So far, the book is on course to be published in:

  1. German
  2. Chinese
  3. Russian
  4. Japanese
  5. Korean

I love the "traditional animal" that O'Reilly have done for the German version:

I'm looking forward to more translations - personally I wish there was a Spanish and Italian one too.

Saturday, 7 January 2017

Neural Networks on a Raspberry Pi Zero - Updated

The Raspberry Pi default operating system Raspbian has seen significant updates since we last looked at getting IPython notebooks and our neural networks to work on the Raspberry Pi Zero ... for example:

  • the base Raspbian operating system is now based on the next major Debian version called Jessie
  • some of the installation instructions can now be simpler
  • some of the new technology causes new problems to work around

.. so we've updated the guide. Here it is...

In this section we will aim to get IPython set up on a Raspberry Pi.

There are several good reasons for doing this:

  • Raspberry Pis are fairly inexpensive and accessible to many more people than expensive laptops.

  • Raspberry Pis are very open - they run the free and open source Linux operating system, together with lots of free and open source software, including Python. Open source is important because it is important to understand how things work, to be able to share your work and enable others to build on your work. Education should be about learning how things work, and making your own, and not be about learning to buy closed proprietary software.

  • For these and other reasons, they are wildly popular in schools and at home for children who are learning about computing, whether it is software or building hardware projects.

  • Raspberry Pis are not as powerful as expensive computers and laptops. So it is an interesting and worthy challenge to prove that you can still implement a useful neural network with Python on a Raspberry Pi.

I will use a Raspberry Pi Zero because it is even cheaper and smaller than the normal Raspberry Pis, and the challenge to get a neural network running is even more worthy! It costs about £4 UK pounds, or $5 US dollars. That wasn’t a typo!

Here’s mine, shown next to a 2 penny coin. It’s tiny!


Installing IPython

We’ll assume you have a Raspberry Pi powered up and a keyboard, mouse, display and access to the internet working.

There are several options for an operating system, but we’ll stick with the most popular which is the officially supported Raspbian, a version of the popular Debian Linux distribution designed to work well with Raspberry Pis. Your Raspberry Pi probably came with it already installed. If not, install it using the instructions at that link. You can even buy an SD memory card with it already installed, if you’re not confident about installing operating systems.

This is the desktop you should see when you start up your Raspberry Pi. I’ve removed the desktop background image as it’s a little distracting.


You can see the menu button clearly at the top left, and some shortcuts along the top too.

We’re going to install IPython so we can work with the more friendly notebooks through a web browser, and not have to worry about source code files and command lines.

To get IPython we do need to work with the command line, but we only need to do this once, and the recipe is really simple and easy.

Open the Terminal application, which is the icon shortcut at the top which looks like a black monitor. If you hover over it, it’ll tell you it is the Terminal. When you run it, you’ll be presented with a black box, into which you type commands, looking like this.


Your Raspberry Pi is very good because it won’t allow normal users to issue commands that make deep changes. You have to assume special privileges. Type the following into the terminal:

sudo su -

You should see the prompt end with a ‘#’ hash character. It was previously a ‘$’ dollar sign. That shows you now have special privileges and you should be a little careful what you type.

The following commands refresh your Raspberry’s list of current software, and then update the ones you’ve got installed, pulling in any additional software if it’s needed.

apt-get update
apt-get dist-upgrade

Unless you already refreshed your software recently, there will likely be software that needs to be updated. You’ll see quite a lot of text fly by. You can safely ignore it. You may be prompted to confirm the update by pressing “y”.

Now that our Raspberry is all fresh and up to date, issue the command to get IPython. Note that, at the time of writing, the Raspbian software packages don’t contain a sufficiently recent version of IPython to work with the notebooks we created earlier and put on github for anyone to view and download. If they did, we would simply issue a simple “apt-get install ipython3 ipython3-notebook” or something like that.

If you don’t want to run those notebooks from github, you can happily use the slightly older IPython and notebook versions that come from Raspberry Pi’s software repository.

If we do want to run more recent IPython and notebook software, we need to use some “pip” commands in addition to the “apt-get” ones to get more recent software from the Python Package Index. The difference is that the software is managed by Python (pip), not by your operating system’s software manager (apt). The following commands should get everything you need.

apt-get install python3-matplotlib
apt-get install python3-scipy

pip3 install jupyter

After a bit of text flying by, the job will be done. The speed will depend on your particular Raspberry Pi model, and your internet connection. The following shows my screen when I did this.


The Raspberry Pi normally uses a memory card, called an SD card, just like the ones you might use in your digital camera. These cards don’t have as much space as the storage in a normal computer. Issue the following command to clean up the software packages that were downloaded in order to update your Raspberry Pi.

apt-get clean
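If you’re curious how much room is left on the SD card, here is a minimal Python sketch that reports it. It assumes the card is mounted at “/”, which is the usual arrangement but may differ on your setup. The function name free_space_mb is just an illustrative choice.

```python
import shutil

def free_space_mb(path="/"):
    """Return the free space, in megabytes, on the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (in bytes)
    return usage.free / (1024 * 1024)

# "/" is assumed to be where the SD card's main partition is mounted
print("Free space: {:.0f} MB".format(free_space_mb("/")))
```

Running it before and after the clean up gives a rough idea of how much space was recovered.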

Recent versions of Raspbian replaced the Epiphany web browser with Chromium (an open source version of the popular Chrome browser). Epiphany is much lighter than Chromium and works better with the tiny Raspberry Pi Zero. To set Epiphany as the default browser, to be used later for the IPython notebooks, issue the following command:

update-alternatives --config x-www-browser

This will tell you what the current default browser is, and ask you to set a new one if you want to. Select the number associated with Epiphany, and you’re done.

That’s it, job done. Restart your Raspberry Pi in case there was a particularly deep change such as a change to the very core of your Raspberry Pi, like a kernel update. You can restart your Raspberry Pi by selecting the “Shutdown …” option from the main menu at the top left, and then choosing “Reboot”, as shown next.


After your Raspberry Pi has started up again, start IPython by issuing the following command from the Terminal:


This will automatically launch a web browser with the usual IPython main page, from where you can create new IPython notebooks. Jupyter is the new software for running notebooks. Previously you would have used the “ipython3 notebook” command, which will continue to work for a transition period. The following shows the main IPython starting page.


That’s great! So we’ve got IPython up and running on a Raspberry Pi.
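A quick sanity check you might run in the first cell of a new notebook is to ask Python whether it can find the modules the notebooks depend on. This is only a sketch: the helper name check_modules is made up for illustration, and the list of modules (numpy, scipy, matplotlib) is my assumption about what the notebooks import.

```python
import importlib.util

def check_modules(names):
    """Map each module name to True if Python can find it, otherwise False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# assumed list of modules used by the notebooks; adjust to suit
for name, found in check_modules(["numpy", "scipy", "matplotlib"]).items():
    print(name, "ok" if found else "MISSING")
```

If anything is reported as missing, the install commands above are the place to look again.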

You could proceed as normal and create your own IPython notebooks, but we’ll demonstrate that the code we developed in this guide does run. We’ll get the notebooks and also the MNIST dataset of handwritten numbers from github. In a new browser tab go to the link:

You’ll see the github project page, as shown next. Get the files by clicking “Download ZIP” after clicking “Clone or download” at the top right.


If github doesn’t like Epiphany, then enter the following into your browser to download the files:

The browser will tell you when the download has finished. Open up a new Terminal and issue the following command to unpack the files, and then delete the zip package to clear space.

unzip Downloads/makeyourownneuralnetwork-master.zip
rm -f Downloads/makeyourownneuralnetwork-master.zip

The files will be unpacked into a directory called makeyourownneuralnetwork-master. Feel free to rename it to a shorter name if you like, but it isn’t necessary.

The github site only contains the smaller versions of the MNIST data, because the site won’t allow very large files to be hosted there. To get the full set, issue the following commands in that same terminal to navigate to the mnist_dataset directory and then get the full training and test datasets in CSV format.

cd makeyourownneuralnetwork-master/mnist_dataset

The downloading may take some time depending on your internet connection, and the specific model of your Raspberry Pi.
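Large downloads over a slow connection can get cut short, so it can be worth checking the CSV files once they arrive. The sketch below counts the records in a file and notes any line that doesn’t have 785 values, which is one label followed by 28 x 28 = 784 pixel values. The file name mnist_train.csv is an assumption, so substitute whatever your downloaded files are called.

```python
import csv
import os

def summarise_mnist_csv(path, expected_fields=785):
    """Return (record_count, line numbers with an unexpected number of values)."""
    records = 0
    bad_lines = []
    with open(path, newline="") as f:
        for line_number, row in enumerate(csv.reader(f), start=1):
            if not row:
                continue  # ignore completely blank lines
            records += 1
            if len(row) != expected_fields:
                bad_lines.append(line_number)
    return records, bad_lines

# assumed file name; only checked if the file is actually there
if os.path.exists("mnist_train.csv"):
    count, bad = summarise_mnist_csv("mnist_train.csv")
    print(count, "records,", len(bad), "with the wrong number of values")
```

The full training set should have 60,000 records and the full test set 10,000, so a smaller count or any bad lines suggests the download was interrupted.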

You’ve now got all the IPython notebooks and MNIST data you need. Close the terminal, but not the other one that launched IPython.

Go back to the web browser with the IPython starting page, and you’ll now see the new folder makeyourownneuralnetwork-master showing on the list. Click on it to go inside. You should be able to open any of the notebooks just as you would on any other computer. The following shows the notebooks in that folder.


Making Sure Things Work

Before we train and test a neural network, let’s first check that the various bits, like reading files and displaying images, are working. Let’s open the notebook called “part3_mnist_data_set_with_rotations.ipynb” which does these tasks. You should see the notebook open and ready to run as follows.


From the “Cell” menu select “Run All” to run all the instructions in the notebook. After a while, and it will take longer than on a modern laptop, you should get some images of rotated numbers.


That shows several things worked, including loading the data from a file, importing the Python extension modules for working with arrays and images, and plotting graphics.

Let’s now “Close and Halt” that notebook from the File menu. You should close notebooks this way, rather than simply closing the browser tab.

Training And Testing A Neural Network

Now let’s try training a neural network. Open the notebook called “part2_neural_network_mnist_data”. That’s the version of our program that is fairly basic and doesn’t do anything fancy like rotating images. Because our Raspberry Pi is much slower than a typical laptop, we’ll turn down some of the parameters to reduce the number of calculations needed, so that we can be sure the code works without wasting hours and finding that it doesn’t.

I’ve reduced the number of hidden nodes to 10, and the number of epochs to 1. I’ve still used the full MNIST training and test datasets, not the smaller subsets we created earlier. Set it running with “Run All” from the “Cell” menu. And then we wait ...
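To get a feel for why fewer hidden nodes means less work, we can count the link weights that have to be updated for every training example. This is a rough sketch that assumes a simple 3-layer network with no bias terms, 784 input nodes (one per pixel) and 10 output nodes (one per digit). The comparison with 100 hidden nodes is purely hypothetical, chosen only to show the scaling.

```python
def weight_count(input_nodes, hidden_nodes, output_nodes):
    """Number of link weights in a 3-layer network with no bias terms."""
    return input_nodes * hidden_nodes + hidden_nodes * output_nodes

small = weight_count(784, 10, 10)    # the reduced network used here
larger = weight_count(784, 100, 10)  # hypothetical comparison, 100 hidden nodes
print(small, larger, larger / small)  # 7940 79400 10.0
```

The number of weights grows in proportion to the number of hidden nodes, so cutting the hidden layer cuts the work per training example by about the same factor.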

Normally this would take about one minute on my laptop, but this completed in about 25 minutes. That's not too slow at all, considering this Raspberry Pi Zero costs 400 times less than my laptop. I was expecting it to take all night.
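If you’d like to measure your own run rather than watch the clock, a small timing helper is enough. This is a minimal sketch: the name timed is made up for illustration, and the sum over a range is only a stand-in for whatever function runs your training loop.

```python
import time

def timed(func, *args, **kwargs):
    """Call func with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# stand-in workload; replace with a call that runs your own training code
result, seconds = timed(sum, range(1000000))
print("took {:.2f} seconds ({:.1f} minutes)".format(seconds, seconds / 60))
```

Timing the same code on a laptop and on the Raspberry Pi gives a direct comparison of the two machines.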


Raspberry Pi Success!

We’ve just proven that even with a £4 or $5 Raspberry Pi Zero, you can still work fully with IPython notebooks and create code to train and test neural networks - it just runs a little slower!