Wednesday, 7 February 2018

Saving and Loading Neural Networks

A very common question I get is how to save a neural network, and load it again later.


Why Save and Load?

There are two key scenarios in which being able to save and load a neural network is useful.

  • During a long training period it is sometimes useful to stop and continue at a later time. This might be because you're using a laptop which can't remain on all the time. It could be because you want to stop the training and test how well the neural network performs. Being able to resume training at a different time is really helpful.
  • It is useful to share your trained neural network with others. Being able to save it, and for someone else to load it, is necessary for this to work.





What Do We Save?

In a neural network, the things doing the learning are the link weights. In our Python code these are represented by matrices like wih and who. The wih matrix contains the weights for the links between the input and hidden layers, and the who matrix contains the weights for the links between the hidden and output layers.

If we save these matrices to a file, we can load them again later. That way we don't need to restart the training from the beginning.


Saving Numpy Arrays

The matrices wih and who are numpy arrays. Luckily the numpy library provides convenience functions for saving and loading them.

The function to save a numpy array is numpy.save(filename, array). This stores array in filename. If we wanted to add a method to our neuralNetwork class, we could do it simply like this:

# save neural network weights 
def save(self):
    numpy.save('saved_wih.npy', self.wih)
    numpy.save('saved_who.npy', self.who)
    pass

This will save the wih matrix as a file saved_wih.npy, and the who matrix as a file saved_who.npy.

If we want to stop the training we can issue n.save() in a notebook cell. We can then close down the notebook or even shut down the computer if we need to.


Loading Numpy Arrays

To load a numpy array we use array = numpy.load(filename). If we want to add a method to our neuralNetwork class, we should use the same filenames we used when saving the data.

# load neural network weights 
def load(self):
    self.wih = numpy.load('saved_wih.npy')
    self.who = numpy.load('saved_who.npy')
    pass

When we come back to continue our training, we need to run the notebook up to the point just before training. That means running the Python code that sets up the neural network class and sets the various parameters, like the number of input nodes, the data source filenames, and so on.

We can then issue n.load() in a notebook cell to load the previously saved neural network's weights back into the neural network object n.
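
As a minimal sketch of that resume workflow, assuming the neuralNetwork class and the parameter names used earlier in the book (input_nodes, hidden_nodes, output_nodes, learning_rate):

# recreate the network with exactly the same parameters as before
n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)

# restore the previously saved link weights
n.load()

# carry on training or querying as usual, for example n.train(inputs, targets)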


Gotchas

We've kept the approach simple here, in line with our approach to learning about and coding simple neural networks. That means there are some things our very simple network saving and loading code doesn't do.

Our simple code only saves and loads the two wih and who weight matrices. It doesn't do anything else. In particular, it doesn't check that the loaded data matches the size of the neural network it is loaded into. We need to make sure that if we load a saved neural network, we continue to use it with the same parameters. For example, we can't train a network, pause, and continue with different settings for the number of nodes in each layer.
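
If we wanted a little protection against that, the load method could compare the shapes of the loaded arrays with the network's own node counts before accepting them. This is only a sketch, and assumes the self.inodes, self.hnodes and self.onodes attributes from our neuralNetwork class:

# load neural network weights, checking they match this network's size
def load(self):
    wih = numpy.load('saved_wih.npy')
    who = numpy.load('saved_who.npy')
    # wih should be (hidden nodes x input nodes), who should be (output nodes x hidden nodes)
    if wih.shape != (self.hnodes, self.inodes) or who.shape != (self.onodes, self.hnodes):
        raise ValueError("saved weights don't match this network's layer sizes")
    self.wih = wih
    self.who = who
    pass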

If we want to share our neural network with someone else, they also need to be running the same Python code. The data we're passing them isn't rich enough to be independent of any particular neural network code. Efforts to develop such an open, interoperable data standard have started, for example the Open Neural Network Exchange Format.


HDF5 for Very Large Data

In some cases, with very large networks, the amount of data to be saved and loaded can be quite big. In my own experience from around 2016, the normal saving of numpy arrays in this way didn't always work. I then fell back to a slightly more involved method of saving and loading data using the very mature HDF5 data format, popular in science and engineering.

The Anaconda Python distribution allows you to install the h5py package, which gives Python the ability to work with HDF5 data.

HDF5 data stores do more than simple data saving and loading. They have the idea of a group or folder which can contain several data sets, such as numpy arrays. The data stores also keep track of data set names, rather than just blindly saving data. For very large data sets, the data can be traversed and segmented on disk without having to load it all into memory before subsets are taken.
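
Just as an illustration, and not the only way to do it, the two weight matrices could be stored as named data sets in a single HDF5 file using h5py; the file and data set names here are simply examples:

import h5py

# save both weight matrices as named data sets in one HDF5 file
def save(self):
    with h5py.File('saved_network.h5', 'w') as f:
        f.create_dataset('wih', data=self.wih)
        f.create_dataset('who', data=self.who)
    pass

# load the named data sets back into numpy arrays
def load(self):
    with h5py.File('saved_network.h5', 'r') as f:
        self.wih = f['wih'][()]
        self.who = f['who'][()]
    pass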

You can explore more here: http://docs.h5py.org/en/latest/quick.html#quick

12 comments:

  1. For those asking, it is even possible to save both weight matrices in the same file:

    def save(self, filename):
        with open(filename, 'wb') as f:
            numpy.save(f, self.wih)
            numpy.save(f, self.who)

    def load(self, filename):
        with open(filename, 'rb') as f:
            self.wih = numpy.load(f)
            self.who = numpy.load(f)

    1. thanks Vincent! great to get ideas from friends more expert with Python than I am!

  2. Hi Tariq,

    I find your blog very interesting. Thank you for sharing your insights. I would like to connect with you to discuss a potential book project. Please feel free to get in touch with me at varshas@packtpub.com. Thank you.


    1. Hi Varsha - I've contacted you by email.

  3. I bought your book and followed your code. It was very interesting. BTW, would you give me a hand to move on to CNNs and RNNs? I wish I could keep writing the Python code myself instead of using the famous frameworks. Thanks a lot.

    1. Hi Sungick - thanks for your kind comments.

      I haven't explored the very best resources for CNN and RNN but many have said that the following are very helpful:

      * online courses like coursera and khan academy
      * some great blogs
      * a solid textbook like Christopher Bishop's as a reference

      I also recommend you join a local group or community - learning together and from others is really helpful.

  4. Hi Tariq,
    I just finished your book and I really like the basic approach you took to explaining everything. The basic introduction to classifiers and the calculus part were very well written. However, when I wrote the code for testing on my own handwritten numbers, no matter what "item" I test, the NN always gives the same answer. I also had to use skimage because the scipy.misc image tools are deprecated. Any ideas why this might be? I'm more than happy to send an email instead of a blog post if you prefer.
    Thanks a lot ahead of time,
    David

    1. Hi - if you post an issue on the github you can then also add image attachments.
      https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork

      someone else has been having issues with their own images and the discussion there might be helpful.

      Common things to check - what are the values that your image loader returns? 0-1? RGB channels or just greyscale? The MNIST dataset has ink as 255 and empty space as 0 - the opposite of many image loading libraries.
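
      As a rough sketch of that last point (assuming your loader gives a 28x28 greyscale array called img_array, with paper as 255 and ink as 0), you need to invert and rescale before querying:

      # invert so ink is high and paper is low, like the MNIST data
      img_data = 255.0 - img_array.reshape(784)
      # rescale to the 0.01 - 1.00 range the network expects
      img_data = (img_data / 255.0 * 0.99) + 0.01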

      I deleted your comment which had your email address - I wasn't sure you wanted your email public.

    2. Thanks. I added an issue with details and screenshots on GH as the other ones I read through did not seem to be the same one.

  5. Hi Tariq,

    I've just finished your book. First of all, it's awesome. You did a great job of explaining difficult concepts in a simple way.

    The next step is trying to implement a NN with two hidden layers. The problem is, I think I've gotten everything right, but with 100 nodes in both layers, I actually get a decrease in performance!! Is this to be expected, or am I missing something here...!?

    Much appreciated,

    Ryan

    1. great question!

      It is expected - and the reason is that if you have lots more capacity in your network than is needed to solve the problem, it takes much longer to train the network. This is because there are so many more combinations that the "search" has to navigate.

      This is why networks used in real applications are arrived at by trying to find the smallest network that will still solve the problem.

      I hope that helps - if it doesn't, do get back to me.

    2. Me too! The trick is to lower the learning rate and use more epochs. Think of it this way: you are pushing mush through a strainer, and now you have 2 strainers, so you have to push slower and for a much longer time. If you look at autoencoders, the number of hidden nodes drops rapidly with layer number.
