## Tuesday, 28 June 2016

### Bias Nodes in Neural Networks

I've been asked about bias nodes in neural networks. What are they? Why are they useful?

### Back to Basics

Before we dive into bias nodes, let's go back to basics. Each node in a neural network applies a threshold function to its input. The output helps us make a decision about the inputs.

We know the nodes in a real neural network usually apply a sigmoid-shaped function, with the logistic function $1/(1+e^{-x})$ and the $\tanh()$ function being popular choices.

But before we arrived at those, we used a very simple linear function to understand how it could be used to classify or predict, and how it could be refined by adjusting its slope. So let's stick with linear functions for now - because they are simpler.

The following is a simple linear function.

$$y = A\cdot x$$

You'll remember it was the parameter $A$ that we varied to get different classifications. And it was this parameter $A$ that we refined by learning from the error from each training example.

The following diagram shows some examples of different lines possible with such a linear function.

You can see how some lines are better at separating the two clusters. In this case the line $y=2x$ is the best at separating the two clusters.
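As a tiny sketch of the idea, we can use such a line to classify a point by checking which side of the line it falls on. The points below are invented for illustration - they are not taken from the diagram:

```python
# A minimal sketch: classify a point by which side of the line y = A*x it falls on.
# The example points are invented for illustration.

A = 2.0  # slope of the separating line y = 2x

def classify(x, y, slope=A):
    """Return 'above' if the point sits above the line y = slope*x, else 'below'."""
    return "above" if y > slope * x else "below"

print(classify(1.0, 3.0))  # above - one cluster
print(classify(3.0, 1.0))  # below - the other cluster
```

Adjusting the slope `A` during training is exactly the refinement described above.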

That's all cool and happy - and stuff we've already covered before.

### A Limitation

Look at the following diagram and see which line of the form $y=A\cdot x$ would best separate the data.

Ouch! We can't seem to find a line that does the job - no matter what slope we choose.

This is a limitation we've hit. Any line of the form $y= A\cdot x$ must go through the origin. You can see in the diagram all three example lines do.

### More Freedom

What we need is to be able to shift the line up and down. We need an extra degree of freedom.

The following diagram shows some example separator lines which have been liberated from the need to go through the origin.

You can see one that does actually do a good job of separating the two data clusters.

So what form do these liberated lines take? They take the following form:

$$y = A \cdot x + B$$

We've added an extra $+B$ to the previous simpler equation $y = A\cdot x$. All this will be familiar to you if you've done maths at school.

### Bias Node

So we've just found that for some problems, a simple linear classifier of the form $y=A\cdot x$ was insufficient to represent the training data. We needed an extra degree of freedom so the lines were freer to go all over the data. The full form of a linear function $y = A\cdot x + B$ does that.

The same idea applies even when we're using sigmoid shaped functions in each neural network node. Without a $+B$ those functions are doomed to stick to a fixed origin point, with only their slope changing. You can see this in the following diagram.
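We can see this pinning numerically. A minimal sketch: without a bias, a node's output is $logistic(w_1 \cdot x)$, and at $x=0$ it is always $\frac{1}{2}$, no matter what slope $w_1$ we learn:

```python
import math

def logistic(x):
    """The logistic function 1/(1+e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

# Without a bias, the node output is logistic(w1 * x).
# At x = 0 the output is always 0.5, whatever slope w1 we learn.
for w1 in (0.5, 1.0, -3.0, 10.0):
    print(w1, logistic(w1 * 0.0))  # always 0.5
```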

How do we represent this in a neural network?

We could change the activation function in each node. But remember, we chose not to alter the slope of that function, never mind adding a constant. We instead chose to change the weights of the incoming signals.

So we need to continue that approach. The way to do this is to add a special additional node to a layer, alongside the others, which always has a constant value, usually set to 1. The weight of the link from this node can change, and even become negative. This has the effect of adding the additional degree of freedom that we needed above.

The following illustrates the idea:

The activation function is a sigmoid of the combined incoming signals $w_0 + w_1\cdot x$. The $w_0$ is provided by the additional node and has the effect of shifting the function left or right along the x-axis. That in effect allows the function to escape being pinned to the "origin", which is $(0, \frac{1}{2})$ for the logistic function and $(0,0)$ for $\tanh()$.

Don't forget that $w_1$ can be negative too, which allows the function to flip top to bottom, giving lines which fall as well as rise.
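A quick numerical sketch makes both effects concrete. With a bias weight $w_0$, the logistic curve crosses $\frac{1}{2}$ at $x = -w_0/w_1$ rather than at the origin, and a negative $w_1$ makes it fall rather than rise:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_output(x, w0, w1):
    # w0 is the weight on the bias node's constant input of 1
    return logistic(w0 + w1 * x)

# With w0 = -2 and w1 = 1 the curve crosses 0.5 at x = -w0/w1 = 2,
# not at the origin - the bias weight has shifted it along the x-axis.
print(node_output(2.0, w0=-2.0, w1=1.0))  # 0.5

# A negative w1 flips the curve so it falls rather than rises.
print(node_output(-10.0, w0=0.0, w1=-1.0))  # close to 1
print(node_output(10.0, w0=0.0, w1=-1.0))  # close to 0
```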

The following shows how the extra node is included in a layer. That node is called a bias node.

It is worth experimenting to determine whether you need a bias node to augment the input layer, or whether you also need one to augment the internal hidden layers. Clearly you don't have one on the output layer.

### Coding A Bias Node

A bias node is simple to code. The following shows how we might add a bias node to the input layer, with code based on our examples in github.

• Make sure the weight matrix has the right shape by incrementing the number of input nodes: `self.inodes = input_nodes + 1`.
• This automatically gives the weight matrix the right shape, because `self.wih` depends on `self.inodes`.
• In the `query()` and `train()` functions, a constant 1.0 bias input is prepended or appended to the `inputs_list`.
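Those steps can be sketched as follows. This is a simplified stand-in for the book's class, not the full code from github - only the pieces relevant to the bias node are shown, and `query()` stops at the hidden layer to keep the sketch short:

```python
import numpy

class neuralNetwork:
    """Simplified sketch showing only the changes needed to add a
    bias node to the input layer."""

    def __init__(self, input_nodes, hidden_nodes, learning_rate):
        # one extra input node acts as the bias node
        self.inodes = input_nodes + 1
        self.hnodes = hidden_nodes
        # the input->hidden weight matrix automatically gets the right
        # shape because it depends on self.inodes
        self.wih = numpy.random.normal(0.0, pow(self.inodes, -0.5),
                                       (self.hnodes, self.inodes))
        self.lr = learning_rate

    def query(self, inputs_list):
        # append the constant 1.0 bias input without mutating the caller's list
        inputs = numpy.array(list(inputs_list) + [1.0], ndmin=2).T
        hidden_inputs = numpy.dot(self.wih, inputs)
        return 1.0 / (1.0 + numpy.exp(-hidden_inputs))  # sigmoid

n = neuralNetwork(input_nodes=3, hidden_nodes=4, learning_rate=0.1)
print(n.query([0.5, 0.2, 0.9]).shape)  # (4, 1)
```

Note the use of `list(inputs_list) + [1.0]` rather than `inputs_list.append(1.0)` - concatenation builds a fresh list, so calling `query()` and `train()` on the same list can't add the bias twice.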

### Why Didn't We Use Bias?

Why didn't we use bias when we created a neural network to learn the MNIST data set?

The primary aim of the book was to keep things as simple as possible and avoid additional details or optimisations as much as possible.

The MNIST data challenge is one that happens not to need a bias node. Just like some cluster separation problems don't need the extra degree of freedom.

Simple!

1. Thanks man, you are the best! Thanks for your great book!

1. Thanks for your feedback! If you like the book please do add a review on Amazon - it helps me invest time in my next project - a book on text mining making the subject as accessible as possible.

2. Hi Tariq, I already gave 5 stars and commented. I will buy and recommend any book you write for sure! Greetings from Colombia.

3. Thank you very much!

3. Hi Tariq. I am reading your book "Make Your Own Neural Network". I tried to program a neural network that implements the XOR function, but it didn't work :-(
The solution was that I forgot the bias node! Because in your book you never mentioned a bias node, I had to work it out from different sources. Why didn't you mention it?

By the way:
In the query() and train() functions, the inputs_list has a 1.0 bias constant input prepended or appended to it
--> this doesn't work: `inputs_list.append(1.0)`
--> this works: `inputs_list + [1.0]`
because inputs_list is used in both train() and query(), and with append you append the bias twice (you get an error because the matrix calculation doesn't work).

But I love your book. I have read several books about neural networks and you are the first one who describes not only how to update the weights but also the formula for doing so.

1. Hi Mehmet - thanks for taking the time to comment.

As explained above in the post, I didn't cover the bias in the guide as I wanted to keep the guide as simple as possible, and it is possible to have useful networks without bias nodes.

I can't find the code you refer to where 1.0 is appended/prepended ... I've looked and I can't see it:

https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/blob/master/part2_neural_network_mnist_data.ipynb

4. Good day. Tell me please: I read that a bias should be on every layer of the network, including the hidden one. In this case, as I understand it, it appears only on the input. How do I add a bias to the hidden layer, or is there no need?
And thanks for the wonderful book. Even with my knowledge of mathematics I was able to understand how the neural network works.

1. Hi XpoFTT - good question. In the book we avoided the bias node idea to keep things simple. In this blog post we explained that you can't learn some functions without a bias node - and we illustrated it using a bias node added to the input layer.

To be fully general - you would have a bias node on every layer (except the output layer as that doesn't make sense).

The implementation is the same - you can add an extra node in your matrix which always has an output of 1.0.
