<h3>
Make Your First GAN With PyTorch - Is Available!</h3>
<i>14 March 2020</i><br />
<br />
<b>Make Your First GAN with PyTorch</b> is now available!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.amazon.com/dp/B085RNKXPD" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="1035" data-original-width="800" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEji0EGonJhjkVrLnhxreXsEVQT1CDtOBNbpOJaM_h8bvpdN3bx8tXKyyqbR21d5YAdEghwkkNVwL3V2a6fhRS48OBzIXYbfmPoMCocEnNx7DRTRtPmQe-gST0F_KncZIsp1UFlCVaumvn3H/s400/myo_gan_cover.png" width="308" /></a></div>
<br />
Amazon printed edition: <a href="https://www.amazon.com/dp/B085RNKXPD" target="_blank">https://www.amazon.com/dp/B085RNKXPD</a>.<br />
<br />
All the code is on GitHub: <a href="https://github.com/makeyourownneuralnetwork/gan" target="_blank">https://github.com/makeyourownneuralnetwork/gan</a><br />
<br />
Sample pages:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8HaTbBQL-QB-u16v2hoTnniYCJ-cLVRVkc3ZFUzDCL5kQ78r0g12_QpfKMAlxSXemygN842ooYP0kqDoWY9fmjgHjYJ6CWcSU7O21xFFCRrHHZdMsePaAcHf5H0Vz23BNpS37MwEDzZ6P/s1600/myfgan_samples_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1025" data-original-width="1600" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh8HaTbBQL-QB-u16v2hoTnniYCJ-cLVRVkc3ZFUzDCL5kQ78r0g12_QpfKMAlxSXemygN842ooYP0kqDoWY9fmjgHjYJ6CWcSU7O21xFFCRrHHZdMsePaAcHf5H0Vz23BNpS37MwEDzZ6P/s400/myfgan_samples_1.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlcwiH5THCpK3Mvn7HP0r3wfm7f6JUhx3np1Bh5ulZ8LnfEa68wRWHu7ZYeKjHiS66Z03wHu_e9Nz8bmbNrizjWIv6KjMZzy7AhfKPcT2Yfa-38Kw_rmuEI79_A_kW4PV_OQ3AsXdCckoH/s1600/myfgan_samples_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1025" data-original-width="1600" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlcwiH5THCpK3Mvn7HP0r3wfm7f6JUhx3np1Bh5ulZ8LnfEa68wRWHu7ZYeKjHiS66Z03wHu_e9Nz8bmbNrizjWIv6KjMZzy7AhfKPcT2Yfa-38Kw_rmuEI79_A_kW4PV_OQ3AsXdCckoH/s400/myfgan_samples_2.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWwp0qAz65GFc-1UR1oGTsFBRdVUfMzxkvSUweAaPO5xjVWgaOCfdEhq3q_WEIjrqaHs_XeSOkxFssRtg4mWC5yVAOolTUfnssyfhH6baz0ODDwfPamhL3jnxNaSXujv6DYJDOS_HCKB7m/s1600/myfgan_samples_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1025" data-original-width="1600" height="255" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiWwp0qAz65GFc-1UR1oGTsFBRdVUfMzxkvSUweAaPO5xjVWgaOCfdEhq3q_WEIjrqaHs_XeSOkxFssRtg4mWC5yVAOolTUfnssyfhH6baz0ODDwfPamhL3jnxNaSXujv6DYJDOS_HCKB7m/s400/myfgan_samples_3.png" width="400" /></a></div>
<br />
<br />
<br />
<h3>
Gradient Descent Unstable For GANs?</h3>
<i>5 March 2020</i><br />
<br />
When training neural networks we use <b>gradient descent</b> to follow a path down the loss function towards the combination of learnable parameters that minimises the error. This is a well-researched area, and today's techniques are very sophisticated; the Adam optimiser is a good example.<br />
<br />
The dynamics of a GAN are different to a simple neural network. The generator and discriminator networks are trying to achieve opposing objectives. There are parallels between a GAN and adversarial games where one player is trying to maximise an objective while the other is trying to minimise it, each undoing the benefit of the opponent’s previous move.<br />
<br />
Is the gradient descent method of finding the correct, or even good enough, combination of learnable parameters suitable for such adversarial games? This might seem like an unnecessary question, but the answer is rather interesting.<br />
<br />
<br />
<h3>
Simple Adversarial Example</h3>
<br />
The following is a very simple objective function:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrsIsIVhMaDFPNmEHclWxbhgIWt6LU_m-OZRueW6brlCO1jIK0K2OFPdmdHYOGL0Xeh_NC7qUfNFsMW5kiZE9yDBVGf0HYao3PwUY-D_uhrQ8GLD-GmLfjdHawyM9Z4Oa0CLSFBUwMaCG7/s1600/appendix_D_graddescent_0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="163" data-original-width="1600" height="64" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrsIsIVhMaDFPNmEHclWxbhgIWt6LU_m-OZRueW6brlCO1jIK0K2OFPdmdHYOGL0Xeh_NC7qUfNFsMW5kiZE9yDBVGf0HYao3PwUY-D_uhrQ8GLD-GmLfjdHawyM9Z4Oa0CLSFBUwMaCG7/s640/appendix_D_graddescent_0.png" width="640" /></a></div>
<br />
One player has control over the values of <b>x</b> and is trying to maximise the objective <b>f</b>. A second player has control over <b>y</b> and is trying to minimise the objective <b>f</b>.<br />
<br />
Let’s visualise this function to get a feel for it. The following picture shows a surface plot of <b>f = x·y</b> from three slightly different angles.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie9Uop1089iLTkzGxKWRoeciIoFkc19lDBgSeup3WLjDFDjA2XHXJWCCjWjgvHENXNWZPLR637fC50DFuhtCBLaFYk4eZ2lTqwni6CtpfblVE7-stiRzh4kOsjS9mXeXoO1-mZow7Rasu0/s1600/appendix_D_xy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie9Uop1089iLTkzGxKWRoeciIoFkc19lDBgSeup3WLjDFDjA2XHXJWCCjWjgvHENXNWZPLR637fC50DFuhtCBLaFYk4eZ2lTqwni6CtpfblVE7-stiRzh4kOsjS9mXeXoO1-mZow7Rasu0/s640/appendix_D_xy.png" width="640" /></a></div>
<br />
We can see that the surface of <b>f = x·y</b> is a <b>saddle</b>. That means, along one direction the values rise then fall, but in another direction, the values fall then rise.<br />
<br />
The following picture shows the same function from above, using colours to indicate the values of <b>f</b>. Also marked are the directions of increasing gradient.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfGHXBQbt4PfcbqbNjK4agCWeC52fBWK7gziY9PMHBnPuXbAlMsZoNOaLldEsvdYpBf8mgBK-7raSJLge8KaWNkbJBPLKisDQG1m7VbtevwR_HVjcnywyehcae-q7FtKYFvSQxjP8XvuhS/s1600/appendix_d_xy_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfGHXBQbt4PfcbqbNjK4agCWeC52fBWK7gziY9PMHBnPuXbAlMsZoNOaLldEsvdYpBf8mgBK-7raSJLge8KaWNkbJBPLKisDQG1m7VbtevwR_HVjcnywyehcae-q7FtKYFvSQxjP8XvuhS/s640/appendix_d_xy_2.png" width="640" /></a></div>
<br />
If we used our intuition to find a good solution to this adversarial game, we would probably say the best answer is the middle of that saddle, at <b>(x,y) = (0,0)</b>. At this point, if one player sets <b>x = 0</b>, the second player can’t affect the value of <b>f</b> no matter what value of <b>y</b> is chosen. The same applies if <b>y = 0</b>: no value of <b>x</b> can change the value of <b>f</b>. The value of <b>f</b> at this point is also a fair compromise: elsewhere there are as many higher values of <b>f</b> as there are lower ones.<br />
<br />
You can explore the surface interactively yourself using the <b>math3d.org</b> website:<br />
<br />
<ul>
<li><a href="https://www.math3d.org/wz85eIlP" target="_blank">https://www.math3d.org/wz85eIlP</a></li>
<li><a href="https://www.math3d.org/x6xNjkaR" target="_blank">https://www.math3d.org/x6xNjkaR</a></li>
</ul>
<br />
<br />
Let’s now move away from intuition and work out the answer by simulating both players using gradient descent, each trying to find a good solution for themselves.<br />
<br />
You’ll remember from <i>Make Your Own Neural Network</i> that parameters are adjusted by a small amount that depends on the gradient of the objective function.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglItC5Qb4XP6Y5TGczdohmBftolhQWFFS27GZDi3mPu5GWtfX5tFb_3l0iUs8js_0MCqwLyDyp7TFpyI0BDA8EzI2Ej75WlFAuQCC0Qbr1Iy-aN9uppnbtmYxF411DG33wj4pRGWlublo8/s1600/appendix_D_graddescent_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="272" data-original-width="1600" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglItC5Qb4XP6Y5TGczdohmBftolhQWFFS27GZDi3mPu5GWtfX5tFb_3l0iUs8js_0MCqwLyDyp7TFpyI0BDA8EzI2Ej75WlFAuQCC0Qbr1Iy-aN9uppnbtmYxF411DG33wj4pRGWlublo8/s640/appendix_D_graddescent_1.png" width="640" /></a></div>
<br />
The reason we have different signs in these <b>update rules</b> is that <b>y</b> is trying to minimise <b>f</b> by moving down the gradient, while <b>x</b> is trying to maximise <b>f</b> by moving up the gradient. Here, <b>lr</b> is the usual learning rate.<br />
<br />
Because we know <b>f = x·y</b> we can write those update rules with the gradients worked out.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9PIQvhh5vEx43Cinv6Rk-oUz6aKOiklcYvjf4B5SGLSBPvfPPoNpvv45sivUtS_yZfkmEfHa2pof-uj0C_ik7gGNGezY2uiyE_aAv6svgGYmyegk7IWbia4U7Fj7XeFtRGI9oT8-6ApxA/s1600/appendix_D_graddescent_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="272" data-original-width="1600" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9PIQvhh5vEx43Cinv6Rk-oUz6aKOiklcYvjf4B5SGLSBPvfPPoNpvv45sivUtS_yZfkmEfHa2pof-uj0C_ik7gGNGezY2uiyE_aAv6svgGYmyegk7IWbia4U7Fj7XeFtRGI9oT8-6ApxA/s640/appendix_D_graddescent_2.png" width="640" /></a></div>
<br />
We can write some code to pick starting values for <b>x</b> and <b>y</b>, and then repeatedly apply these update rules to get successive <b>x</b> and <b>y</b> values.<br />
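Such a simulation needs only a few lines of plain Python. A minimal sketch, in which both players update simultaneously and the starting values and learning rate are illustrative choices:

```python
# Two players doing simultaneous gradient steps on f = x*y:
# x moves up its gradient (df/dx = y) to maximise f,
# y moves down its gradient (df/dy = x) to minimise f.
# Starting values and learning rate are illustrative.
lr = 0.01
x, y = 1.0, 1.0

for step in range(10000):
    x, y = x + lr * y, y - lr * x   # simultaneous updates

# the distance from (0,0) has grown -- the orbit spirals outwards
print(x * x + y * y)
```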
<br />
The following shows how <b>x</b> and <b>y</b> evolve as training progresses.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXZHkh4SO78JEG8lSqPcs9c_eqKeJCt6BObYMX5tQa-klVNn3Fe3kzjQPEaIxGr7J2L7-fKS9Vyk0tP5TTinGhbxPUGbbJ7vFmipqLqyIsFpbLsYK552wculSXa7fFqlEm-rexEkaOjRnr/s1600/appD_unstable_gradient_descent_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXZHkh4SO78JEG8lSqPcs9c_eqKeJCt6BObYMX5tQa-klVNn3Fe3kzjQPEaIxGr7J2L7-fKS9Vyk0tP5TTinGhbxPUGbbJ7vFmipqLqyIsFpbLsYK552wculSXa7fFqlEm-rexEkaOjRnr/s640/appD_unstable_gradient_descent_1.png" width="640" /></a></div>
<br />
We can see that the values of <b>x</b> and <b>y</b> don’t converge, but oscillate with ever greater amplitude. Trying different starting values leads to the same behaviour. Reducing the learning rate merely delays the inevitable <b>divergence</b>.<br />
<br />
This is bad. It shows that gradient descent can’t find a good solution to this simple adversarial game, and even worse, the method leads to disastrous divergence.<br />
<br />
The following picture shows <b>x</b> and <b>y</b> plotted together. We can see the values orbit around the ideal point <b>(0,0)</b> but run away from it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9otsDo8uNmWQMFhOtPyrAZGyX_NdJZZR8J3YFDWhSQbsDIGvUD2ouVzLKlblG1Zg_kTmhdFiwyt7nnXrCMoGBr1C0BEgjd4y2AeZj2tlg4-CEa-1AnzEOZJ9jJIIiJ8Ds3fP52BQbAq4Y/s1600/appD_unstable_gradient_descent_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9otsDo8uNmWQMFhOtPyrAZGyX_NdJZZR8J3YFDWhSQbsDIGvUD2ouVzLKlblG1Zg_kTmhdFiwyt7nnXrCMoGBr1C0BEgjd4y2AeZj2tlg4-CEa-1AnzEOZJ9jJIIiJ8Ds3fP52BQbAq4Y/s640/appD_unstable_gradient_descent_2.png" width="640" /></a></div>
<br />
It can be shown mathematically (see below) that the best-case scenario is that <b>(x,y)</b> orbits in a fixed circle around <b>(0,0)</b> without getting closer to it, and even this only happens when the update step is infinitesimally small. As soon as we have a finite step size, as we do when we approximate that continuous process in discrete steps, the orbit diverges.<br />
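Why does a finite step size guarantee divergence? Assuming both players update simultaneously, a short calculation shows each step multiplies the squared distance from the origin by a fixed factor greater than one:

```latex
x_{\text{new}}^2 + y_{\text{new}}^2
  = (x + \mathrm{lr}\,y)^2 + (y - \mathrm{lr}\,x)^2
  = (1 + \mathrm{lr}^2)(x^2 + y^2)
```

So a smaller learning rate slows the outward spiral but never stops it, which matches the behaviour seen in the plots.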
<br />
You can explore the code which plays this adversarial game using gradient descent here:<br />
<br />
<br />
<ul>
<li><a href="https://github.com/makeyourownneuralnetwork/gan/blob/master/Appendix_D_convergence.ipynb">https://github.com/makeyourownneuralnetwork/gan/blob/master/Appendix_D_convergence.ipynb</a></li>
</ul>
<br />
<br />
<h3>
Gradient Descent Isn’t Ideal For Adversarial Games</h3>
<br />
We’ve shown that gradient descent fails to find a solution to an adversarial game with a very simple objective function. In fact, it doesn’t just fail to find a solution, it catastrophically diverges. In contrast, gradient descent used in the normal way to minimise a function will, with a suitably small learning rate, settle into a minimum, even if it isn’t the global minimum.<br />
<br />
Does this mean GAN training will fail in general? No.<br />
<br />
Realistic GANs with meaningful data have much more complex loss functions, and that can reduce the chance of runaway divergence. That’s why GAN training throughout this book has worked fairly well. But this analysis does suggest why training GANs is hard, and can become chaotic. Orbiting around a good solution might also explain why some GANs, with extended training, seem to cycle through different modes of mode collapse rather than improve the quality of the images themselves.<br />
<br />
Fundamentally, gradient descent is the wrong approach for GANs, even if it works well enough in many cases. Finding optimisation techniques designed for adversarial dynamics like those in GANs is currently an open research question, with some researchers already publishing encouraging results.<br />
<br />
<br />
<h3>
Why A Circular Orbit?</h3>
<br />
Above we stated that <b>(x,y)</b> orbits as a circle when two players each use gradient descent to optimise <b>f = x·y</b> in opposite directions. Here we’ll do the maths to show why it is a circle.<br />
<br />
Let’s look at the update rules again.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9PIQvhh5vEx43Cinv6Rk-oUz6aKOiklcYvjf4B5SGLSBPvfPPoNpvv45sivUtS_yZfkmEfHa2pof-uj0C_ik7gGNGezY2uiyE_aAv6svgGYmyegk7IWbia4U7Fj7XeFtRGI9oT8-6ApxA/s1600/appendix_D_graddescent_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="272" data-original-width="1600" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9PIQvhh5vEx43Cinv6Rk-oUz6aKOiklcYvjf4B5SGLSBPvfPPoNpvv45sivUtS_yZfkmEfHa2pof-uj0C_ik7gGNGezY2uiyE_aAv6svgGYmyegk7IWbia4U7Fj7XeFtRGI9oT8-6ApxA/s640/appendix_D_graddescent_2.png" width="640" /></a></div>
<br />
If we want to know how <b>x</b> and <b>y</b> evolve over time <b>t</b>, we can write:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjP23TlNjak8ezqY1UQVUrI3i4Deniimughvx43hhQ9Q0H2lZozhVTlhYjbtLqx4EzDUawhCbgADQpZdp0dXvOJyeiYm-xUCpSk7GkOMtMz2wFZz_ZmT12ZhRI94wkJfXYSh-GFKRkbC8T-/s1600/appendix_D_graddescent_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="272" data-original-width="1600" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjP23TlNjak8ezqY1UQVUrI3i4Deniimughvx43hhQ9Q0H2lZozhVTlhYjbtLqx4EzDUawhCbgADQpZdp0dXvOJyeiYm-xUCpSk7GkOMtMz2wFZz_ZmT12ZhRI94wkJfXYSh-GFKRkbC8T-/s640/appendix_D_graddescent_3.png" width="640" /></a></div>
<br />
If we take the second derivatives with respect to <b>t</b>, we get the following.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2blN9fSWE9zBAxZQFbtamYCZYFBpfY1oi_SNwkCe06lz0jfdrKL0KQwAvCv1dMQ89WwKRXfDzVcq9PkiwrVEl6JmU81mgP46I0Ys80VlYgLjV2W7-GVwVGytFRChtew9ST3pxSrjAiLzR/s1600/appendix_D_graddescent_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="272" data-original-width="1600" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2blN9fSWE9zBAxZQFbtamYCZYFBpfY1oi_SNwkCe06lz0jfdrKL0KQwAvCv1dMQ89WwKRXfDzVcq9PkiwrVEl6JmU81mgP46I0Ys80VlYgLjV2W7-GVwVGytFRChtew9ST3pxSrjAiLzR/s640/appendix_D_graddescent_4.png" width="640" /></a></div>
<br />
You may remember from school algebra that expressions of the form <b>d<sup>2</sup>x/dt<sup>2</sup> = - a<sup>2</sup>x</b> have solutions of the form <b>x = sin(at)</b> or <b>x = cos(at)</b>. To satisfy the first derivatives above, we need <b>x</b> and <b>y</b> to be the following combination.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGpUBIqFIAeGf6Q_QM58uBISzZBnlHXGIDK9lruueYwjFu4RRrn_A4s-LMbzWQ8AAKWhCSCSOqejHnVxdsPBPrh2E2VW7IwfMhRrW24YQ0zW6xFrubrbmz8WXbIOmf1quZHlK64i86oLNR/s1600/appendix_D_graddescent_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="272" data-original-width="1600" height="108" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGpUBIqFIAeGf6Q_QM58uBISzZBnlHXGIDK9lruueYwjFu4RRrn_A4s-LMbzWQ8AAKWhCSCSOqejHnVxdsPBPrh2E2VW7IwfMhRrW24YQ0zW6xFrubrbmz8WXbIOmf1quZHlK64i86oLNR/s640/appendix_D_graddescent_5.png" width="640" /></a></div>
<br />
These describe <b>(x,y)</b> moving around a unit circle with angular speed <b>lr</b>.<br />
<br />
<br />
<h3>
Calculating the Output Size of Convolutions and Transpose Convolutions</h3>
<i>17 February 2020</i><br />
<br />
<b><a href="https://en.wikipedia.org/wiki/Convolution" target="_blank">Convolution</a></b> is common in neural networks that work with images, either as classifiers or as generators. When designing such convolutional neural networks, the shape of the data emerging from each convolution layer needs to be worked out.<br />
<br />
Here we’ll see how this can be done step-by-step with configurations of convolution that we’re likely to see working with images.<br />
<br />
In particular, <b>transposed convolutions</b> are thought of as difficult to grasp. Here we’ll show that they’re not difficult at all by working through some examples, which all follow a very simple recipe.<br />
<br />
<br />
<h3>
Example 1: Convolution With Stride 1, No Padding</h3>
<br />
In this first simple example we apply a <b>2 by 2</b> <a href="https://en.wikipedia.org/wiki/Kernel_(image_processing)" target="_blank">kernel</a> to an input of size <b>6 by 6</b>, with stride <b>1</b>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgujCJxrntmP8HYQJ_3hQlox0lnEEwoEgwYI553LLUfoCxyoI3eDMfkb_ldh4qmkvc2qZWg6eYCbvGciB70vj78DOVcVhzW-MARIR_0JGikIq1zDqGYe9x7K4J_xGoaHCkGF_bSK9BrpOlC/s1600/appendix_C_eg_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgujCJxrntmP8HYQJ_3hQlox0lnEEwoEgwYI553LLUfoCxyoI3eDMfkb_ldh4qmkvc2qZWg6eYCbvGciB70vj78DOVcVhzW-MARIR_0JGikIq1zDqGYe9x7K4J_xGoaHCkGF_bSK9BrpOlC/s640/appendix_C_eg_1.png" width="640" /></a></div>
<br />
The picture shows how the kernel moves along the image in steps of size <b>1</b>. The areas covered by the kernel do overlap but this is not a problem. Across the top of the image, the kernel can take <b>5</b> positions, which is why the output is <b>5</b> wide. Down the image, the kernel can also take <b>5</b> positions, which is why the output is a <b>5 by 5</b> square. Easy!<br />
<br />
The PyTorch function for this convolution is:<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace;"><b>nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=1)</b></span><br />
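To see where the <b>5 by 5</b> comes from computationally, here is a pure-Python sketch of this convolution (an illustration with an all-ones image and kernel; a real layer learns its kernel weights):

```python
# Example 1 by hand: slide a 2x2 kernel over a 6x6 input, stride 1,
# no padding. The number of valid kernel positions per axis gives
# the output size.
def conv2d(image, kernel, stride=1):
    n, k = len(image), len(kernel)
    positions = (n - k) // stride + 1   # valid kernel positions per axis
    out = []
    for i in range(positions):
        row = []
        for j in range(positions):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    total += image[i*stride + a][j*stride + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out

image = [[1.0] * 6 for _ in range(6)]
kernel = [[1.0, 1.0], [1.0, 1.0]]
result = conv2d(image, kernel, stride=1)
print(len(result), len(result[0]))   # 5 5 -- the 5 by 5 output
```

The same function with <b>stride=2</b> reproduces the <b>3 by 3</b> output of Example 2.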
<br />
<br />
<h3>
Example 2: Convolution With Stride 2, No Padding</h3>
<br />
This second example is the same as the previous one, but we now have a stride of <b>2</b>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBqMNMMzZlS0MEeFv3OqnF7B9bK-qBVdUZl_5njUGD7spu0hVodqgvnmO4DwvybvPpFs_UIcQQuDaYp69aAdb277O7PpIsJkNwy6igH63GJESUJdvS6n9GudAQEUo2QF5QufilcBF1vH-E/s1600/appendix_C_eg_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBqMNMMzZlS0MEeFv3OqnF7B9bK-qBVdUZl_5njUGD7spu0hVodqgvnmO4DwvybvPpFs_UIcQQuDaYp69aAdb277O7PpIsJkNwy6igH63GJESUJdvS6n9GudAQEUo2QF5QufilcBF1vH-E/s640/appendix_C_eg_2.png" width="640" /></a></div>
<br />
We can see the kernel moves along the image in steps of size <b>2</b>. This time the areas covered by the kernel don’t overlap. In fact, because the kernel size is the same as the stride, the image is covered without overlaps or gaps. The kernel can take <b>3</b> positions across and down the image, so the output is <b>3 by 3</b>.<br />
<br />
The PyTorch function for this convolution is:<br />
<br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b style="background-color: #fff2cc;">nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2)</b></span><br />
<br />
<br />
<h3>
Example 3: Convolution With Stride 2, With Padding</h3>
<br />
This third example is the same as the previous one, but this time we use a padding of <b>1</b>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGtCE8m8nk1pVqrurYfamyHqGCCKvG3CqB4JYdW54lgT-lfgjeix2Bg1Z2TxpLxjlRaaMgaPHY3itf8ZPEMqxpFaegFooYhA3mnC8RW7dcJWb4CqQ60IPtVPBk7fEX67JgiRN1fPnRgPyF/s1600/appendix_C_eg_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGtCE8m8nk1pVqrurYfamyHqGCCKvG3CqB4JYdW54lgT-lfgjeix2Bg1Z2TxpLxjlRaaMgaPHY3itf8ZPEMqxpFaegFooYhA3mnC8RW7dcJWb4CqQ60IPtVPBk7fEX67JgiRN1fPnRgPyF/s640/appendix_C_eg_3.png" width="640" /></a></div>
<br />
By setting padding to <b>1</b>, we extend all the image edges by <b>1</b> pixel, with values set to <b>0</b>. That means the image width and height have each grown by <b>2</b>. We apply the kernel to this extended image. The picture shows the kernel can take <b>4</b> positions across the image, which is why the output is <b>4 by 4</b>.<br />
<br />
The PyTorch function for this convolution is:<br />
<br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b style="background-color: #fff2cc;">nn.Conv2d(in_channels, out_channels, kernel_size=2, stride=2, padding=1)</b></span><br />
<br />
<br />
<h3>
Example 4: Convolution With Coverage Gaps</h3>
<br />
This example illustrates the case where the chosen kernel size and stride mean it doesn’t reach the end of the image.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGJPuooFJ2-XC-crPzBVMTeL9jK5yZC073UCVOfkP9RCg2D3RLMbTrzcyTY9Cm0kA3cshCg-N5EnG-XYgO0ToFpmfwn8XOIcdSWHIBafWzSZmntpi5vAn4-TEMVEdwBxDdhHeJR1VPvzbG/s1600/appendix_C_eg_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGJPuooFJ2-XC-crPzBVMTeL9jK5yZC073UCVOfkP9RCg2D3RLMbTrzcyTY9Cm0kA3cshCg-N5EnG-XYgO0ToFpmfwn8XOIcdSWHIBafWzSZmntpi5vAn4-TEMVEdwBxDdhHeJR1VPvzbG/s640/appendix_C_eg_4.png" width="640" /></a></div>
<br />
Here, the <b>2 by 2</b> kernel moves with a step size of <b>2</b> over the <b>5 by 5</b> image. The last column of the image is not covered by the kernel.<br />
<br />
The easiest thing to do is to just ignore the uncovered column, and this is in fact the approach taken by many implementations, including PyTorch. That’s why the output is <b>2 by 2</b>.<br />
<br />
For medium to large images, the loss of information from the very edge of the image is rarely a problem as the meaningful content is usually in the middle of the image. Even if it wasn’t, the fraction of information lost is very small.<br />
<br />
If we really wanted to avoid any information being lost, we’d adjust some of the options. We could add padding to ensure no part of the input image was missed, or we could adjust the kernel and stride sizes so they match the image size.<br />
<br />
<br />
<h3>
Example 5: Transpose Convolution With Stride 2, No Padding</h3>
<br />
The transpose convolution is commonly used to expand a tensor to a larger tensor. This is the opposite of a normal convolution which is used to reduce a tensor to a smaller tensor.<br />
<br />
In this example we use a <b>2 by 2</b> kernel again, set to stride <b>2</b>, applied to a <b>3 by 3</b> input.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGKShf5RPVMntFYtLQhNfI4wVb4l7-4wzfSbOW4YOIpsX2RyTFrPZgbaIFfZGdbMJmiPgNNI4SFLkoE2YC4PZFOfdWVgLL5N97K6JYDcKfBx4v3sLmO6zwRcbg2vcb2RCblNeLQmebC-Mm/s1600/appendix_C_eg_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGKShf5RPVMntFYtLQhNfI4wVb4l7-4wzfSbOW4YOIpsX2RyTFrPZgbaIFfZGdbMJmiPgNNI4SFLkoE2YC4PZFOfdWVgLL5N97K6JYDcKfBx4v3sLmO6zwRcbg2vcb2RCblNeLQmebC-Mm/s640/appendix_C_eg_5.png" width="640" /></a></div>
<br />
The process for transposed convolution has a few extra steps but is not complicated.<br />
<br />
First we create an intermediate grid which has the original input’s cells spaced apart with a step size set to the stride. In the picture above, we can see the pink cells spaced apart with a step size of <b>2</b>. The new in-between cells have value <b>0</b>.<br />
<br />
Next we extend the edges of the intermediate image with additional cells with value <b>0</b>. We add the maximum amount of these so that a kernel in the top left covers one of the original cells. This is shown in the picture at the top left of the intermediate grid. If we added another ring of cells, the kernel would no longer cover the original pink cell.<br />
<br />
Finally, the kernel is moved across this intermediate grid in step sizes of <b>1</b>. This step size is always <b>1</b>. The stride option is used to set how far apart the original cells are in the intermediate grid. Unlike normal convolution, here the stride is not used to decide how the kernel moves.<br />
<br />
The kernel moving across this <b>7 by 7</b> intermediate grid gives us an output of <b>6 by 6</b>.<br />
<br />
Notice how this transformation of a <b>3 by 3</b> input to a <b>6 by 6</b> output is the opposite of <b>Example 2 </b>which transformed an input of size <b>6 by 6</b> to an output of size <b>3 by 3</b>, using the same kernel size and stride options.<br />
<br />
The PyTorch function for this transpose convolution is:<br />
<br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b style="background-color: #fff2cc;">nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)</b></span><br />
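The intermediate-grid recipe above can be sketched in plain Python (a minimal illustration of the idea, not how PyTorch implements it internally):

```python
# Build the intermediate grid from Example 5: a 3x3 input, kernel size 2,
# stride 2. The stride spaces the original cells apart; the grid is then
# extended with kernel_size - 1 rings of zeros on each side.
def make_intermediate(grid, kernel_size, stride):
    n = len(grid)
    pad = kernel_size - 1
    size = (n - 1) * stride + 1 + 2 * pad
    out = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            out[pad + i * stride][pad + j * stride] = grid[i][j]
    return out

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inter = make_intermediate(grid, kernel_size=2, stride=2)

print(len(inter))                  # 7: the 7 by 7 intermediate grid
print(len(inter) - 2 + 1)          # 6: positions for a 2x2 kernel, step 1
print(inter[1][1], inter[1][3])    # original cells, spaced 2 apart
```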
<br />
<br />
<h3>
Example 6: Transpose Convolution With Stride 1, No Padding</h3>
<br />
In the previous example we used a stride of <b>2</b> because it is easier to see how it is used in the process. In this example we use a stride of <b>1</b>, with the same <b>2 by 2</b> kernel, this time applied to a <b>5 by 5</b> input.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Pj-Oyr-SanfdBsD3J7dTfT1zkFUFD0G4Be8wWxEelPlkk7Ej5JUZzDbPzR9NvExKbZ3vZVaL9hYB96FbUR8eo5DhFnG3Tgmlx4BgFQrV-MZZGjy7ttiA87kQrmbzsbo5zWhAzhfLHQGf/s1600/appendix_C_eg_6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Pj-Oyr-SanfdBsD3J7dTfT1zkFUFD0G4Be8wWxEelPlkk7Ej5JUZzDbPzR9NvExKbZ3vZVaL9hYB96FbUR8eo5DhFnG3Tgmlx4BgFQrV-MZZGjy7ttiA87kQrmbzsbo5zWhAzhfLHQGf/s640/appendix_C_eg_6.png" width="640" /></a></div>
<br />
The process is exactly the same. Because the stride is <b>1</b>, the original cells are spaced apart without a gap in the intermediate grid. We then grow the intermediate grid with the maximum number of additional outer rings so that a kernel in the top left can still cover one of the original cells. We then move the kernel with step size 1 over this intermediate <b>7 by 7</b> grid to give an output of size <b>6 by 6</b>.<br />
<br />
You’ll notice this is the opposite transformation to <b>Example 1</b>.<br />
<br />
The PyTorch function for this transpose convolution is:<br />
<br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b style="background-color: #fff2cc;">nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=1)</b></span><br />
<br />
<br />
<h3>
Example 7: Transpose Convolution With Stride 2, With Padding</h3>
<br />
In this transpose convolution example we introduce padding. Unlike the normal convolution where padding is used to expand the image, here it is used to reduce it.<br />
<br />
We have a <b>2 by 2</b> kernel with stride set to <b>2</b>, and an input of size <b>3 by 3</b>, and we have set padding to <b>1</b>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlOW4vj2cuTfKphPzGendcyV7h_cJM5Qfe6GurW9IJRl_D2gLo6uZZkBmVEAJXcm3hV25Eo4NqVcHoT7JWF6NSkXo56kSQYck2RMVjzy6TSh16hEN_3TT-R8EnG_KbxPQR3LTwZZth7lWM/s1600/appendix_C_eg_7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="640" data-original-width="1600" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlOW4vj2cuTfKphPzGendcyV7h_cJM5Qfe6GurW9IJRl_D2gLo6uZZkBmVEAJXcm3hV25Eo4NqVcHoT7JWF6NSkXo56kSQYck2RMVjzy6TSh16hEN_3TT-R8EnG_KbxPQR3LTwZZth7lWM/s640/appendix_C_eg_7.png" width="640" /></a></div>
<br />
We create the intermediate grid just as we did in <b>Example 5</b>. The original cells are spaced <b>2</b> apart, and the grid is expanded so that the kernel can cover one of the original values.<br />
<br />
The padding is set to <b>1</b>, so we remove <b>1</b> ring from around the grid. This leaves the grid at size <b>5 by 5</b>. Applying the kernel to this grid gives us an output of size <b>4 by 4</b>.<br />
<br />
The PyTorch function for this transpose convolution is:<br />
<br />
<b><span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace;">nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2, padding=1)</span></b><br />
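Again we can verify the example with a quick sketch - a single-channel 3 by 3 input run through this transpose convolution should come out as 4 by 4:

```python
import torch
import torch.nn as nn

# the transpose convolution from this example: 2x2 kernel, stride 2, padding 1
tc = nn.ConvTranspose2d(in_channels=1, out_channels=1,
                        kernel_size=2, stride=2, padding=1)

# a batch of one single-channel 3 by 3 input
x = torch.zeros(1, 1, 3, 3)

# the output is 4x4, matching the worked example
print(tc(x).shape)  # torch.Size([1, 1, 4, 4])
```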
<br />
<br />
<h3>
Calculating Output Sizes</h3>
<br />
Assuming we’re working with square shaped input, with equal width and height, the formula for calculating the output size for a convolution is:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggC63mVjjnZ4W_mYd-Yap7NMkvw45gx_YWdj-xD4SenaT70PqOA7P2v0A9jKF_NZ2bl6bDmHjJ1Swx4gw7QSAmGIXFtrLANVRbqaWP-DDaVuLP5QWtAvGLjGQLcY-WL6m0Wk4u04DKPUkB/s1600/appendix_C_conv_formula.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="326" data-original-width="1600" height="129" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggC63mVjjnZ4W_mYd-Yap7NMkvw45gx_YWdj-xD4SenaT70PqOA7P2v0A9jKF_NZ2bl6bDmHjJ1Swx4gw7QSAmGIXFtrLANVRbqaWP-DDaVuLP5QWtAvGLjGQLcY-WL6m0Wk4u04DKPUkB/s640/appendix_C_conv_formula.png" width="640" /></a></div>
<br />
The L-shaped brackets take the mathematical floor of the value inside them. That means the largest integer less than or equal to the given value. For example, the floor of <b>2.3</b> is <b>2</b>.<br />
<br />
If we use this formula for <b>Example 3,</b> we have <b>input size = 6</b>, <b>padding = 1</b>, <b>kernel size = 2</b> and <b>stride = 2</b>. The calculation inside the floor brackets is <b>(6 + 2 - 1 - 1) / 2 + 1</b>, which is <b>4</b>. The floor of <b>4</b> remains <b>4</b>, which is the size of the output.<br />
<br />
Again, assuming square shaped tensors, the formula for transposed convolution is:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxPdYEcNw-9oM1fQMB3oW27mxz76Sjw4G_2JrK2jlxI1Bf9dTcaZErB0_wWYbw2E5tjSQgbt5IRogSuedolrNkGIIDaK5my3IjeoTT8m1rBLO85NwTVxmdXn4G9IkYqWGHGAMoHpH0Af3e/s1600/appendix_C_transposeconv_formula.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="217" data-original-width="1600" height="84" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxPdYEcNw-9oM1fQMB3oW27mxz76Sjw4G_2JrK2jlxI1Bf9dTcaZErB0_wWYbw2E5tjSQgbt5IRogSuedolrNkGIIDaK5my3IjeoTT8m1rBLO85NwTVxmdXn4G9IkYqWGHGAMoHpH0Af3e/s640/appendix_C_transposeconv_formula.png" width="640" /></a></div>
<br />
Let’s try this with <b>Example 7</b>, where the <b>input size = 3</b>, <b>stride = 2</b>, <b>padding = 1</b>, <b>kernel size = 2</b>. The calculation is then simply <b>2*2 - 2 + 1 + 1 = 4</b>, so the output is of size <b>4</b>.<br />
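Both formulas are straightforward to code up. The following sketch, with dilation assumed to be 1 as throughout this appendix, checks them against Examples 3 and 7:

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding=0):
    # floor((n + 2p - k) / s) + 1, for a square n by n input
    return math.floor((n + 2 * padding - kernel_size) / stride) + 1

def transpose_conv_output_size(n, kernel_size, stride=1, padding=0):
    # (n - 1) * s - 2p + k, for a square n by n input
    return (n - 1) * stride - 2 * padding + kernel_size

# Example 3: convolution with input 6, kernel 2, stride 2, padding 1
print(conv_output_size(6, kernel_size=2, stride=2, padding=1))  # 4

# Example 7: transpose convolution with input 3, kernel 2, stride 2, padding 1
print(transpose_conv_output_size(3, kernel_size=2, stride=2, padding=1))  # 4
```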
<br />
On the PyTorch reference pages you can read about more general formulae, which work with rectangular tensors and also cover additional configuration options we’ve not needed here.<br />
<ul>
<li><b>nn.Conv2d</b> <a href="https://pytorch.org/docs/stable/nn.html#conv2d" target="_blank">https://pytorch.org/docs/stable/nn.html#conv2d</a></li>
</ul>
<ul>
<li><b>nn.ConvTranspose2d</b> <a href="https://pytorch.org/docs/stable/nn.html#convtranspose2d" target="_blank">https://pytorch.org/docs/stable/nn.html#convtranspose2d</a></li>
</ul>
<br />
<br />
<h3>
More Reading</h3>
<br />
<ul>
<li>Convolutional neural networks: <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network" target="_blank">https://en.wikipedia.org/wiki/Convolutional_neural_network</a></li>
<li>Convolutions in image classification and generation: <a href="http://makeyourownalgorithmicart.blogspot.com/2019/06/generative-adversarial-networks-part-iv.html" target="_blank">http://makeyourownalgorithmicart.blogspot.com/2019/06/generative-adversarial-networks-part-iv.html</a></li>
</ul>
<br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-63370937606097830072018-09-12T04:51:00.001-07:002018-09-16T09:20:51.122-07:00Application of Neural Networks - Satellite Measurement of Water WavesIt's always great to see interesting uses of machine learning methods - and especially satisfying to see someone inspired by my book to apply the methods.<br />
<br />
I was privileged to have an initial discussion with Dennis when he was planning on applying neural networks to the task of classifying water waveforms measured by radar from a satellite orbiting the Earth.<br />
<br />
He went on to succeed and presented his work at a well respected <a href="http://themnet.gis.uni-stuttgart.de/achievements/workshops/#entry3">conference</a>. You can see his presentation slides here:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://goo.gl/uoX9wy" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="897" data-original-width="1600" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8-Pfoyhp_aOz_42-wTwAzgE_BcvuQCgHE0NdFS1grCXxKbZZTIGWogNLlKDR3dEFKPGcRWVTwZI_rxkxBcCnE7U0Vuem35d5pXHawKxGrwGAZKl9WLs5G97q6dJHW7LCAyEmcJaA-dZBJ/s400/waves.png" width="400" /></a></div>
<br />
<br />
<h3>
Altimetry</h3>
Satellite radar is used to measure the altitude (height) of surface features - which can be both land and water.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpHB9v-HVkirGgv2wozsw5wy4s6wJfrH1Sm3kZJBCN0MQpbOqrI4oUPQFEc-7kq-wC_iucnu9FQwp3MEbcXhxAX8K73DWY5zw80ycy-vb1ZDT1BiWtYBNzj7PuAcSniAIH4isZrKxD0rnG/s1600/Screen+Shot+2018-09-12+at+12.04.55.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="896" data-original-width="1600" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpHB9v-HVkirGgv2wozsw5wy4s6wJfrH1Sm3kZJBCN0MQpbOqrI4oUPQFEc-7kq-wC_iucnu9FQwp3MEbcXhxAX8K73DWY5zw80ycy-vb1ZDT1BiWtYBNzj7PuAcSniAIH4isZrKxD0rnG/s400/Screen+Shot+2018-09-12+at+12.04.55.png" width="400" /></a></div>
<br />
The signal needs to be interpreted so that:<br />
<br />
<ul>
<li>we can establish if the surface is land or water</li>
<li>and if water, calculate the height of the water waves from the non-trivial signal pattern</li>
</ul>
<br />
<br />
<br />
<h3>
Land or Water?</h3>
A neural network was trained to determine whether the signal was from land or water.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzLikjVtjrdAXPSJ0T5x3FqCKbzz0vb701AKSWNNDIg7bTfH1_yTmSjDOD5Y4qqkBKEE8bRUQO_fZ386LhYQoS9kZ0sV94LvaGm_tf4JThxwMZo5EylHJqHZWR2yt4_ld72EE4Nw2SqDCf/s1600/Screen+Shot+2018-09-12+at+12.14.48.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="897" data-original-width="1600" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzLikjVtjrdAXPSJ0T5x3FqCKbzz0vb701AKSWNNDIg7bTfH1_yTmSjDOD5Y4qqkBKEE8bRUQO_fZ386LhYQoS9kZ0sV94LvaGm_tf4JThxwMZo5EylHJqHZWR2yt4_ld72EE4Nw2SqDCf/s400/Screen+Shot+2018-09-12+at+12.14.48.png" width="400" /></a></div>
<br />
As you can see from the slide above, the signal signature is very different.<br />
<br />
A neural network was very successful in detecting water. Detecting land was a little more challenging but this initial work showed great promise.<br />
<br />
<br />
<h3>
Water Wave Height</h3>
The next step is to calculate the height of the water waves. In-situ measurements were used as reference data to train a different neural network.<br />
<br />
Part of the challenge for a neural network is that there are several peaks that can be detected during a measurement, and we want the highest peak of a wave.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9GNbyvP9-0flVl_-GuCgNCEb1CYkdZnd3IF9tzOlyAEiwgA1xAMFTUbDyxXojXGPDGERG-aeNy52Nbn8ROLTkQaDIkf1DceOijIeyPU8qJfnM1R8d_OhRqO9H_DxC9ceZPv3fwCLbavv8/s1600/Screen+Shot+2018-09-12+at+12.38.29.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="900" data-original-width="1600" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9GNbyvP9-0flVl_-GuCgNCEb1CYkdZnd3IF9tzOlyAEiwgA1xAMFTUbDyxXojXGPDGERG-aeNy52Nbn8ROLTkQaDIkf1DceOijIeyPU8qJfnM1R8d_OhRqO9H_DxC9ceZPv3fwCLbavv8/s400/Screen+Shot+2018-09-12+at+12.38.29.png" width="400" /></a></div>
<br />
Tracking a peak as it moves allows us to have a higher level of confidence in labelling it a water wave peak.<br />
<br />
<br />
<h3>
Results</h3>
The results are promising with some areas identified for further work.<br />
<br />
The following shows how closely the water wave heights calculated automatically by the neural networks match the reference measurements.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6v3Fbcwp-ThSqPpG5d560k0qBQXUrsvTD7ymH3Se3riKIXMXRu8_6Jaa8zgkomkaYhzkPA15h6OlrlvLFYNvZhNehWOyfmPJBxZJmRqJCdvwGNkfXIYs-lAxlf8MC4nJgOfrys6hDsrXT/s1600/Screen+Shot+2018-09-12+at+12.41.36.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="896" data-original-width="1600" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6v3Fbcwp-ThSqPpG5d560k0qBQXUrsvTD7ymH3Se3riKIXMXRu8_6Jaa8zgkomkaYhzkPA15h6OlrlvLFYNvZhNehWOyfmPJBxZJmRqJCdvwGNkfXIYs-lAxlf8MC4nJgOfrys6hDsrXT/s400/Screen+Shot+2018-09-12+at+12.41.36.png" width="400" /></a></div>
<br />
The first area for improvement is detecting land where the accuracy rate is lower than it is for water.<br />
<br />
The second area for further work is to resolve the "delay" visible in the calculated heights. This is not a major issue in this application, as the height and shape are more important than the horizontal displacement / phase.<br />
<br />
The following shows more challenging wave forms.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9pHpQdgjfIZDtkwx9Zyq7PbgfkzWAlXMBwIGA5y4jbDKIvIjCD76ocNHBTwvQIg5oCPsqNW6Tl1VvWczZlCIcRxAetWhEUdymfCsvR4vWiuBY9a5vVXA1X3czgz0KABm1uC66Vem2Mfy1/s1600/Screen+Shot+2018-09-12+at+12.48.57.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="897" data-original-width="1600" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9pHpQdgjfIZDtkwx9Zyq7PbgfkzWAlXMBwIGA5y4jbDKIvIjCD76ocNHBTwvQIg5oCPsqNW6Tl1VvWczZlCIcRxAetWhEUdymfCsvR4vWiuBY9a5vVXA1X3czgz0KABm1uC66Vem2Mfy1/s400/Screen+Shot+2018-09-12+at+12.48.57.png" width="400" /></a></div>
<br />
A good next challenge is to automate the detection of the correct peak, and neural network architectures that take into account a sequence of data - such as <b><a href="https://en.wikipedia.org/wiki/Recurrent_neural_network">recurrent neural networks</a></b> - can help in these scenarios.<br />
<br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-9796315153776402202018-05-22T08:10:00.001-07:002018-05-22T08:15:49.363-07:00Imageio.imread() Replaces Scipy.misc.imread()Some of the code we wrote reads data from image files using a helper function <b><a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.imread.html">scipy.misc.imread()</a>.</b><br />
<br />
However, recently, users were notified that this function is deprecated:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiif538KvSddLgDddKTUWNzi-Hi19jpG7nj1pbwaPKY8yNZxskgZtRMkzEYurIOrcZOGBhcHyP5HddGcREebIKUQPCtqJl5cSuwpKIxS0QvO2KldkkB3YVXUnuvso3gz9fdfGaF_1ILIwzP/s1600/Screen+Shot+2018-05-22+at+15.00.06.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="134" data-original-width="1600" height="51" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiif538KvSddLgDddKTUWNzi-Hi19jpG7nj1pbwaPKY8yNZxskgZtRMkzEYurIOrcZOGBhcHyP5HddGcREebIKUQPCtqJl5cSuwpKIxS0QvO2KldkkB3YVXUnuvso3gz9fdfGaF_1ILIwzP/s640/Screen+Shot+2018-05-22+at+15.00.06.png" width="640" /></a></div>
<br />
We're encouraged to use the <a href="http://imageio.readthedocs.io/en/latest/userapi.html#imageio.imread"><b>imageio.imread()</b></a> function instead.<br />
<br />
<h3>
From imread() to imread()</h3>
The change is very easy. We first change the import statements which include the helper library.<br />
<br />
From this:<br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>import scipy.misc</b></span></span><br />
<br />
To this:<br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>import imageio</b></span></span><br />
<div>
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b><br /></b></span></span></div>
We then change the actual function which reads image data from files.<br />
<br />
From this form:<br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>img_array = scipy.misc.imread(image_file_name, flatten=True)</b></span></span><br />
<br />
To this form:<br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>img_array = imageio.imread(image_file_name, as_gray=True)</b></span></span><br />
<br />
Easy!<br />
<br />
We can see the new function is used in a very similar way. We still provide the name of the image file we want to read into an array of data.<br />
<br />
Previously we used <b>flatten=True</b> to convert the image pixels into a greyscale value, instead of having separate numbers for the red, green, blue and maybe alpha channels. We now use <b>as_gray=True</b>, which does the same thing.<br />
<br />
I thought we might have to mess about with inverting number ranges from 0-255 to 255-0 but it seems we don't need to.<br />
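As an aside, the greyscale conversion itself is just a weighted sum of the colour channels. The following numpy sketch uses the common ITU-R 601 weights for illustration - the exact weights a given library uses internally may differ slightly:

```python
import numpy

# a tiny 2x2 RGB image: red, green, blue and white pixels, values 0-255
rgb = numpy.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [255, 255, 255]]], dtype=numpy.float64)

# a weighted sum of the channels gives one greyscale value per pixel
# (ITU-R 601 luma weights - illustrative, libraries may vary)
grey = rgb @ numpy.array([0.299, 0.587, 0.114])

print(grey.shape)  # (2, 2)
print(grey[1, 1])  # 255.0 - the white pixel stays at full brightness
```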
<br />
<br />
<h3>
Github Code Updated</h3>
The notebooks which use <b>imread()</b> have been updated on the main <a href="https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork">github</a> repository.<br />
<br />
This does mean the code is slightly different to that described in the book, but the change should be easy to understand until a new version of the book is released.<br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-19636631168639229172018-05-16T07:43:00.003-07:002018-05-16T07:48:06.873-07:00Online Interactive Course by Educative.ioI've been really impressed with <a href="https://www.educative.io/">educative.io</a> who took the content for <a href="https://www.amazon.com/Make-Your-Own-Neural-Network-ebook/dp/B01EER4Z4G/">Make Your Own Neural Network</a> and developed a beautifully designed interactive online course.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.educative.io/collection/5693482056286208/5649050225344512" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="1176" data-original-width="1600" height="293" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimTEL5Y03d20kCt4kzVfASbIH9Re7UgisLX74pn6su0RUQ-kBv0Mi-MV199odt2UCFf1IlOzzhk3zBHTfXnz9nUQfqNonzjRc423TexIt4BMtRVN2VVD5IKS5nG3NklLfaMcRMqOUFhrWi/s400/Screen+Shot+2018-05-16+at+15.39.15.png" width="400" /></a></div>
<br />
The course breaks the content down into digestible bite-size chunks, and the interactivity is really helpful to the process of learning through hands-on experimentation and play.<br />
<br />
Have a go!<br />
<br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-69304034049705591812018-02-07T13:31:00.000-08:002018-02-07T13:31:13.001-08:00Saving and Loading Neural NetworksA very common question I get is how to <b>save</b> a neural network, and <b>load</b> it again later.<br />
<br />
<br />
<h3>
Why Save and Load?</h3>
There are two key scenarios when being able to save and load a neural network are useful.<br />
<br />
<ul>
<li>During a long training period it is sometimes useful to <b>stop and continue</b> at a later time. This might be because you're using a laptop which can't remain on all the time. It could be because you want to stop the training and test how well the neural network performs. Being able to resume training at a different time is really helpful.</li>
</ul>
<ul>
<li>It is useful to <b>share</b> your trained neural network with others. Being able to save it, and for someone else to load it, is necessary for this to work.</li>
</ul>
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd844uMmh0dpVWTYiQ4S1SdLPbWnrYbSYHegZGWm5TtTUQqNLISGM2FNqrUqEQaV-2Q57D1Xltj0rpFdzQmV0OGe7SzZAz_HWTT4QnmONiOKrdwS62JUfNe8FP9FqGRJWE1PhqOpgn5Et_/s1600/save_load.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="633" data-original-width="1600" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd844uMmh0dpVWTYiQ4S1SdLPbWnrYbSYHegZGWm5TtTUQqNLISGM2FNqrUqEQaV-2Q57D1Xltj0rpFdzQmV0OGe7SzZAz_HWTT4QnmONiOKrdwS62JUfNe8FP9FqGRJWE1PhqOpgn5Et_/s640/save_load.png" width="640" /></a></div>
<br />
<br />
<h3>
What Do We Save?</h3>
In a neural network the thing that is doing the learning are the link weights. In our <a href="https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/blob/master/part2_neural_network_mnist_data.ipynb">Python code</a>, these are represented by matrices like <b><span style="font-family: Courier New, Courier, monospace;">wih</span></b> and <b><span style="font-family: Courier New, Courier, monospace;">who</span></b>. The <b><span style="font-family: Courier New, Courier, monospace;">wih</span></b> matrix contains the weights for the links between the input and hidden layer, and the <b><span style="font-family: Courier New, Courier, monospace;">who</span></b> matrix contains the weights for the links between the hidden and output layer.<br />
<br />
If we save these matrices to a file, we can load them again later. That way we don't need to restart the training from the beginning.<br />
<br />
<br />
<h3>
Saving Numpy Arrays</h3>
The matrices <b><span style="font-family: Courier New, Courier, monospace;">wih</span></b> and <b><span style="font-family: Courier New, Courier, monospace;">who</span></b> are <b><a href="http://www.numpy.org/">numpy</a></b> arrays. Luckily the <b>numpy</b> library provides convenience functions for saving and loading them.<br />
<br />
The function to save a numpy array is <b><span style="font-family: Courier New, Courier, monospace;">numpy.save(filename, array)</span></b>. This will store <span style="font-family: Courier New, Courier, monospace;"><b>array</b></span> in <span style="font-family: Courier New, Courier, monospace;"><b>filename</b></span>. If we wanted to add a method to our <b><span style="font-family: Courier New, Courier, monospace;">neuralNetwork</span></b> class, we could do it simply like this:<br />
<br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"># save neural network weights </span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;">def save(self):</span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"> numpy.save('saved_wih.npy', self.wih)</span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"> numpy.save('saved_who.npy', self.who)</span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"> pass</span></span><br />
<br />
This will save the <b><span style="font-family: Courier New, Courier, monospace;">wih</span></b> matrix as a file <b><span style="font-family: Courier New, Courier, monospace;">saved_wih.npy</span></b>, and the <b><span style="font-family: Courier New, Courier, monospace;">who</span></b> matrix as a file <b><span style="font-family: Courier New, Courier, monospace;">saved_who.npy</span></b>.<br />
<br />
If we want to stop the training we can issue <b><span style="font-family: Courier New, Courier, monospace;">n.save()</span></b> in a notebook cell. We can then close down the notebook or even shut down the computer if we need to.<br />
<br />
<br />
<h3>
Loading Numpy Arrays</h3>
To load a numpy array we use <b><span style="font-family: Courier New, Courier, monospace;">array = numpy.load(filename)</span></b>. If we want to add a method to our neuralNetwork class, we should use the filenames we used to save the data.<br />
<br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"># load neural network weights </span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;">def load(self):</span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"> self.wih = numpy.load('saved_wih.npy')</span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"> self.who = numpy.load('saved_who.npy')</span></span><br />
<span style="background-color: #fce5cd;"><span style="color: blue; font-family: Courier New, Courier, monospace;"> pass</span></span><br />
<br />
If we come back to our training, we need to run the notebook up to the point just before training. That means running the Python code that sets up the neural network class, and sets the various parameters like the number of input nodes, the data source filenames, etc.<br />
<br />
We can then issue <b><span style="font-family: Courier New, Courier, monospace;">n.load()</span></b> in a notebook cell to load the previously saved neural networks weights back into the neural network object <b><span style="font-family: Courier New, Courier, monospace;">n</span></b>.<br />
<br />
<br />
<h3>
Gotchas</h3>
We've kept the approach simple here, in line with our approach to learning about and coding simple neural networks. That means there are some things our very simple network saving and loading code doesn't do.<br />
<br />
Our simple code only saves and loads the two <b><span style="font-family: Courier New, Courier, monospace;">wih</span></b> and <b><span style="font-family: Courier New, Courier, monospace;">who</span></b> weights matrices. It doesn't do anything else. It doesn't check that the loaded data matches the desired size of neural network. We need to make sure that if we load a saved neural network, we continue to use it with the same parameters. For example, we can't train a network, pause, and continue with different settings for the number of nodes in each layer.<br />
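If we did want a little protection, the load() method could be extended to check the loaded shapes against the network's current matrices before accepting them. This is just a sketch of the idea, not code from the book:

```python
import numpy

# sketch of a load() with a basic shape check - assumes the object
# already has self.wih and self.who matrices of the desired sizes
def load(self):
    wih = numpy.load('saved_wih.npy')
    who = numpy.load('saved_who.npy')
    # refuse weights that don't match this network's layer sizes
    if wih.shape != self.wih.shape or who.shape != self.who.shape:
        raise ValueError("saved weights don't match this network's layer sizes")
    self.wih = wih
    self.who = who
    pass
```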
<br />
If we want to share our trained neural network, the recipient also needs to be running the same Python code. The data we're passing them isn't rich enough to be independent of any particular neural network code. Efforts to develop such an open, inter-operable data standard have started, for example the <a href="http://onnx.ai/">Open Neural Network Exchange Format</a>.<br />
<br />
<br />
<h3>
HDF5 for Very Large Data</h3>
In some cases, with very large networks, the amount of data to be saved and loaded can be quite big. In my own experience from around 2016, the normal saving of numpy arrays in this way didn't always work. I then fell back to a slightly more involved method to save and load data using the very mature <b>HDF5</b> data format, popular in science and engineering.<br />
<br />
The Anaconda Python distribution allows you to install the <a href="http://docs.h5py.org/en/latest/"><b>h5py</b></a> package, which gives Python the ability to work with <a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">HDF5</a> data.<br />
<br />
HDF5 data stores do more than simple data saving and loading. They have the idea of a group, or folder, which can contain several data sets, such as numpy arrays. The data stores also keep track of data set names, and don't just blindly save data. For very large data sets, the data can be traversed and segmented on-disk without having to load it all into memory before subsets are taken.<br />
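A minimal sketch of how the two weight matrices might be stored as named data sets in a single HDF5 file - the file and data set names here are just illustrative choices:

```python
import h5py
import numpy

# two weight matrices, sized for a 784-input, 100-hidden, 10-output network
wih = numpy.random.rand(100, 784)
who = numpy.random.rand(10, 100)

# save both matrices as named data sets in one HDF5 file
with h5py.File('saved_network.h5', 'w') as f:
    f.create_dataset('wih', data=wih)
    f.create_dataset('who', data=who)

# load them back by name - slicing with [:] reads the data into memory
with h5py.File('saved_network.h5', 'r') as f:
    loaded_wih = f['wih'][:]
    loaded_who = f['who'][:]
```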
<br />
You can explore more here: <a href="http://docs.h5py.org/en/latest/quick.html#quick">http://docs.h5py.org/en/latest/quick.html#quick</a><br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-79443916778759856332017-05-23T09:50:00.000-07:002017-05-25T02:49:38.298-07:00Learning MNIST with GPU Acceleration - A Step by Step PyTorch Tutorial<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script>
I'm often asked why I don't talk about neural network frameworks like <a href="https://www.tensorflow.org/">Tensorflow</a>, <a href="http://caffe.berkeleyvision.org/">Caffe</a>, or <a href="http://deeplearning.net/software/theano/">Theano</a>.<br />
<br />
<br />
<h3>
Reasons for Not Using Frameworks</h3>
I avoided these frameworks because the main thing I wanted to do was to learn how neural networks actually work. That includes learning about the core concepts and the maths too. By creating our own neural networks code, from scratch, we can really start to understand them, and the issues that emerge when trying to apply them to real problems.<br />
<br />
We don't get that learning and experience if we only learned how to use someone else's library.<br />
<br />
<br />
<h3>
Reasons for Using Frameworks - GPU Acceleration</h3>
But there are some good reasons for using such frameworks, after you've learned about how neural networks actually work.<br />
<br />
One reason is that you want to take advantage of the special hardware in some computers, called a <a href="https://en.wikipedia.org/wiki/Graphics_processing_unit">GPU</a>, to accelerate the core calculations done by a neural network. The <b>GPU</b> - graphics processing unit - was traditionally used to accelerate calculations to support rich and intricate graphics, but recently that same special hardware has been used to accelerate machine learning.<br />
<br />
The normal brain of a computer, the <b>CPU</b>, is good at doing all kinds of tasks. But if your tasks are matrix multiplications, and lots of them in parallel, for example, then a GPU can do that kind of work much faster. That's because GPUs have lots and lots of computing cores, and very fast access to locally stored data. Nvidia has a page explaining the advantage, with a fun video too - <a href="http://www.nvidia.com/object/what-is-gpu-computing.html">link</a>. But remember, GPUs are not good for general purpose work, they're just really fast at a few specific kinds of jobs.<br />
<br />
The following illustrates a key difference between general purpose CPUs and GPUs with many, more task-specific, compute cores:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzsi8wscdS75tBhDvYVkbqgSS0yxo225wt4K6EvCykAmEQ_WILJW_A5NSZqA1mKhQTJQThur25t836FFTv3rbaWDY1wWRzEmPDTDCcdOrdzVhy4ac92UcaoaVD0GvMXIkTPOlhR6qF2m8P/s1600/GPU_vs_CPU.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="217" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzsi8wscdS75tBhDvYVkbqgSS0yxo225wt4K6EvCykAmEQ_WILJW_A5NSZqA1mKhQTJQThur25t836FFTv3rbaWDY1wWRzEmPDTDCcdOrdzVhy4ac92UcaoaVD0GvMXIkTPOlhR6qF2m8P/s400/GPU_vs_CPU.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">GPUs have hundreds of cores, compared to a CPU's 2, 4 or maybe 8.</td></tr>
</tbody></table>
<br />
Writing code that directly takes advantage of GPUs is currently not fun. In fact, it is extremely complex and painful, and very unlike the joy of easy coding with Python.<br />
<br />
This is where the neural network frameworks can help - they allow you to imagine a much simpler world, and write code in that world, which is then translated into the complex, detailed and low-level nuts-n-bolts code that the GPUs need.<br />
<br />
There are quite a few neural network frameworks out there, but comparing them can be confusing. There are a few good comparisons and discussions on the web, like this one - <a href="https://deeplearning4j.org/compare-dl4j-torch7-pylearn">link</a>.<br />
<br />
<br />
<h3>
PyTorch</h3>
I'm going to use <b><a href="http://pytorch.org/about/">PyTorch</a></b> for three main reasons:<br />
<ul>
<li>It's largely vendor independent. Tensorflow has a lot of momentum and interest, but is very much a Google product. </li>
<li>It's designed to be Python - not an ugly and ill-fitting Python wrapper around something that really isn't Python. Debugging is also massively easier if what you're debugging is Python itself.</li>
<li>It's simple and light - preferring simplicity in design, working naturally with things like the ubiquitous numpy arrays, and avoiding hiding too much stuff as magic, something I really don't like.</li>
</ul>
<br />
Some more discussion of PyTorch can be found here - <a href="https://www.reddit.com/r/MachineLearning/comments/5w3q74/d_so_pytorch_vs_tensorflow_whats_the_verdict_on/">link</a>.<br />
<br />
<br />
<h3>
Working With PyTorch</h3>
To use PyTorch, we have to understand how it wants to be worked with. This will be a little different to the normal Python and numpy world we're used to.<br />
<br />
The main ideas are:<br />
<ul>
<li>build up your network architecture using the building blocks provided by PyTorch - these are things like layers of nodes and activation functions.</li>
<li>you let PyTorch automatically work out how to back propagate the error - it can do this for any of the building blocks it provides, which is really convenient.</li>
<li>we train the network in the normal way, and measure accuracy as usual, but PyTorch provides functions for doing this.</li>
<li>to make use of the GPU, we push the neural network weight matrices to the GPU, and work on them there.</li>
</ul>
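Put together, those ideas look something like the following sketch. The layer sizes are just illustrative choices for MNIST-sized data, not code from later in this tutorial, and the GPU lines only take effect if CUDA hardware is available:

```python
import torch
import torch.nn as nn

# build a network from PyTorch's building blocks: layers and an activation
model = nn.Sequential(
    nn.Linear(784, 200),   # input layer to hidden layer
    nn.Sigmoid(),          # activation function
    nn.Linear(200, 10),    # hidden layer to output layer
)

# push the network's weight matrices to the GPU, if one is available
if torch.cuda.is_available():
    model = model.cuda()

# a batch containing one dummy 784-value input, on the same device
x = torch.zeros(1, 784)
if torch.cuda.is_available():
    x = x.cuda()

# the forward pass gives 10 output values, one per class
print(model(x).shape)  # torch.Size([1, 10])
```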
We shouldn't try to replicate what we did with our pure Python (and numpy) neural network code - we should work with PyTorch in the way it was designed to be used.<br />
<br />
A key part of this is auto differentiation. Let's look at that next.<br />
<br />
<br />
<h3>
Auto Differentiation</h3>
A powerful and central part of PyTorch is the ability to create neural networks, chaining together different elements - like activation functions, convolutions, and error functions - and for PyTorch to work out the error gradients for the various parameters we want to improve.<br />
<br />
That's quite cool if it works!<br />
<br />
Let's see it working. Imagine a simple parameter $y$ which depends on another input variable $x$. Imagine that<br />
<br />
$$ y = x^2 + 5x + 2 $$<br />
<br />
Let's encode this in PyTorch:<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">import torch</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">from torch.autograd import Variable</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">x = Variable(torch.Tensor([2.0]), requires_grad=True)</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">y = (x**2) + (5*x) + 2</span><br />
<br />
Let's look at that more slowly. First we import torch, and also the <b>Variable</b> from torch.autograd, the auto differentiation library. Variable is important because we need to wrap normal Python variables with it, so that PyTorch can do the differentiation. It can't do it with normal Python variables like a = 10, or b = 5*a. <b>Variables</b> include links to where the variables came from - so that if one depends on another, PyTorch can do the correct differentiation.<br />
<br />
We then create <b>x</b> as a <b>Variable</b>. You can see that it is a simple tensor of trivial size, just a single number, 2.0. We also signal that it requires a gradient to be calculated.<br />
<br />
A <b>tensor</b>? Think of it as just a fancy name for multi-dimensional matrices. A 2-dimensional tensor is a matrix that we're all familiar with, like numpy arrays. A 1-dimensional tensor is like a list. A 0-dimensional one is just a single number. When we create torch.Tensor([2.0]) we're creating a 1-dimensional tensor holding just a single number, 2.0.<br />
<br />
We then create the next <b>Variable</b> called <b>y</b>. That looks like a normal Python variable by the way we've created it .. but it isn't, because it is made from <b>x</b>, which is a PyTorch <b>Variable</b>. Remember, the magic that <b>Variable</b> brings is that when we define <b>y</b> in terms of <b>x</b>, the definition of <b>y</b> remembers this, so we can do proper differentiation on it with respect to <b>x</b>.<br />
<br />
So let's do the differentiation!<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">y.backward()</span><br />
<br />
That's it. That's all that is required to ask PyTorch to use what it knows about <b>y</b> and all the <b>Variable</b>s it depends on to work out how to differentiate it.<br />
<br />
Let's see if it did it correctly. Remember that $x=2$ so we're asking for<br />
<br />
$$ \frac{dy}{dx}\Big|_{x=2} = 2x + 5 = 9$$<br />
<br />
This is how we ask for that to be done.<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">x.grad</span><br />
<br />
Let's see how all that works out:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilqnBZMrJ7Xtdk6gvlNt9jSdmgXVET-R5ub-q2ko8kYbPy5SefxsQcdphavHrVJh6n6fIr08e5TcDDC2u6UcqPPAyruvf-yvn9UBDISAOq3_shBXC95pnYmzrM0lC5J2jsQgH4sszogXSR/s1600/Screen+Shot+2017-05-21+at+21.47.57.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="308" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilqnBZMrJ7Xtdk6gvlNt9jSdmgXVET-R5ub-q2ko8kYbPy5SefxsQcdphavHrVJh6n6fIr08e5TcDDC2u6UcqPPAyruvf-yvn9UBDISAOq3_shBXC95pnYmzrM0lC5J2jsQgH4sszogXSR/s400/Screen+Shot+2017-05-21+at+21.47.57.png" width="400" /></a></div>
<br />
It works! You can also see how <b>y</b> is shown as type <b>Variable</b>, not just <b>x</b>.<br />
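As a quick sanity check, independent of PyTorch entirely, we can approximate the same derivative numerically with a central finite difference - a minimal plain-Python sketch (the function and helper names here are just for illustration):

```python
# y = x^2 + 5x + 2, so dy/dx = 2x + 5, which is 9 at x = 2
def y(x):
    return x**2 + 5*x + 2

def derivative(f, x, h=1e-6):
    # central finite difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

print(derivative(y, 2.0))  # very close to 9.0
```

This agrees with the gradient PyTorch worked out automatically.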
<br />
So that's cool. And that's how we define our neural network, using elements that PyTorch provides us, so it can automatically work out error gradients.<br />
<br />
<h3>
Let's Describe Our Simple Neural Network</h3>
Let's look at some super-simple skeleton code which is a common starting point for many, if not all, PyTorch neural networks.<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">import torch</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">import torch.nn as nn</span><br />
<span style="color: blue;"><span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;">class NeuralNetwork(torch.nn.Module):</span></span><br />
<span style="color: blue;"><span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> def <b>__init__(self)</b>:</span></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ....</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pass</span><br />
<span style="color: blue;"><span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> def <b>forward(self, inputs)</b>:</span></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> ....</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> return outputs</span><br />
<span style="color: blue;"><span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;">net = NeuralNetwork()</span></span><br />
<br />
<b>Inheritance</b><br />
The neural network class is derived from <b>torch.nn.Module</b> which brings with it the machinery of a neural network including the training and querying functions - see <a href="http://pytorch.org/docs/nn.html#torch.nn.Module">here</a> for the documentation.<br />
<br />
There is a tiny bit of boilerplate code we have to add to our initialisation function <b>__init__()</b> .. and that's calling the initialisation of the class it was derived from. That should be the __init__() belonging to torch.nn.Module. The clean way to do this is to use <b><a href="http://www.pythonforbeginners.com/super/working-python-super-function">super()</a></b>:<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> def __init__(self):</span><br />
<span style="color: blue;"><span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span><span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"># call the base class's initialisation too</span></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>super().__init__()</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pass</span><br />
<div>
<br />
<br />
We're not finished yet. When we create an object from the NeuralNetwork class, we need to tell it at that time what shape it will be. We're sticking with a simple 3-layer design .. so we need to specify how many nodes there are at the input, hidden and output layers. Just like our pure Python example, we pass this information to the <b>__init__() </b>function. We might as well create these layers during the initialisation. Our <b>__init__()</b> now looks like this:<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> def <b>__init__(self, inodes, hnodes, onodes)</b>:</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # call the base class's initialisation too</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> super().__init__()</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # define the layers and their sizes, turn off bias</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>self.linear_ih = nn.Linear(inodes, hnodes, bias=False)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b> self.linear_ho = nn.Linear(hnodes, onodes, bias=False)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b> </b></span><br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="background-color: #fff2cc;"> # define activation function</span></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b></b></span><br />
<span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b style="background-color: #fff2cc;"> self.activation = nn.Sigmoid()</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> pass</span><br />
<br />
The <b>nn.Linear()</b> module is the thing that creates the relationship between one layer and another, combining the network signals in a linear way .. which is what we did in our pure Python code. Because this is PyTorch, that <b>nn.Linear()</b> creates a parameter that can be adjusted .. the link weights that we're familiar with. You can read more about <b>nn.Linear()</b> <a href="http://pytorch.org/docs/nn.html#linear-layers">here</a>.<br />
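To make that concrete, what a bias-free linear layer computes is, in essence, a weight-matrix multiplication of the incoming signals. A minimal pure Python sketch (the weight values are made up for illustration, and PyTorch's real implementation is of course more sophisticated):

```python
# each output is a weighted sum of all the inputs:
# output[j] = sum over i of weights[j][i] * inputs[i]
def linear(weights, inputs):
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# 3 outputs from 2 inputs, so a 3x2 weight matrix
weights = [[0.5, -1.0],
           [2.0,  0.0],
           [1.0,  1.0]]

print(linear(weights, [1.0, 2.0]))  # [-1.5, 2.0, 3.0]
```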
<br />
We also create the activation function we want to use, in this case the logistic sigmoid function. Note, we're using the one provided by<b> torch.nn</b>, not making our own.<br />
<br />
Note that we're not using these PyTorch elements yet, we're just defining them because we have the information about the number of input, hidden and output nodes.<br />
<br />
<br /></div>
<div>
<b>Forward</b></div>
We have to over-ride the <b>forward()</b> function in our neural network class. Remember, that <b>backward()</b> is provided automatically, but can only work if PyTorch knows how we've designed our neural network - how many layers, what those layers are doing with activation functions, what the error function is, etc.<br />
<br />
So let's create a simple <b>forward()</b> function <b>which is the description of the network architecture</b>. Our example will be really simple, just like the one we created with pure Python to learn the MNIST dataset.<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> def <b>forward</b>(self, <b>inputs_list</b>):</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # convert list to Variable</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>inputs = Variable(inputs_list)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # combine input layer signals into hidden layer</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>hidden_inputs = self.linear_ih(inputs)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # apply sigmoid activation function</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>hidden_outputs = </b><b>self.activation</b><b>(hidden_inputs)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # combine hidden layer signals into output layer</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>final_inputs = self.linear_ho(hidden_outputs)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # apply sigmoid activation function</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> <b>final_outputs = </b><b>self.activation</b><b>(final_inputs)</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> return final_outputs</span><br />
<br />
You can see the first thing we do is convert the list of numbers, a Python list, into a PyTorch <b>Variable</b>. We must do this, otherwise PyTorch won't be able to calculate the error gradient later.<br />
<br />
The next section is very familiar, the combination of signals at each node, in each layer, followed immediately by the activation function. Here we're using the <b>nn.Linear()</b> elements we defined above, and the activation function we defined earlier, using the <b>torch.nn.Sigmoid()</b> provided by PyTorch.<br />
<br />
<br />
<b>Error Function</b><br />
Now that we've defined the network, we need to define the error function. This is an important bit of information because it defines how we judge the correctness of the neural network, and wrong-ness is used to update the internal parameters during training.<br />
<br />
There are many error functions that people use, some better suited to some kinds of problems than others. We'll use the really simple one we developed for the pure Python network, the squared error function. It looks like the following.<br />
<br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">error_function = <b>torch.nn.MSELoss</b>(size_average=False)</span></span><br />
<br />
We've set the size_average parameter to False to stop the error function dividing by the number of elements in the output and target vectors.<br />
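The difference that size_average setting makes is easy to see in a little pure Python sketch (the output and target numbers are made up for illustration):

```python
def sum_squared_error(outputs, targets):
    # what MSELoss(size_average=False) computes: a plain sum
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

def mean_squared_error(outputs, targets):
    # what the default size_average=True computes: the mean
    return sum_squared_error(outputs, targets) / len(outputs)

outputs = [0.9, 0.1, 0.2]
targets = [1.0, 0.0, 0.0]
print(sum_squared_error(outputs, targets))   # approximately 0.06
print(mean_squared_error(outputs, targets))  # approximately 0.02
```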
<br />
<br />
<h3>
Optimiser</h3>
We're almost there. We've just defined the error function, which means we know how far wrong the neural network is during training. We know that PyTorch can calculate the error gradients for each parameter.<br />
<br />
When we created our simple neural network, we didn't think too much about different ways of improving the parameters based on the error function and error gradients. We simply descended down the gradients a small bit. And that is simple, and powerful.<br />
<br />
Actually there are many refined and sophisticated approaches to doing this step. Some are designed to avoid getting trapped in local minima, others to converge as quickly as possible, and so on. We'll stick to the simple approach we took, and the closest in the PyTorch toolset is stochastic gradient descent:<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">optimiser = <b>torch.optim.SGD</b>(net.parameters(), lr=0.1)</span><br />
<br />
We feed this optimiser the adjustable parameters of our neural network, and we also specify the familiar learning rate as <b>lr</b>.<br />
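Under the hood, a plain SGD update is just the familiar small step down the gradient. A minimal sketch of a single update, with made-up parameter and gradient values (PyTorch's optimiser does this for every parameter tensor in the network):

```python
# new_parameter = parameter - learning_rate * gradient
def sgd_step(params, grads, lr=0.1):
    return [p - lr * g for p, g in zip(params, grads)]

print(sgd_step([1.0, -2.0], [0.5, -0.5]))  # [0.95, -1.95]
```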
<br />
<br />
<h3>
Finally, Doing the Update</h3>
Finally, we can talk about doing the update - that is, updating the neural network parameters in response to the error seen with each training example.<br />
<br />
Here's how we do that <b>for each training example</b>:<br />
<br />
<ul>
<li>calculate the <b>output</b> for a training data example</li>
<li>use the <b>error function</b> to calculate the difference (the <b>loss</b>, as people call it)</li>
<li><b>zero gradients</b> of the optimiser which might be hanging around from a previous iteration</li>
<li>perform <b>automatic differentiation</b> to calculate new gradients</li>
<li>use the optimiser to <b>update parameters</b> based on these new gradients</li>
</ul>
<br />
In code this will look like:<br />
<br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">for inputs, target in training_set:</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></span>
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> output = net(inputs)</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></span>
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # Compute and print loss</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> loss = error_function(output, target)</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> print(loss.data[0])</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></span>
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> # Zero gradients, perform a backward pass, and update the weights.</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> optimiser.zero_grad()</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> loss.backward()</span></span><br />
<span style="background-color: #fff2cc;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> optimiser.step()</span></span><br />
<br />
It is a common error not to zero the gradients during each iteration, so keep an eye out for that. I'm not really sure why the default is not to clear them ...<br />
<br />
<br />
<h3>
The Final Code </h3>
Now that we have all the elements developed and understood, we can rewrite the pure python neural network we developed in the course of <a href="https://www.amazon.com/Make-Your-Own-Neural-Network-ebook/dp/B01EER4Z4G">Make Your Own Neural Network</a> and throughout this blog.<br />
<br />
You can find the code as a notebook on GitHub:<br />
<br />
<ul>
<li><a href="https://github.com/makeyourownneuralnetwork/pytorch/blob/master/pytorch_neural_network_mnist_data.ipynb">https://github.com/ ... /pytorch_neural_network_mnist_data.ipynb</a></li>
</ul>
<br />
<br />
The only unusual thing I had to work out was that during the evaluation of performance, we keep a scorecard list, and append a 1 to it if the network's answer matches the known correct answer from the test data set. This comparison needs the actual number to be extracted from the PyTorch tensor, as follows. We couldn't just say <span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">label == correct_label</span>.<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">if (<b>label.data[0][0]</b> == correct_label):</span><br />
<br />
The results seem to match our pure python code for performance - no major difference, and we expected that because we've tried to architect the network to be the same.<br />
<br />
<br />
<h3>
Performance Comparison On a Laptop</h3>
Let's compare performance between our simple pure python (with numpy) code and the PyTorch version. As a reminder, here are the details of the architecture and data:<br />
<br />
<ul>
<li>MNIST training data with 60,000 examples of 28x28 images</li>
<li>neural network with 3 layers: 784 nodes in input layer, 200 in hidden layer, 10 in output layer</li>
<li>learning rate of 0.1</li>
<li>stochastic gradient descent with mean squared error</li>
<li>5 training epochs (that is, repeat training data 5 times)</li>
<li>no batching of training data</li>
</ul>
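For a sense of scale, we can count how many adjustable weights this 784-200-10 network has (no bias weights, matching the bias=False setting above):

```python
inodes, hnodes, onodes = 784, 200, 10

weights_input_hidden = inodes * hnodes   # 784 * 200 = 156800
weights_hidden_output = hnodes * onodes  # 200 * 10 = 2000

print(weights_input_hidden + weights_hidden_output)  # 158800
```

So even this "small" network has over 150,000 parameters to train.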
<br />
The timing was done with the following python notebook magic command in the cell that contains only the code to train the network. The options ensure only one run of the code, and the -c option ensures unix user time is used to account for other tasks taking CPU time on the same machine.<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b>%%timeit -n1 -r1 -c</b></span><br />
<br />
The results from doing this twice on a MacBook Pro 13 (early 2015), which has no GPU for accelerating the tensor calculations, are:<br />
<br />
<ul>
<li><b><span style="color: blue;">home-made simple pure python - <span style="font-size: large;">440 seconds, 458 seconds</span></span></b></li>
<li><span style="color: red;"><b>simple PyTorch version - <span style="font-size: large;">841 seconds, 834 seconds</span></b></span></li>
</ul>
<br />
<b>Amazing!</b> Our own home-made code is about 1.9 times faster .. <b>roughly twice as fast!</b><br />
<br />
<br />
<h3>
GPU Accelerated Performance</h3>
One of the key reasons we chose to invest time learning a framework like PyTorch is that it makes it easy to take advantage of GPU acceleration. So let's try it.<br />
<br />
I don't have a laptop with a CUDA GPU so I fired up a <a href="https://cloud.google.com/gpu/">Google Cloud Compute Instance</a>. The specs for mine are:<br />
<br />
<ul>
<li>n1-highmem-2 (2 vCPUs, 13 GB memory)</li>
<li>Intel Sandy Bridge</li>
<li>1 x NVIDIA Tesla K80 GPU</li>
</ul>
<br />
So we can compare GPU results with CPU results, I ran the above code, but this time not as a notebook but as a command line script, using the unix <span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b>time</b></span> command. This will give us the time to complete the whole program, including the training and testing stages. The results are:<br />
<br />
<span style="background-color: #cfe2f3;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">real 8m14.387s</span></span><br />
<span style="background-color: #cfe2f3;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">user 7m31.223s</span></span><br />
<span style="background-color: #cfe2f3;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">sys 8m39.810s</span></span><br />
<div>
<br /></div>
<div>
The interpretation of these numbers needs some sophistication, especially if our code has multiple threads, so we'll just stick to the simple real wall-clock time of 8m14s or <b>494 seconds</b>.</div>
<div>
<br /></div>
<div>
Now we need to change the code to run on the GPU. First, check that <a href="https://en.wikipedia.org/wiki/CUDA">CUDA</a> - NVIDIA's GPU acceleration framework - is available to Python and PyTorch:</div>
<div>
<br /></div>
<div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b>python</b></span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00) </span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux</span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">Type "help", "copyright", "credits" or "license" for more information.</span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">>>> <b>import torch</b></span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">>>> <b>torch.cuda.is_available()</b></span></div>
<div>
<span style="background-color: #fff2cc; color: red; font-family: "courier new" , "courier" , monospace; font-size: x-small;">True</span></div>
</div>
<div>
<br /></div>
<div>
So CUDA is available. This gave a False on my own home laptop.</div>
<div>
<br /></div>
<div>
The overall approach to shifting work from the CPU to the GPU is to shift the tensors there. <a href="http://pytorch.org/docs/notes/cuda.html#cuda-semantics">Here</a> is the current (but immature) PyTorch guidance on working with the GPU. To create a Tensor on a GPU we use <b>torch.cuda</b>:</div>
<div>
<br /></div>
<div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">>>> <b>x = torch.cuda.FloatTensor([1.0, 2.0])</b></span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">>>> <b>x</b></span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 1</span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 2</span></div>
<div>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">[torch.cuda.FloatTensor of size 2 (GPU 0)]</span></div>
</div>
<br />
You can see that this new tensor <b>x</b> is created on the GPU; it is shown as GPU 0, as there can be more than one. If we perform a calculation on x, it is actually carried out on that same GPU 0, and if the results are assigned to a new variable, they are also stored on the same GPU.<br />
<br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">>>> <b>y = x**x</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">>>> <b>y</b></span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 1</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> 4</span><br />
<span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">[torch.cuda.FloatTensor of size 2 (GPU 0)]</span><br />
<br />
This may not seem like much but is incredibly powerful - yet easy to use, as you've just seen.<br />
<br />
The changes to the code are minimal:<br />
<br />
<ul>
<li>we move the neural network class to the GPU once we've created it using <span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b style="background-color: #fff2cc;">net.cuda()</b></span></li>
<li>the inputs are converted from a list to a PyTorch Tensor, we now use the CUDA variant: <span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">inputs = Variable(torch.<b>cuda</b>.FloatTensor(inputs_list).view(1, self.inodes))</span></li>
<li>similarly the target outputs are also converted using this variant: <span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">target_variable = Variable(torch.<b>cuda</b>.FloatTensor(targets_list).view(1, self.onodes), requires_grad=False)</span></li>
</ul>
<br />
That's it! Not too difficult at all .. actually that took a day to work out because the PyTorch documentation isn't yet that accessible to beginners.<br />
<br />
The results from the GPU enabled version of the code are:<br />
<br />
<span style="background-color: #cfe2f3; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">real 6m6.328s</span><br />
<span style="background-color: #cfe2f3; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">user 5m57.443s</span><br />
<span style="background-color: #cfe2f3; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;">sys 0m13.488s</span><br />
<br />
That is faster, at <b>366 seconds</b> .. about 25% faster. We're seeing some encouraging results.<br />
<br />
Let's do more runs, just to be scientific and collate the results:<br />
<br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b><span style="color: blue;">CPU</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">GPU. </span></b></span><br />
<span class="Apple-tab-span" style="background-color: #fff2cc; white-space: pre;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">494</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">366. </span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">483</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">372. </span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">451</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">355. </span></span><br />
<span class="Apple-tab-span" style="background-color: #fff2cc; white-space: pre;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b><span style="color: blue;">476.0</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">364.3</span></b></span><br />
<br />
The GPU based network is consistently faster by about 25%.<br />
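As a quick sanity check on that figure, the averages follow directly from the three runs above (plain Python, using the timings from the table):

```python
cpu_runs = [494, 483, 451]   # seconds, CPU-only runs
gpu_runs = [366, 372, 355]   # seconds, GPU runs

cpu_mean = sum(cpu_runs) / len(cpu_runs)   # 476.0
gpu_mean = sum(gpu_runs) / len(gpu_runs)   # 364.3

# fractional speedup of GPU over CPU - roughly a quarter
speedup = (cpu_mean - gpu_mean) / cpu_mean
print(f"{speedup:.0%}")
```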
<br />
Perhaps we expected the code to be much, much faster? Well, for such a small network, the overheads erode the benefits. The GPU approach really shines with much larger networks and datasets.<br />
<br />
Let's do a better experiment and compare the PyTorch code in CPU and GPU mode, varying the number of hidden layer nodes. Here are the results:<br />
<br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b><span style="color: blue;">nodes</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: blue;">CPU</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">GPU</span></b></span><br />
<span class="Apple-tab-span" style="background-color: #fff2cc; white-space: pre;"><span style="color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"> </span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">200</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: blue;">463</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">362</span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">1000</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: blue;">803</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">356</span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">2000</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: blue;">1174</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">366</span></span><br />
<span style="background-color: #fff2cc; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><span style="color: blue;">5000</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: blue;">3390</span><span class="Apple-tab-span" style="color: blue; white-space: pre;"> </span><span style="color: red;">518</span></span><br />
<br />
<br />
Visualising this ...<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRnTu10CPYUQGP9hsa8dbFmCn_O0KgUb8NhDKYJ-h51a5a1S13p1CbRW0d_MH5aSRhUiYB-jnsFSdGG-qqB0LyWwH4K5iOxv_yMv_CPwCbMB8Iskye37GcqH2lb7I1z_EIiRUroZOZDEJI/s1600/pytorch_cpu_v_gpu_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="937" data-original-width="1600" height="374" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRnTu10CPYUQGP9hsa8dbFmCn_O0KgUb8NhDKYJ-h51a5a1S13p1CbRW0d_MH5aSRhUiYB-jnsFSdGG-qqB0LyWwH4K5iOxv_yMv_CPwCbMB8Iskye37GcqH2lb7I1z_EIiRUroZOZDEJI/s640/pytorch_cpu_v_gpu_2.png" width="640" /></a></div>
<br />
We can now see the benefit of PyTorch using the GPU. As the scale of the network grows (here, the number of hidden layer nodes), the GPU's training time rises very slowly, while the CPU's rises quickly.<br />
<br />
One more tweak ... the contributors at GitHub suggested setting an environment variable to control how many CPU threads the task uses. See <a href="https://github.com/pytorch/pytorch/issues/1630">here</a>. In my Google GPU instance I'll set this to <span style="background-color: #fff2cc; color: blue; font-family: "courier new" , "courier" , monospace; font-size: x-small;"><b>OMP_NUM_THREADS=2</b></span>. The resulting duration is 361 seconds ... so not much improved. We didn't see an improvement when we tried this on the CPU-only code earlier, either. Using the top utility I could see that fewer threads were in use, but at these scales it didn't make a noticeable difference.<br />
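For reference, setting that variable is a one-liner in the shell before launching the script - the script name below is just a placeholder:

```shell
# limit the number of OpenMP threads PyTorch's CPU operations may use
export OMP_NUM_THREADS=2
python my_network.py

# or equivalently, set it for a single run only
OMP_NUM_THREADS=2 python my_network.py
```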
<br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-51968730524882672852017-04-07T15:56:00.003-07:002017-04-07T15:58:05.093-07:00Neural Network in ForthI love how people have been inspired to make their own neural networks in their own way, sometimes using the R or Julia programming languages.<br />
<br />
I was very pleasantly surprised that Robin had decided to make neural networks in Forth.<br />
<br />
Forth is an interesting language - you can read about it <a href="https://en.wikipedia.org/wiki/Forth_(programming_language)">here</a> and <a href="https://bernd-paysan.de/why-forth.html">here</a> - it is a small, efficient and fast language, often used for applications close to the metal.<br />
<br />
You can follow Robin's progress here: <span style="font-family: "arial" , "helvetica" , sans-serif; font-size: small;"><span style="font-size: medium;"><a href="https://rforth.wordpress.com/">https://rforth.wordpress.com/</a></span></span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://rforth.wordpress.com/2017/03/30/30th-march-2017/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtfWM6KaHJm9jQsmaMm3SpShFNyABj54BH95ah2VTzONUsqIILdeqLY-HIfGrim_kdoukbE-Ti1iTTWmGsB9IQksTm92aYo-fzGgTplhLdcRt0HnI77HD_2YKsjBCqJa_sFGtisTGqX5mc/s400/5test.jpeg" width="400" /></a></div>
MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-15016246620898069162017-03-06T05:54:00.000-08:002017-03-06T10:30:30.386-08:00Guest Post: Python to RThis is a guest post by Alex Glaser, who runs the <a href="https://www.meetup.com/London-Kaggle-Meetup/">London Kaggle meetup</a> and organises several dojos. <br />
<br />
Alex took on the challenge of making his own neural networks, but instead of using Python, he used R. Here he talks about that journey, the things he had to overcome, and offers some insight into performance differences and profiling tools too.<br />
<br />
<hr />
<h1>
Python to R Translation</h1>
Having read through <a href="https://www.amazon.co.uk/Make-Your-Own-Neural-Network/dp/1530826608/ref=sr_1_1?ie=UTF8&qid=1488721684&sr=8-1&keywords=Tariq+Rashid">Make your own Neural Network</a> (and indeed made one myself) I decided to experiment with the Python code and write a translation into <a href="https://www.r-project.org/">R</a>. Having been involved in statistical computing for many years I’m always interested in seeing how different languages are used and where they can be best utilised.<br />
There were a few ground rules I set myself before starting the task:<br />
<ul>
<li>All code was to be ‘base’ R (other packages could be added later)</li>
<li>The code would be as close to a ‘line-by-line’ translation (again, more R-centric code could be written later)</li>
<li>The assignment operator “<-” would be used.</li>
</ul>
As a little aside, a quick word about the assignment operator. It can be confusing for new users, or those coming from other languages, but for the majority of uses it can be used interchangeably with “=”. Having been a long-time R user I quite like the assignment operator; a little history about it can be found <a href="http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html">here</a>. It also provides a bit of continuity with the other <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html">assignment operators</a>, notably the global assignment operator “<<-”. It also allows assignment of a variable within a function call, e.g.<br />
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL2f33Se3AOvZUZSeUtiNqITwk0P66sjLo-p1hJFt5KsFRW6UPaH-ebRS7345b-cni_4KmjJWpFNDyxbrS_fvkzEr5lVmXxCKA7DTBqIeAJSKV6CRT5Tu7fw3ZgE1mlNCHzvDnJ3UURKSJ/s1600/code1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="167" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL2f33Se3AOvZUZSeUtiNqITwk0P66sjLo-p1hJFt5KsFRW6UPaH-ebRS7345b-cni_4KmjJWpFNDyxbrS_fvkzEr5lVmXxCKA7DTBqIeAJSKV6CRT5Tu7fw3ZgE1mlNCHzvDnJ3UURKSJ/s640/code1.png" width="640" /></a></div>
<br />
Translating the code from Python to R also allowed me to start using <a href="https://www.rstudio.com/">RStudio’s</a> notebooks. Don’t get me wrong, I do like <a href="http://jupyter.org/">Jupyter</a>, but there’s always room to look at what else is out there. Each cell starts with a <a href="https://ipython.org/ipython-doc/3/interactive/magics.html">magic-like</a> command saying what language is going to be used in that cell, e.g. <span style="background-color: #fce5cd;"><code>```{r}</code></span> for R, <span style="background-color: #fce5cd;"><code>```{python}</code></span> for Python, etc.<br />
<br />
Just sticking with the code in Part2 of Tariq’s book (code available <a href="https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork">here</a>) a simple place to start was just to replicate printing of a single MNIST image (part2_mnist_data_set.ipynb). Reading the data in was fairly simple; both R and Python have the readlines command (readLines in R), R also has some nice graphical capabilities and <span style="background-color: #fce5cd;"><code>matrix</code></span> is a commonly used object. A few ideas cropped up which might be of interest to a new user: splitting a string results in a <span style="background-color: #fce5cd;"><code>list</code></span> (another R data type) and in order to plot the image successfully we need to reverse the ordering of the rows. The latter could be done using indexes but I thought using an <span style="background-color: #fce5cd;"><code>apply</code></span> function would be quite a nice way of doing this. The <span style="background-color: #fce5cd;"><code>apply</code></span> suite of functions are an important part of R code and often provide a succinct way of coding without lots of for loops.<br />
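For comparison, the same steps - splitting a record, reshaping it into an image matrix, and reversing the row order for plotting (which the R version needed) - are one-liners in the book's Python/numpy setting. The miniature 3x3 "record" below is made up purely for illustration:

```python
import numpy as np

# a made-up miniature MNIST-style record: label first, then pixel values
record = "5,0,1,2,3,4,5,6,7,8"

values = record.split(',')                                 # R: strsplit() gives a list
image = np.asarray(values[1:], dtype=float).reshape(3, 3)  # R: matrix()

# the guest post notes that R needed the row order reversed before
# plotting; in numpy that reversal is just a slice
flipped = image[::-1]

print(image[0])    # first row of the image:  [0. 1. 2.]
print(flipped[0])  # rows reversed, so now:   [6. 7. 8.]
```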
<br />
Okay, one notebook down, another one to go - this time the biggie (part2_neural_network_mnist_data.ipynb). One aspect of Python (and of other object-oriented languages) that differs from R is the notion of a class. A <span style="background-color: #fce5cd;"><code>class</code></span> does exist in <a href="http://adv-r.had.co.nz/OO-essentials.html">R</a>, but classes are often used internally to ‘collect’ all the output from a function, e.g.<br />
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDs7IGCzeAv5GcYprlP-_lnLxHvARALAyeuHRn_yyOuQpUhUWM0VtWXW9Cc8VewKL9kL0pj69JfG24DRQAXnwJf05qnjm7TceqlO_PiN4qg4Jd8LyZvKxdj2zL8EuMHh4_fI3sDHPe2U7T/s1600/code2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="152" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDs7IGCzeAv5GcYprlP-_lnLxHvARALAyeuHRn_yyOuQpUhUWM0VtWXW9Cc8VewKL9kL0pj69JfG24DRQAXnwJf05qnjm7TceqlO_PiN4qg4Jd8LyZvKxdj2zL8EuMHh4_fI3sDHPe2U7T/s640/code2.png" width="640" /></a></div>
<br />
Also, this class would be defined at the end of a function rather than at the start, e.g. you may get code like the following at the end of a function<br />
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs43hvIr89YJ0B6UUMhOKvVxILASitwjwapd8__z6nD5xHTIedILrPI8dAOOXtmqOBEsgtsPqFrQ9VH0hNGyfEoh18gjSYbCwDo6-sc8qoHMJPH0CeiUkYrieyrHR9z2HMYF9OlREvOQB4/s1600/code3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs43hvIr89YJ0B6UUMhOKvVxILASitwjwapd8__z6nD5xHTIedILrPI8dAOOXtmqOBEsgtsPqFrQ9VH0hNGyfEoh18gjSYbCwDo6-sc8qoHMJPH0CeiUkYrieyrHR9z2HMYF9OlREvOQB4/s1600/code3.png" /></a></div>
<br />
which would return an object of class ‘quiz’.<br />
<br />
Our initial attempt at ‘translating’ the code was supposed to be as close to a ‘line-by-line’ translation as possible, so that people could see how one line in Python would be written in R. This also meant that we had to create an artificial class using R’s function; note that it uses the dollar symbol to reference elements of this class, rather than the dot that we see in Python code. Also, we used the word ‘self’ to provide continuity with the Python code, though it doesn’t often get used in R code. One final comment: it only replicates some of the functionality of a class - it isn’t a full class replacement, so some of the behaviour may not be the same.<br />
Matrix multiplication in R is done using the “%*%” operator, e.g.<br />
<br />
<div class="separator" style="clear: both; text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7Dojq4zpA8iaTQ5xv9urW952uvVSTVlfYHs7fLj4ble94dFtcN0-iCFJ8sk8RrK8ZgjXKtMtxuvmj17qU7h6WrOjt8XJp5O18U1nVTBQE8peQQJGGXt1B0go-hvASb7C_18KVSaWyTOyw/s1600/code4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="84" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7Dojq4zpA8iaTQ5xv9urW952uvVSTVlfYHs7fLj4ble94dFtcN0-iCFJ8sk8RrK8ZgjXKtMtxuvmj17qU7h6WrOjt8XJp5O18U1nVTBQE8peQQJGGXt1B0go-hvASb7C_18KVSaWyTOyw/s640/code4.png" width="640" /></a></div>
<br />
Most of the time the coding was relatively straightforward, and after a few false starts, we managed to replicate the results of the original Python code and get over 97% accuracy. However there was one big difference: the time taken. Now I’ve heard all sorts of arguments about the relative speed of R and Python, but had assumed that since things like matrix multiplication are undertaken in C++ or Fortran these speed differences would not be considerable - however that was not the case. The Python code on my (admittedly 5+ year old) Mac takes about 6 mins, whilst the R code took roughly double that.<br />
<br />
There are a few nice ‘<b>profiling</b>’ commands in R (and the <a href="https://rstudio.github.io/profvis/"><code>profVis</code></a> package provides some nice interactivity), and when we looked at the R code in more depth it was the final matrix multiplication in the ‘train’ function that was taking about 85% of the time (we used the <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/crossprod.html"><code>tcrossprod</code></a> command in R to separate this multiplication from the rest). This last matrix multiplication is simply the outer product of two vectors, so it’s difficult to see why it would be so time-consuming.<br />
<br />
Looking at a few examples it’s not hard to see that Python’s <span style="background-color: #fce5cd;"><code>np.dot</code></span> function is far faster than R’s <span style="background-color: #fce5cd;"><code>%*%</code></span> command. Now for a few matrices this isn’t an issue (what’s a few hundredths of a second against a few thousandths?), however for the MYONN model we’ll be calling each function 300,000 times, so after a while this time differential builds up.<br />
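A minimal way to see the Python side of this comparison is to time the outer product directly with the standard library's <code>timeit</code>. The vector sizes below match the book's 200-hidden-node network but are otherwise arbitrary, and absolute timings will of course vary by machine and BLAS build:

```python
import timeit
import numpy as np

# outer product of two vectors, as in the final step of train()
a = np.random.rand(200, 1)
b = np.random.rand(1, 784)

# time many repetitions, since a single call takes only a fraction of a millisecond
n_calls = 1_000
seconds = timeit.timeit(lambda: np.dot(a, b), number=n_calls)
print(f"{n_calls} calls took {seconds:.3f}s "
      f"({1000 * seconds / n_calls:.4f} ms per call)")
```

The equivalent R timing (e.g. with <code>system.time</code> around a loop over <code>%*%</code>) can then be compared call for call.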
<br />
As mentioned earlier this difference in timings is quite surprising since the underlying code should be C++ or Fortran. It could also be that some underlying library was better optimised in Python than R. This will definitely be explored at a future R or Python coding <a href="http://www.meetup.com/London-Kaggle-Meetup">dojo</a>.<br />
It’s been a fun experience, and as with all work there are unexpected questions that come up. A brief synopsis of future work:<br />
<ul>
<li>Try and figure out why Python’s matrix multiplication is so much quicker than R’s. Could also try some functions from <a href="http://www.rcpp.org/">Rcpp</a>.</li>
<li>Write the code so that it is a bit more R-centric, and see if there are any libraries, such as those in the <a href="https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/">tidyverse</a>, which might be useful (though this would only really help if we can solve the previous problem).</li>
<li>Look at using Julia to see how it compares with R and Python.</li>
</ul>
<br />
The R code is available from my GitHub page <a href="https://github.com/alexiglaser/MYONN">here</a>, so feel free to download and change as you see fit. Any help with regards optimising the numerical libraries in R to match Python’s speed would be appreciated.MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-71737487850107872772017-02-26T07:22:00.000-08:002017-02-26T07:23:09.874-08:00Book TranslationsI've been really lucky with the interest in my <a href="https://www.amazon.co.uk/Make-Your-Neural-Network-ebook/dp/B01EER4Z4G">Make Your Own Neural Network</a> book.<br />
<br />
Some publishers have been interested in taking the book, but after some thinking I've resisted the temptation because:<br />
<br />
<ul>
<li>I can price the books how I want .. this is important especially for the ebook which I want to be as cheap and accessible as possible. Some publishers will increase the ebook price by an order of magnitude!</li>
<li>I can update the books to fix errors, and have the updated book ready for people to buy within hours, and usually within 24 hours.</li>
<li>As an author who has spent lots of my own time and effort on this, I get a much fairer deal with Amazon than with traditional publishers.</li>
</ul>
<br />
However, I have agreed to other language translations of the book to be handled by publishers. So far, the book is on course to be published in:<br />
<br />
<ol>
<li><b>German</b></li>
<li><b>Chinese</b></li>
<li><b>Russian</b></li>
<li><b>Japanese</b></li>
<li><b>Korean</b></li>
</ol>
<br />
I love the "traditional animal" that O'Reilly have done for the<a href="https://www.oreilly.de/buecher/12892/9783960090434-neuronale-netze-selbst-programmieren.html"> German version</a>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuUsey1HwEF783Z5rMSnEpsAmg9m4D0Gz8accMKEmun7bDszM6IbpUNaZzNKSMhq_-x7X5RdqHH76Yzd0HrQNZydYW_uIwS0HuvNsypZuFf1Xsiufevs_EdlzxaD5BeBeBgNh3aUyv4OeF/s1600/myonn_oreilly.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuUsey1HwEF783Z5rMSnEpsAmg9m4D0Gz8accMKEmun7bDszM6IbpUNaZzNKSMhq_-x7X5RdqHH76Yzd0HrQNZydYW_uIwS0HuvNsypZuFf1Xsiufevs_EdlzxaD5BeBeBgNh3aUyv4OeF/s640/myonn_oreilly.png" width="436" /></a></div>
<br />
I'm looking forward to more translations - personally I wish there was a <b>Spanish</b> and <b>Italian</b> one too.MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-14270743968170271092017-01-07T15:41:00.001-08:002017-01-07T15:41:07.249-08:00Neural Networks on a Raspberry Pi Zero - UpdatedThe Raspberry Pi default operating system <a href="https://www.raspberrypi.org/downloads/raspbian/">Raspbian</a> has seen significant <a href="https://www.raspberrypi.org/blog/introducing-pixel/">updates</a> since we <a href="http://makeyourownneuralnetwork.blogspot.co.uk/2016/03/ipython-neural-networks-on-raspberry-pi.html">last looked</a> at getting IPython notebooks and our neural networks to work on the Raspberry Pi Zero ... for example:<br />
<br />
<ul>
<li>the base Raspbian operating system is now based on the next major Debian version, called Jessie</li>
<li>some of the installation instructions can now be simpler</li>
<li>some of the new technology causes new problems to work around</li>
</ul>
<br />
.. so we've updated the guide. Here it is...<br />
<br />
<hr />
<b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;"></b><br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;"><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">In this section we will aim to get IPython set up on a Raspberry Pi.</span></b></div>
<b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;"><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">There are several good reasons for doing this:</span></b></div>
<b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;">
</b><br />
<ul style="margin-bottom: 0pt; margin-top: 0pt;"><b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Raspberry Pis are fairly </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">inexpensive</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> and </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">accessible</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> to many more people than expensive laptops.</span></div>
</li>
</b></ul>
<b id="docs-internal-guid-910853c3-7b49-24fb-5118-668aa3c912d4" style="font-weight: normal;">
<br /><ul style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Raspberry Pis are very open - they run the </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">free</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> and </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">open source</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> Linux operating system, together with lots of free and open source software, including Python. Open source is important because it is important to understand how things work, to be able to share your work and enable others to build on your work. Education should be about learning how things work, and making your own, and not be about learning to buy closed proprietary software.</span></div>
</li>
</ul>
<br /><ul style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">For these and other reasons, they are wildly popular in schools and at home for children who are learning about computing, whether it is software or building hardware projects.</span></div>
</li>
</ul>
<br /><ul style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Raspberry Pis are not as powerful as expensive computers and laptops. So it is an interesting and worthy challenge to be prove that you can still implement a useful neural network with Python on a Raspberry Pi.</span></div>
</li>
</ul>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">I will use a </span><a href="https://www.raspberrypi.org/blog/raspberry-pi-zero/" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">Raspberry Pi Zero</span></a><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> because it is even cheaper and smaller than the normal Raspberry Pis, and the challenge to get a neural network running is even more worthy! It costs about £4 UK pounds, or $5 US dollars. That wasn’t a typo!</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Here’s mine, shown next to a 2 penny coin. It’s tiny!</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasppi_zero.jpeg" height="389" src="https://lh3.googleusercontent.com/LZ4sAn65lmAnH4q6vUnWj6Mq_OObfN2N5Zh0pFVBE38XLIwmdBaQNtv8EhBwS6BaC-x93CVcMsp9-X4X7Yhne9KFfjlHhG8Nk8P9EJloP516pzEAsxSBD0tzcQV6rvKbp6_jcoO4" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><br /><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: #351c75; font-family: "arial"; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Installing IPython</span></h2>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">We’ll assume you have a Raspberry Pi powered up and a keyboard, mouse, display and access to the internet working. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">There are several options for an operating system, but we’ll stick with the most popular, which is the officially supported </span><a href="https://www.raspberrypi.org/downloads/raspbian/" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">Raspbian</span></a><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, a version of the popular Debian Linux distribution designed to work well with Raspberry Pis. Your Raspberry Pi probably came with it already installed. If not, install it using the instructions at that link. You can even buy an SD memory card with it already installed, if you’re not confident about installing operating systems.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">This is the desktop you should see when you start up your Raspberry Pi. I’ve removed the desktop background image as it’s a little distracting.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasppi_0.png" height="391" src="https://lh3.googleusercontent.com/Qcx2cV3p4z94emUGYSFx3HjnjMK7ApOicf2LJbmKc4KTTv0kJuoWk4O79tmu8Rg3-xdHZ70sLiOB7jyDOhYoYXZ1DDEUkFaCJbboBlBtxrXEDLDR3CHgb1TEMgLq2i6P8Xil6sl7" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">You can see the menu button clearly at the top left, and some shortcuts along the top too. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">We’re going to install IPython so we can work with the more friendly notebooks through a web browser, and not have to worry about source code files and command lines.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">To get IPython we do need to work with the command line, but we only need to do this once, and the recipe is really simple and easy.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Open the Terminal application, which is the icon shortcut at the top which looks like a black monitor. If you hover over it, it’ll tell you it is the Terminal. When you run it, you’ll be presented with a black box into which you type commands, looking like this.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasppi_1.png" height="391" src="https://lh4.googleusercontent.com/uMOo15GqUb43zB9iLJIE_d5pTkz3TN_jne_NzbN9x-vFQZ-AMefb-MO9O5Key6Yft9gKt7pywnLVZ9Hm73YaJ8LYn_7P0GfQW-7hFB9sViFPxKEwpf0LNNEXixuvjCMwajM7LaU0" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Your Raspberry Pi sensibly won’t allow normal users to issue commands that make deep changes. You have to assume special privileges first. Type the following into the terminal:</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo su -</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">You should see the prompt end with a ‘#’ hash character, where it was previously a ‘$’ dollar sign. That shows you now have special privileges, and you should be a little careful about what you type. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The following commands refresh your Raspberry’s list of current software, and then update the ones you’ve got installed, pulling in any additional software if it’s needed.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">apt-get update</span></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">apt-get dist-upgrade</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Unless you already refreshed your software recently, there will likely be software that needs to be updated. You’ll see quite a lot of text fly by. You can safely ignore it. You may be prompted to confirm the update by pressing “y”. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now that our Raspberry is all fresh and up to date, issue the command to get IPython. Note that, at the time of writing, the Raspbian software packages don’t contain a sufficiently recent version of IPython to work with the notebooks we created earlier and put on github for anyone to view and download. If they did, we could simply issue something like “apt-get install ipython3 ipython3-notebook”. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">If you don’t want to run those notebooks from github, you can happily use the slightly older IPython and notebook versions that come from Raspberry Pi’s software repository. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">If we do want to run more recent IPython and notebook software, we need to use some “pip” commands in addition to “apt-get”, to get more recent software from the Python Package Index. The difference is that this software is managed by Python’s package tool (pip), not by your operating system’s software manager (apt). The following commands should get everything you need.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">apt-get install python3-matplotlib</span></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">apt-get install python3-scipy</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">pip3 install jupyter</span></div>
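Once those installs finish, a quick sanity check is to confirm the scientific packages can actually be imported from Python 3. This is just an optional check, not part of the recipe itself:

```python
# Sanity check: confirm the packages installed above are importable
# from Python 3. Run this with "python3" in the Terminal.
import numpy
import scipy
import matplotlib

print("numpy", numpy.__version__)
print("scipy", scipy.__version__)
print("matplotlib", matplotlib.__version__)
```

If any of these imports fails, re-run the corresponding apt-get or pip3 command above before continuing.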
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">After a bit of text flying by, the job will be done. The speed will depend on your particular Raspberry Pi model, and your internet connection. The following shows my screen when I did this.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp_pi2.png" height="391" src="https://lh5.googleusercontent.com/8oYMXb2_IbIS5dJu65TcSi6Zndw48RQSqbtwx6O-5OeGgEUYFNGOoZkJOvGO6Xv8_pAKNkZI6Dx0LrBGuAhydVuzpXqOkUDy9I04ZTdfldWy4kXo5t9_9ROq5HD6SqIprPwdbMoH" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The Raspberry Pi normally uses an SD memory card, just like the ones you might use in a digital camera. These cards don’t have as much space as a normal computer’s hard drive. Issue the following command to clean up the software packages that were downloaded in order to update your Raspberry Pi.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">apt-get clean</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Recent versions of Raspbian replaced the Epiphany web browser with Chromium (an open source version of the popular Chrome browser). Epiphany is much lighter than Chromium and works better with the tiny Raspberry Pi Zero. To set it as the default browser, to be used later for the IPython notebooks, issue the following command:</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">update-alternatives --config x-www-browser</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">This will tell you what the current default browser is, and ask you to set a new one if you want to. Select the number associated with Epiphany, and you’re done.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">That’s it, job done. Restart your Raspberry Pi in case the update included a particularly deep change, such as a kernel update. You can restart your Raspberry Pi by selecting the “Shutdown …” option from the main menu at the top left, and then choosing “Reboot”, as shown next.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp_pi3.png" height="391" src="https://lh6.googleusercontent.com/ygdCHtOVRsloXa5jixtrSiHtrmvbv9fjNiZpmL40YRvCVZBxXwTfFyaEqtD_0jvxiMi-wfKhENM1u-Gqd4JXBfmGylr6FZoMPta3Csscw2T5GY2ztgbHAZgVsaHhExLkTyBhtSFK" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">After your Raspberry Pi has started up again, start IPython by issuing the following command from the Terminal:</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">jupyter-notebook</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">This will automatically launch a web browser with the usual IPython main page, from where you can create new IPython notebooks. Jupyter is the new software for running notebooks. Previously you would have used the “ipython3 notebook” command, which will continue to work for a transition period. The following shows the main IPython starting page.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp-pi4.png" height="391" src="https://lh5.googleusercontent.com/PbB9Oby6wDV6VcGhawdww-jnmdRmHTCwfS1qOQm1IdqkDasLbup_thHGteE8Az5VKhbohGdIr1DEha_kRGxP7h2FryvK-4LrvCdhs5ep-aoAdsaUMBR0VClj3xqGz0pUNKSfhdC-" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">That’s great! So we’ve got IPython up and running on a Raspberry Pi.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">You could proceed as normal and create your own IPython notebooks, but we’ll demonstrate that the code we developed in this guide does run. We’ll get the notebooks and also the MNIST dataset of handwritten numbers from github. In a new browser tab go to the link:</span></div>
<br /><ul style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<a href="https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork</span></a></div>
</li>
</ul>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">You’ll see the github project page, as shown next. Get the files by clicking “Download ZIP” after clicking “Clone or download” at the top right.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp-pi5.png" height="391" src="https://lh5.googleusercontent.com/-x2jIpuZ7Pr4MZp1gOC_gNY9u42Myf18pNe4gMB6mV-ebB8PFdUlv0jTaI0IVuuboyC7KhVbTBwrhv8hSE0ZPbOUD74V1VNEFIrp2aMr5AF1lHHwPviXrbrglhaRb6axhK_Setfe" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">If github doesn’t like Epiphany, then enter the following into your browser to download the files:</span></div>
<br /><ul style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<a href="https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/archive/master.zip" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/archive/master.zip</span></a><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> </span></div>
</li>
</ul>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The browser will tell you when the download has finished. Open up a new Terminal and issue the following command to unpack the files, and then delete the zip package to clear space.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">unzip Downloads/makeyourownneuralnetwork-master.zip</span></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">rm -f Downloads/makeyourownneuralnetwork-master.zip</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The files will be unpacked into a directory called makeyourownneuralnetwork-master. Feel free to rename it to a shorter name if you like, but it isn’t necessary.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The github site only contains the smaller versions of the MNIST data, because the site won’t allow very large files to be hosted there. To get the full set, issue the following commands in that same terminal to navigate to the mnist_dataset directory and then get the full training and test datasets in CSV format.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cd makeyourownneuralnetwork-master/mnist_dataset</span></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">wget -c </span><a href="http://pjreddie.com/media/files/mnist_train.csv" style="text-decoration: none;"><span style="background-color: #fce5cd; color: #1155cc; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">http://pjreddie.com/media/files/mnist_train.csv</span></a></div>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;">
<span style="background-color: #fce5cd; color: #351c75; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">wget -c </span><a href="http://pjreddie.com/media/files/mnist_test.csv" style="text-decoration: none;"><span style="background-color: #fce5cd; color: #1155cc; font-family: "courier new"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">http://pjreddie.com/media/files/mnist_test.csv</span></a></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The downloading may take some time depending on your internet connection, and the specific model of your Raspberry Pi. </span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">You’ve now got all the IPython notebooks and MNIST data you need. Close the terminal, but not the other one that launched IPython.</span></div>
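Each line of those MNIST CSV files is one record: a label (0 to 9) followed by 784 pixel values for a 28 by 28 image. As a small self-contained sketch of how the guide’s notebooks read such a record, here is the parsing and scaling step, using a fabricated record rather than the downloaded file so it runs anywhere:

```python
import numpy as np

# A fabricated MNIST CSV record: first value is the label,
# the remaining 784 values are pixel intensities (0-255).
# The real notebooks read these lines from the downloaded files.
record = "5," + ",".join(["0"] * 784)

values = record.split(",")
label = int(values[0])

# Scale the 0-255 pixel values into the range 0.01 to 1.00,
# following the approach used in the guide's notebooks.
inputs = (np.asarray(values[1:], dtype=float) / 255.0 * 0.99) + 0.01

print(label, inputs.shape)
```

The scaling avoids zero-valued inputs, which the guide explains can stall weight updates in the network.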
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Go back to the web browser with the IPython starting page, and you’ll now see the new folder makeyourownneuralnetwork-master showing on the list. Click on it to go inside. You should be able to open any of the notebooks just as you would on any other computer. The following shows the notebooks in that folder.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp-pi6.png" height="391" src="https://lh6.googleusercontent.com/2aVkGg6izkSWcXRnxpXoDSb5xetPYKGvdvlipBwwPYrnL-JPVU76mUDObMaTzg-OXjzotLpHq1RA7pECDOz2gPDzENwJdixcyIysXG4RNj7X9yAhJsLSdeCvJtKORwelTzEkcsqx" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><br /><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: #351c75; font-family: "arial"; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Making Sure Things Work</span></h2>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Before we train and test a neural network, let’s first check that the various bits, like reading files and displaying images, are working. Let’s open the notebook called “part3_mnist_data_set_with_rotations.ipynb” which does these tasks. You should see the notebook open and ready to run as follows.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp-pi7.png" height="391" src="https://lh3.googleusercontent.com/lehW0K8H6G_JCK2ojIfPQ5i77-401prwPRI8WaW_W3MaQ241_AImcaEpD-WlmHYakZojFsVAq0zIIZie5cM2g5KPSgKtowrYZt5mCYImR6HZOsNbvRVbLqFzfWM73UMQNxTjedwQ" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">From the “Cell” menu select “Run All” to run all the instructions in the notebook. After a while, and it will take longer than on a modern laptop, you should see some images of rotated digits.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp-pi8.png" height="391" src="https://lh5.googleusercontent.com/HVlbsrvFtNHZjmRYBhMoV70WzlzMfMTEbKvg6VbfZSqdCPY23drsMSevF4pzLsVhXPYTIWfXWOSSnYQtqPlBneMRvqpbd2zufgUjPfcHEWetqEZgPagAWn25JtiTENzTmToejMfC" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">That shows several things worked, including loading the data from a file, importing the Python extension modules for working with arrays and images, and plotting graphics.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Let’s now “Close and Halt” that notebook from the File menu. You should close notebooks this way, rather than simply closing the browser tab.</span></div>
<br /><br /><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: #351c75; font-family: "arial"; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Training And Testing A Neural Network</span></h2>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now let’s try training a neural network. Open the notebook called “part2_neural_network_mnist_data”. That’s the version of our program that is fairly basic and doesn’t do anything fancy like rotating images. Because our Raspberry Pi is much slower than a typical laptop, we’ll turn down some of the parameters to reduce the amount of calculation needed, so that we can check the code works without waiting hours only to find that it doesn’t.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">I’ve reduced the number of hidden nodes to 10, and the number of epochs to 1. I’ve still used the full MNIST training and test datasets, not the smaller subsets we created earlier. Set it running with “Run All” from the “Cell” menu. And then we wait ...</span></div>
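For reference, the only changes needed are the parameter lines near the top of the notebook. This is a sketch using the variable names from the book's notebooks; treat the exact names as illustrative.

```python
# Reduced settings for a quick sanity run on the Raspberry Pi Zero.
# Variable names follow the book's notebooks; treat them as illustrative.
input_nodes = 784     # 28x28 pixel images, unchanged
hidden_nodes = 10     # turned down from 200 to cut the arithmetic
output_nodes = 10     # one node per digit 0-9, unchanged
learning_rate = 0.1
epochs = 1            # a single pass through the training data
```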
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">This takes about one minute on my laptop, but on the Raspberry Pi Zero it completed in about </span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">25 minutes</span><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. That's not too slow at all, considering the Pi Zero costs about a 400th of the price of my laptop. I was expecting it to take all night.</span></div>
<br /><div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="rasp_pi9.png" height="391" src="https://lh4.googleusercontent.com/sZ0xNuv0gww8VoZZaQaN23O9Zo9-4lbGM9W7xcVnNCg8bWOByCKXezRqyGXuA4HLUCD3z0s7LdoQ474srnL2YeQfQD1SlJnIYfZsI3g4xW4bMsnKXg3LQ_7DHF9LxxC7h8ir-_l1" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /></span></div>
<br /><br /><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: #351c75; font-family: "arial"; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Raspberry Pi Success!</span></h2>
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">We’ve just proven that even with a £4 or $5 Raspberry Pi Zero, you can still work fully with IPython notebooks and create code to train and test neural networks - it just runs a little slower!</span></div>
</b>MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-92220162981157220252017-01-01T10:08:00.000-08:002017-01-01T15:45:32.238-08:00Errata #4 .. Lots of Updates<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script>
I've been lucky to have readers who take the time to provide feedback, error fixes, and suggestions for things that could be made clearer.<br />
<br />
I am really pleased that this happens - it means people are interested, that they care, and want to share their insights.<br />
<br />
A few suggestions had built up over recent weeks - and I've updated the content. This is a bigger update than usual.<br />
<br />
<br />
<h3>
Thanks</h3>
Thanks go to Prof A Abu-Hanna, "His Divine Shadow", Andy, Joshua, Luther, ... and many others who provided valuable ideas and fixes for errors, including in the blog comments sections.<br />
<br />
<br />
<h3>
Key Updates</h3>
Some of the key updates worth mentioning are:<br />
<ul>
<li>Error in the calculus introduction appendix, in the example explaining how to differentiate $s = t^3$. The second line of working out on page 204 shows $\frac{6 t^2 \Delta x + 4 \Delta x^3}{2\Delta x}$ which should be $\frac{6 t^2 \Delta x + 2 \Delta x^3}{2\Delta x}$. That 4 should be a 2.</li>
<li>Another error in the calculus appendix section on functions of functions ... showed $(x^2 +x)$ which should have been $(x^3 + x)$. </li>
<li>Small error on page 65 where $w_{3,1}$ is said to be 0.1 when it should be 0.4. </li>
<li>Page 99 shows the summarised update expression as $\Delta{w_{jk}} = \alpha \cdot sigmoid(O_k) \cdot (1 - sigmoid(O_k)) \cdot O_j^T$ .. it should have been the much simpler ..</li>
</ul>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSVaoXjDI6bL7VpLhHM8OE3zd8t4vI4W_MQXpKeRhqvmaiD64Q1YV62cxpQR0rf2ZFzCJ1OSOM_Oss7t8McCI7wBXmuHpA_bhoGk8WYu-OTYfn-fgdpKlk7oi2NZY2fe8FmNnbuJNchx-P/s1600/formula_11_cc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSVaoXjDI6bL7VpLhHM8OE3zd8t4vI4W_MQXpKeRhqvmaiD64Q1YV62cxpQR0rf2ZFzCJ1OSOM_Oss7t8McCI7wBXmuHpA_bhoGk8WYu-OTYfn-fgdpKlk7oi2NZY2fe8FmNnbuJNchx-P/s640/formula_11_cc.png" width="640" /></a></div>
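In numpy, that simpler corrected update can be sketched as follows. This assumes the corrected expression is $\Delta w_{jk} = \alpha \cdot E_k \cdot O_k (1 - O_k) \cdot O_j^T$, with $O_k$ taken directly as the layer's outputs since they are already the result of a sigmoid; all the numbers here are made up for illustration.

```python
import numpy as np

# Sketch of the simpler update: delta_W = alpha * (E * O * (1 - O)) . O_prev^T
# O is assumed to already be the sigmoid output of the layer, so no further
# sigmoid() is applied. The numbers are illustrative.
alpha = 0.1                               # learning rate
E = np.array([[0.8], [0.5]])              # errors at the output layer
O = np.array([[0.4], [0.9]])              # sigmoid outputs of that layer
O_prev = np.array([[0.4], [0.5], [0.6]])  # outputs of the previous layer

delta_W = alpha * (E * O * (1.0 - O)) @ O_prev.T
print(delta_W.shape)  # one row per output node, one column per previous node
```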
<br />
<br />
<br />
<h3>
Worked Examples Using Output Errors - Updated!</h3>
A few readers noticed that the error value used in the example illustrating the weight update process is not realistic. <br />
<br />
Why? How? Here is an example diagram used in the book - click to enlarge.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbIaNOCEMLoQuw4LW9MkDo5-0_9kZppecPxH1lmYSjjA-vMWxQFMkgRaMcqAewr57hw2bx0EI764IsKq0lrVSl8GgCFxvkInpBnAb0GY63RKC4Ksqcxtl6AjJA32liuKgaBrXyJHMd1C1y/s1600/backprop_error_example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="292" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbIaNOCEMLoQuw4LW9MkDo5-0_9kZppecPxH1lmYSjjA-vMWxQFMkgRaMcqAewr57hw2bx0EI764IsKq0lrVSl8GgCFxvkInpBnAb0GY63RKC4Ksqcxtl6AjJA32liuKgaBrXyJHMd1C1y/s640/backprop_error_example.png" width="640" /></a></div>
<br />
The output error from the first output layer node (top right) is shown as <b>1.5.</b> Since the output of that node is the output from a sigmoid function it must be between 0 and 1 (and not including 0 or 1). The target values must also be within this range. That means the error .. the difference between actual and target values .. can't be as large as 1.5. The error can't be bigger than 0.99999... at the very worst. That's why $e_1 = 1.5$ is unrealistic.<br />
<br />
The calculations illustrating how we do backpropagation are still ok. The error values were chosen at random ... but it would be better if we had chosen a more realistic error. <br />
<br />
The examples in the book have been updated with a new output error of 0.8.<br />
<br />
<br />
<h3>
Updated Book</h3>
The book will be updated with these fixes as soon as the Appendix on how to run the neural networks and the MNIST challenge on the <a href="http://makeyourownneuralnetwork.blogspot.co.uk/2016/03/ipython-neural-networks-on-raspberry-pi.html">Raspberry Pi Zero</a> is updated too - the Raspbian software has seen quite a few updates and probably no longer needs the workarounds described there.<br />
MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-81593043349055378962016-08-02T07:26:00.003-07:002016-08-02T07:32:28.277-07:00Errata #3Brian spotted an arithmetic error in the <b>Weight Update Worked Example</b> section.<br />
<br />
One of the weights should have been 3.0 not 4.0, which then affects the rest of the calculations.<br />
<br />
Here is the corrected section below. The corrected error is highlighted, and this then flows onto the rest of the calculations.<br />
<br />
The books will be updated, and you can ask Amazon for a free ebook update if you have that version.<br />
<br />
<hr />
<br />
<h3>
Weight Update Worked Example</h3>
<div class="page" title="Page 99">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: "arialmt"; font-size: 11.000000pt;">Let’s work through a couple of examples with numbers, just to see this weight update method
working. </span><br />
<br />
<span style="font-family: "arialmt"; font-size: 11.000000pt;">The following network is the one we worked with before, but this time we’ve added example output values from the first hidden node <b>o</b><sub>j=1</sub> and the second hidden node <b>o</b><sub>j=2</sub>. These are just made up numbers to illustrate the method and aren’t worked out properly by feeding forward signals from the input layer.
</span></div>
</div>
</div>
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhge2GVsukCo7dTHpl5biliI0zNs_R3VkZ6AN0k71lUoZRTWRxuO3xLnzeTuECRix5hcNrQwxrup3zotV8y5wABnsDy6rUF9uTFGBcHfkJP2RDHFs2Hxjc-FnQP8lYJ15fCg4Lv-pVyUWi-/s1600/backprop_error_example_dw.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhge2GVsukCo7dTHpl5biliI0zNs_R3VkZ6AN0k71lUoZRTWRxuO3xLnzeTuECRix5hcNrQwxrup3zotV8y5wABnsDy6rUF9uTFGBcHfkJP2RDHFs2Hxjc-FnQP8lYJ15fCg4Lv-pVyUWi-/s400/backprop_error_example_dw.png" width="400" /></a></div>
<br />
<div class="page" title="Page 100">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: "arialmt"; font-size: 11.000000pt;">We want to update the weight <b>w</b><sub>11</sub> between the hidden and output layers, which currently has the value 2.0.
</span><br />
<span style="font-family: "arialmt"; font-size: 11.000000pt;">Let’s write out the error slope again. </span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg2L2ZmyHBaQN7n_2_YHbMV7I9TawvjrVsFSHAgSbYfBwxFqy6Ikzf9EEHA_grWXnuMtrTmwd_VTK5f4kIqkIf9Z4hGHkQ_8clZT-NSQTxRBAfzsSa8q3uMOHuUsNb4hPISt8baIfqRIbd/s1600/formula_11.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="82" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg2L2ZmyHBaQN7n_2_YHbMV7I9TawvjrVsFSHAgSbYfBwxFqy6Ikzf9EEHA_grWXnuMtrTmwd_VTK5f4kIqkIf9Z4hGHkQ_8clZT-NSQTxRBAfzsSa8q3uMOHuUsNb4hPISt8baIfqRIbd/s400/formula_11.png" width="400" /></a></div>
<br />
<span style="font-family: "arialmt"; font-size: 11.000000pt;">
</span><br />
<div class="page" title="Page 100">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: "arialmt"; font-size: 11.000000pt;">Let’s do this bit by bit:
</span><br />
<ul style="list-style-type: none;">
<li><span style="font-family: "arialmt"; font-size: 11.000000pt;">● The first bit (<b>t</b><sub>k</sub> - <b>o</b><sub>k</sub>) is the error <b>e</b><sub>1</sub> = 1.5, just as we saw before.</span></li>
<li><span style="font-family: "arialmt"; font-size: 11.000000pt;">● The sum inside the sigmoid function, <b>Σ</b><sub>j</sub> <b>w</b><sub>jk</sub> <b>o</b><sub>j</sub>, is (2.0 * 0.4) + (<span style="background-color: yellow;">3.0</span> * 0.5) = 2.3.</span></li>
<li><span style="font-family: "arialmt"; font-size: 11.000000pt;">● The sigmoid 1/(1 + e<sup>-2.3</sup>) is then 0.909. That middle expression is then 0.909 * (1 - 0.909) = 0.083.</span></li>
<li><span style="font-family: "arialmt"; font-size: 11.000000pt;">● The last part is simply <b>o</b><sub>j</sub>, which is <b>o</b><sub>j=1</sub> because we’re interested in the weight <b>w</b><sub>11</sub> where <b>j</b> = 1. Here it is simply 0.4.</span></li>
</ul>
</div>
</div>
<div class="layoutArea">
<div class="column">
<span style="color: rgb(71.800000% , 71.800000% , 71.800000%); font-family: "arialmt"; font-size: 11.000000pt;">
</span>
</div>
</div>
</div>
</div>
<div class="page" title="Page 101">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: "arialmt"; font-size: 11.000000pt;"><span style="font-family: "arialmt"; font-size: 11.000000pt;">Multiplying all these three bits together, and not forgetting the minus sign at the start, gives us -0.04969.
</span></span><br />
<br />
<span style="font-family: "arialmt"; font-size: 11.000000pt;">If we have a learning rate of 0.1, that gives us a change of -(0.1 * -0.04969) = +0.005. So the new <b>w</b><sub>11</sub> is the original 2.0 plus 0.005 = 2.005. </span><br />
<br />
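The whole worked example can be checked with a few lines of Python, using the corrected weight of 3.0.

```python
import math

# Worked example: updating w11, currently 2.0, with a learning rate of 0.1
e1 = 1.5                               # error at the first output node
summed = (2.0 * 0.4) + (3.0 * 0.5)     # sum into the sigmoid: 2.3
sig = 1.0 / (1.0 + math.exp(-summed))  # 0.909
middle = sig * (1.0 - sig)             # 0.083
slope = -e1 * middle * 0.4             # -0.04969, the error slope
new_w11 = 2.0 - (0.1 * slope)          # 2.0 + 0.005 = 2.005
print(round(new_w11, 3))  # 2.005
```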
<span style="font-family: "arialmt"; font-size: 11.000000pt;">This is quite a small change, but over many hundreds or thousands of iterations the weights will eventually settle down to a configuration in which the well-trained neural network produces outputs that reflect the training examples.
</span></div>
</div>
</div>
</div>
<br /></div>
</div>
</div>
</div>
MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-46958463304951759132016-07-06T17:03:00.000-07:002016-07-07T02:27:15.606-07:00Error Backpropagation Revisited<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script>
A great question from <b>Alex J</b> has prompted a deeper look at how we take the error at the output layer of a neural network and propagate it back into the network.<br />
<br />
<br />
<h3>
Reminder: Error Informs Weight Updates</h3>
Here's a reminder why we care about the error:<br />
<ul>
<li>In a neural network, it is the link weights that do the learning. They are adjusted again and again in an attempt to better match the training data.</li>
<li>This refinement of the weights is informed by the error associated with a node in the network. A small error means we don't need to change the weights much. </li>
<li>The error at the output layer is easy - it's simply the difference between the desired target and actual output of the network.</li>
<li>However the error associated with internal hidden layer nodes is not obvious.</li>
</ul>
<br />
<br />
<h3>
What's The Error Inside The Network?</h3>
There isn't a mathematically perfect answer to this question.<br />
<br />
So we use approaches that make sense intuitively, even if there isn't a mathematically pure and precise derivation for them. These kinds of approaches are called <b>heuristics</b>.<br />
<br />
These "rule of thumb" heuristics are fine ... as long as they actually help the network learn!<br />
<br />
The following illustrates what we're trying to achieve - use the error at the output layer to work out, somehow, the error inside the network.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig-HwgF75b-vbb9n0ug01cFhj7tt9xHV_sXWf4z5c4JfsRb2x5zfLGrz4rSsFJSKnMoBQKudRrtJcCdAvViRzYBeG7ESycaivzoOX_juxuL2CLR0V860vklivhRyOhRBqnDpJDMGqwEers/s1600/neruons_error_into_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEig-HwgF75b-vbb9n0ug01cFhj7tt9xHV_sXWf4z5c4JfsRb2x5zfLGrz4rSsFJSKnMoBQKudRrtJcCdAvViRzYBeG7ESycaivzoOX_juxuL2CLR0V860vklivhRyOhRBqnDpJDMGqwEers/s400/neruons_error_into_2.png" width="400" /></a></div>
<br />
<br />
Previously, and in the book, we considered three ideas. An extra one is added here:<br />
<ul>
<li>split the error <b>equally </b>amongst the connected nodes, recombine at the hidden node</li>
<li>split the error in <b>proportion </b>to the link weights, recombine at the hidden node</li>
<li><b>simply multiply</b> the error by the link weights, recombine at the hidden node</li>
<li>the same as above but attempt to <b>normalise </b>by dividing by the number of hidden nodes</li>
</ul>
<br />
Let's look at these in turn, before we try them to see what performance each approach gives us.<br />
<br />
<br />
<h3>
1. Split Error Equally Amongst Links</h3>
We split the error at each output node, dividing it equally amongst the number of connected incoming links. We then recombine these pieces at each hidden layer node to arrive at an internal error.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLpJC-zQBrxxDnVnew0nQQ5STdkLeSZF5CoVWGerN6Bnjb9sXeajf2np7frPuaK89E_ko755NGQvDdG6-e3bh2gd1-wb5JKOX8xeUBzHJRLJ7HDFrV6R8Y7oCfsHUQhJabNUnUCmFb2D6h/s1600/neruons_error_halved.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLpJC-zQBrxxDnVnew0nQQ5STdkLeSZF5CoVWGerN6Bnjb9sXeajf2np7frPuaK89E_ko755NGQvDdG6-e3bh2gd1-wb5JKOX8xeUBzHJRLJ7HDFrV6R8Y7oCfsHUQhJabNUnUCmFb2D6h/s400/neruons_error_halved.png" width="400" /></a></div>
<br />
Mathematically, and in matrix form, this looks like the following. $N$ is the number of links from hidden layer nodes into an output node - that is, the number of hidden layer nodes.<br />
<br />
$$<br />
e_{hidden} =<br />
\begin{pmatrix}<br />
1/N & 1/N & \cdots \\<br />
1/N & 1/N & \cdots \\<br />
\vdots & \vdots & \ddots \\<br />
\end{pmatrix}<br />
\cdot e_{output}<br />
$$<br />
<div>
<br />
Remember that a matrix form is really useful because Python's numpy can do the calculations efficiently (quickly) and we can write very concise code.<br />
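As a sketch with made-up numbers, the equal-split heuristic is just a matrix whose entries are all $1/N$:

```python
import numpy as np

# Equal-split heuristic: each output error is divided equally among the
# N hidden nodes, then recombined. The numbers are made up for illustration.
e_output = np.array([[0.8], [0.5]])            # errors at 2 output nodes
N = 3                                          # 3 hidden nodes
split = np.full((N, e_output.shape[0]), 1.0 / N)
e_hidden = split @ e_output                    # every hidden node gets (0.8 + 0.5) / 3
print(e_hidden.ravel())
```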
<br />
<br />
<h3>
2. Split Error In Proportion To Link Weights</h3>
We split the error, not equally, but in proportion to the link weights. The reason for this is that those links with larger weights contributed more to the error at the output layer. That makes intuitive sense - small weights contribute smaller signals to the final output layer, and should be blamed less for the overall error. These proportional bits are recombined again at the hidden layer nodes.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIqRrMO9ABmmp6bSfHJoFtwTegf0L5vlkBhLBlb7MJN5XYTuo-6SX281oCLaG5pcVy9E_hfkqQ_UCuCf8n3qNarYVYKPPF6_f_uBaeByCYLwRh9Sg8ufyNCds1Yde_sZkeVImZO3utf2yN/s1600/neruons_error_proportionate.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="211" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIqRrMO9ABmmp6bSfHJoFtwTegf0L5vlkBhLBlb7MJN5XYTuo-6SX281oCLaG5pcVy9E_hfkqQ_UCuCf8n3qNarYVYKPPF6_f_uBaeByCYLwRh9Sg8ufyNCds1Yde_sZkeVImZO3utf2yN/s400/neruons_error_proportionate.png" width="400" /></a></div>
<br />
Again, in matrix form, this looks like the following.<br />
<br />
$$<br />
e_{hidden} =<br />
\begin{pmatrix}<br />
\frac{w_{11}}{w_{11} + w_{21} + \cdots} & \frac{w_{12}}{w_{12} + w_{22} + \cdots} & \cdots \\<br />
\frac{w_{21}}{w_{11} + w_{21} + \cdots} & \frac{w_{22}}{w_{12} + w_{22} + \cdots} & \cdots \\<br />
\vdots & \vdots & \ddots \\<br />
\end{pmatrix}<br />
\cdot e_{output}<br />
$$<br />
<div>
<br />
The problem is ... we can't easily write this as a simple combination of matrices we already have, like the weight matrix and the output error matrix. To code this, we'd lose the benefits of numpy being able to accelerate the calculations. Even so, let's try it to see how well it performs.<br />
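The proportional split can still be coded directly, it just can't be written in terms of the existing weight matrix alone. A sketch, assuming $w_{jk}$ is the weight from hidden node $j$ into output node $k$ and using illustrative numbers:

```python
import numpy as np

# Proportional-split heuristic: normalise each column of weights by its
# column sum before multiplying by the output errors.
# W[j, k] is the weight from hidden node j into output node k.
W = np.array([[2.0, 1.0],
              [3.0, 4.0]])
e_output = np.array([[0.8], [0.5]])

fractions = W / W.sum(axis=0)   # w_jk / (w_1k + w_2k + ...)
e_hidden = fractions @ e_output
print(e_hidden.ravel())         # the pieces recombine to the total error
```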
<br />
<br />
<h3>
3. Error Simply Multiplied By Link Weights</h3>
We don't split the error, but simply multiply the error by the link weights. This is much simpler than the previous idea but retains the key intuition that larger weights contribute more to the network's error at the output layer.<br />
<br />
You can see from the expression above that the output errors are multiplied by the weights, and there is also a kind of normalising division. Here we don't have that normalisation.<br />
<br />
In matrix form this looks like the following - it is very simple!<br />
<br />
$$<br />
e_{hidden} = w^{T} \cdot e_{output}<br />
$$<br />
<br />
Let's try it - and if it works, we have a much simpler heuristic, and one that can be accelerated by numpy's ability to do matrix multiplications efficiently.<br />
<div>
<br /></div>
<div>
<br /></div>
<h3>
4. Same as Above But "Normalised"</h3>
<div>
This additional heuristic is the same as the previous very simple one - but with an attempt to apply some kind of normalisation. We want to see if the lack of a normalisation in the simple heuristic has a negative effect on performance. </div>
<div>
<br /></div>
<div>
The expression is still simple, the above expression divided by the number of hidden nodes $N$.</div>
<div>
<br /></div>
<div>
$$<br />
e_{hidden} = \frac{w^{T}}{N} \cdot e_{output}<br />
$$</div>
<br />
You can imagine this goes some way to allaying fears that the previous approach magnifies the error unduly. This fear goes away if you realise the weights can be $<1$ and so can have a shrinking effect, not just a growing effect.<br />
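Both the simple heuristic and its normalised variant are one-liners in numpy. This sketch stores $w$ the way the book does, with one row per output node, which is why the transpose appears; the numbers are made up:

```python
import numpy as np

# Simple heuristic (3) and its normalised variant (4), with w stored as
# (output nodes x hidden nodes) so that w.T appears, as in the book's code.
w = np.array([[2.0, 3.0],    # weights into output node 1
              [1.0, 4.0]])   # weights into output node 2
e_output = np.array([[0.8], [0.5]])
N = w.shape[1]               # number of hidden nodes

e_hidden_simple = w.T @ e_output             # heuristic 3
e_hidden_normalised = (w.T / N) @ e_output   # heuristic 4: same, divided by N
```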
<br />
<br />
<h3>
Results!</h3>
The above heuristics were coded and compared using the MNIST challenge. We keep the number of hidden nodes at 100, and the learning rate at 0.1. We do vary the number of learning epochs over 1, 2 and 5.<br />
<br />
The following shows the results.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdISLQatF541gWggC1Q8x7KyVtibeBO6MvP3OPjREffyVY7r24_C1h62qEq_uJqIwrRSULKSXUG9EIa5S1F5J3WnAbnLLZXUQJkEeqEBQdBPPVgh_AkSCXBsJxmSnAV0jIQklufGjzbybR/s1600/perf_error_heuristics.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="251" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdISLQatF541gWggC1Q8x7KyVtibeBO6MvP3OPjREffyVY7r24_C1h62qEq_uJqIwrRSULKSXUG9EIa5S1F5J3WnAbnLLZXUQJkEeqEBQdBPPVgh_AkSCXBsJxmSnAV0jIQklufGjzbybR/s400/perf_error_heuristics.png" width="400" /></a></div>
<br />
We can make some interesting conclusions from these results.<br />
<br />
<br />
<h3>
Conclusions</h3>
<ul>
<li><b>Naively splitting the error equally among links doesn't work</b>. At all! The performance of 0.1 or 10% accuracy is what you'd get randomly choosing an answer from a possible 10 answers (the digits 0-9).</li>
<li><b>There is no real difference between the sophisticated error splitting and the much simpler multiplication by the link weights</b>. This is important - it means we can safely use the much simpler method and benefit from accelerated matrix multiplication.</li>
<li><b>Trying to normalise the simple method actually reduces performance</b> ... by slowing down learning. You can see it recover as you increase the number of learning epochs.</li>
</ul>
<div>
<br /></div>
All this explains why we, and others, choose the simpler heuristic. It's simple, it works really well, and it can benefit from technology that accelerates matrix multiplication ... software like Python's numpy, and hardware like GPUs through OpenCL and CUDA.<br />
<br />
<hr />
<br />
<i>I'll update the book so readers can benefit from a better explanation of the choice of heuristic. All ebooks can be updated for free by asking Amazon Kindle support.</i><br />
<i><br /></i></div>
</div>
MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-12482105459381675252016-06-28T15:05:00.001-07:002016-06-29T01:55:29.482-07:00Bias Nodes in Neural Networks<script type="text/x-mathjax-config"> MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}}); </script> <script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"> </script>
I've been asked about <b>bias nodes</b> in neural networks. What are they? Why are they useful?<br>
<br>
<br>
<h3>
Back to Basics</h3>
Before we dive into bias nodes ... let's go back to basics. Each node in a neural network applies a threshold function to the input. The output helps us make a decision about the inputs.<br>
<br>
We know the nodes in a real neural network are usually sigmoid in shape, with the $1/(1+e^{-x})$ logistic function and the $tanh()$ function also being popular.<br>
<br>
But before we arrived at those, we used a very simple linear function to understand how it could be used to classify or predict, and how it could be refined by adjusting its slope. So let's stick with linear functions for now - because they are simpler.<br>
<br>
The following is a simple linear function.<br>
<br>
$$y = A\cdot x$$<br>
<br>
You'll remember it was the parameter $A$ that we varied to get different classifications. And it was this parameter $A$ that we refined by learning from the error from each training example.<br>
<br>
The following diagram shows some examples of different lines possible with such a linear function.<br>
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaNH7P7KgunkFAhYvTHysRNmsixRgbIBttsz15Db7o97GKexeS3jve_zTHJOu1Yn9wzCrn241AoesrTcIDR0Lny-rWHqK2Us-qMFPmOEyDNGMsw8y2TRKfV4EGc44JGyc9Nh92e6CcN4Do/s1600/bias_ax.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="272" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaNH7P7KgunkFAhYvTHysRNmsixRgbIBttsz15Db7o97GKexeS3jve_zTHJOu1Yn9wzCrn241AoesrTcIDR0Lny-rWHqK2Us-qMFPmOEyDNGMsw8y2TRKfV4EGc44JGyc9Nh92e6CcN4Do/s400/bias_ax.png" width="400"></a></div>
<br>
You can see how some lines are better at separating the two clusters. In this case the line $y=2x$ is the best at separating the two clusters.<br>
<br>
That's all cool and happy - and stuff we've already covered before.<br>
<br>
<h3>
A Limitation</h3>
Look at the following diagram and see which line of the form $y=A\cdot x$ would best separate the data.<br>
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgreB3df3p7MAG-LZql8sz2q1dRUVt_jPxz9dwMAEhMted3Fc_X1dy3GYROtaF4QSc_FY561ktjupegDYW8WRif4XdbB98FnXlaVRWgSciwFqbStf1Pd7U-tHLrfvAjg37vQw5cOMxRye8G/s1600/bias_ax_b_data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="273" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgreB3df3p7MAG-LZql8sz2q1dRUVt_jPxz9dwMAEhMted3Fc_X1dy3GYROtaF4QSc_FY561ktjupegDYW8WRif4XdbB98FnXlaVRWgSciwFqbStf1Pd7U-tHLrfvAjg37vQw5cOMxRye8G/s400/bias_ax_b_data.png" width="400"></a></div>
<br>
Ouch! We can't seem to find a line that does the job - no matter what slope we choose.<br>
<br>
This is a limitation we've hit. Any line of the form $y= A\cdot x$ must go through the origin. You can see in the diagram all three example lines do.<br>
<br>
<br>
<h3>
More Freedom</h3>
What we need is to be able to shift the line up and down. We need an extra degree of freedom.<br>
<br>
The following diagram shows some example separator lines which have been liberated from the need to go through the origin.<br>
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx7U2cjosCZ0zQC9DNscGmVfwllwgp_K32HRYOXOBw4yfAx-lbo85x5XVFpdtOLIxVR6GNlrPnBx21_jQDe619JLVfGBGKt4L3-ZfrL4SmEMUo-FFFVU-uHBUQwxrBGhnYFvC9yJHezOjv/s1600/bias_ax_b_data_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="273" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx7U2cjosCZ0zQC9DNscGmVfwllwgp_K32HRYOXOBw4yfAx-lbo85x5XVFpdtOLIxVR6GNlrPnBx21_jQDe619JLVfGBGKt4L3-ZfrL4SmEMUo-FFFVU-uHBUQwxrBGhnYFvC9yJHezOjv/s400/bias_ax_b_data_2.png" width="400"></a></div>
<br>
You can see one that does actually do a good job of separating the two data clusters.<br>
<br>
So what form do these liberated lines take? They take the following form:<br>
<br>
$$ y = A \cdot x + B $$<br>
<br>
We've added an extra $+B$ to the previous simpler equation $y = A\cdot x$. All this will be familiar to you if you've done maths at school.<br>
<br>
<br>
<h3>
Bias Node</h3>
So we've just found that for some problems, a simple linear classifier of the form $y=A\cdot x$ was insufficient to represent the training data. We needed an extra degree of freedom so the lines were freer to go all over the data. The full form of a linear function $y = A\cdot x + B$ does that.<br>
<br>
The same idea applies even when we're using sigmoid shaped functions in each neural network node. You can see that without a $+B$ those simpler functions are doomed to stick to a fixed origin point, and only their slope changes. You can see this in the following diagram.<br>
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRRzfGPj8805F5-6NYjEPig82wvzfDi_ecg8lYkY4yt8GPCWNEfqpKg-1XdcsFzHFzwpN3S79f8hIhzJCWhsGXJ5wjM9C_sHHLc-D1RNJCMjJO0artC1RIsIb0IZ4BTN_3SwHofzKxY75v/s1600/logistic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRRzfGPj8805F5-6NYjEPig82wvzfDi_ecg8lYkY4yt8GPCWNEfqpKg-1XdcsFzHFzwpN3S79f8hIhzJCWhsGXJ5wjM9C_sHHLc-D1RNJCMjJO0artC1RIsIb0IZ4BTN_3SwHofzKxY75v/s400/logistic.png" width="400"></a></div>
<br>
How do we represent this in a neural network?<br>
<br>
We could change the activation function in each node. But remember, we chose not to alter the slope of that function, never mind adding a constant. We instead chose to change the weights of the incoming signals.<br>
<br>
So we need to continue that approach. The way to do this is to add a special additional node into a layer, alongside the others, which always has a constant value usually set to 1. The weight of the link is able to change, and even become negative. This has the same effect of adding the additional degree of freedom that we needed above.<br>
<br>
The following illustrates the idea:<br>
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh88w91a0ONOEAmaQtzrcEQ7JIM6g_KznZq6b2J_iPuXR6lYL7sLI3oJS7m_6bnvpnMx0JonwxvLgEcxH_nis4HsM0PBuC67nT-Irdz7jU-o1coqFacy5AFeblRgAqfFjvr3ls5WOXUB88h/s1600/bias_node.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh88w91a0ONOEAmaQtzrcEQ7JIM6g_KznZq6b2J_iPuXR6lYL7sLI3oJS7m_6bnvpnMx0JonwxvLgEcxH_nis4HsM0PBuC67nT-Irdz7jU-o1coqFacy5AFeblRgAqfFjvr3ls5WOXUB88h/s400/bias_node.png" width="400"></a></div>
<br>
The activation function is a sigmoid of the combined incoming signal $w_0 + w_1\cdot x$. The $w_0$ is provided by the additional node and has the effect of shifting the function left or right along the x-axis. That in effect allows the function to escape being pinned to the "origin", which is $(0, \frac{1}{2})$ for the logistic function and $(0,0)$ for the $tanh()$.<br>
<br>
Don't forget that the $w_1$ can be negative too ... which allows the function to flip top to bottom, allowing for lines which fall, not just rise.<br>
<br>
The following shows how the extra node is included in a layer. That node is called a <b>bias node</b>.<br>
<br>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUZyg_jTEF_lUZOVXQffm-7vWqVanMoIARrOuFrv33ZIcGX5C44iwXl6d6cO_7FkU_XKYO-XKNas-Ts7WJShyzbtlWJMhpvvFOPesXyK8ml8jp88FYhXDEfajeEnNBSgGHfVTrPAwWBL0k/s1600/bias_node_in_net.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="287" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUZyg_jTEF_lUZOVXQffm-7vWqVanMoIARrOuFrv33ZIcGX5C44iwXl6d6cO_7FkU_XKYO-XKNas-Ts7WJShyzbtlWJMhpvvFOPesXyK8ml8jp88FYhXDEfajeEnNBSgGHfVTrPAwWBL0k/s400/bias_node_in_net.png" width="400"></a></div>
<br>
It is worth experimenting to determine whether you need a bias node to augment the input layer, or whether you also need one to augment the internal hidden layers. Clearly you don't have one on the output layer.<br>
<br>
<br>
<h3>
Coding A Bias Node</h3>
A bias node is simple to code. The following shows how we might add a bias node to the input layer, with code based on our examples in github.<br>
<br>
<br>
<ul>
<li>Make sure the weight matrix has the right shape by incrementing the number of input nodes, <b><span style="color: blue; font-family: "courier new" , "courier" , monospace;">self.inodes = input_nodes + 1</span></b>.</li>
</ul>
<ul>
<li>This automatically means that the weight matrix takes the right shape, <span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>self.wih</b></span> depends on <span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>self.inodes</b></span>.</li>
</ul>
<ul>
<li>In the <b><span style="color: blue; font-family: "courier new" , "courier" , monospace;">query()</span></b> and <span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>train()</b></span> functions, the <span style="color: blue; font-family: "courier new" , "courier" , monospace;"><b>inputs_list</b></span> has a 1.0 bias constant input prepended or appended to it.</li>
</ul>
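Putting those three changes together, the following is a minimal sketch of an input-layer bias node, assuming a class shaped like the one in the github examples; only the relevant parts are shown, and the hidden-layer activation is the usual logistic function.

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # one extra input node carries the constant bias signal
        self.inodes = input_nodes + 1
        self.hnodes = hidden_nodes
        self.onodes = output_nodes
        self.lr = learning_rate
        # the weight matrix automatically takes the right shape
        # because it depends on self.inodes
        self.wih = np.random.normal(0.0, pow(self.inodes, -0.5),
                                    (self.hnodes, self.inodes))

    def query(self, inputs_list):
        # append the constant 1.0 bias input before building the input vector
        inputs = np.array(list(inputs_list) + [1.0], ndmin=2).T
        hidden_inputs = np.dot(self.wih, inputs)
        hidden_outputs = 1.0 / (1.0 + np.exp(-hidden_inputs))  # sigmoid
        return hidden_outputs
```

The train() function would prepare its inputs_list in exactly the same way before the forward pass.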
<br>
<br>
<br>
<h3>
Why Didn't We Use Bias?</h3>
Why didn't we use bias when we created a neural network to learn the MNIST data set?<br>
<br>
The primary aim of the book was to keep things as simple as possible and avoid additional details or optimisations as much as possible.<br>
<br>
The MNIST data challenge is one that happens not to need a bias node. Just like some cluster separation problems don't need the extra degree of freedom.<br>
<br>
Simple!MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-29762827247150132422016-06-10T13:53:00.000-07:002016-06-10T13:53:01.226-07:00Talk from PyData London 2016Here's my talk 'A Gentle Intro To Neural Networks (with Python)' from PyData London 2016.<br />
<br />
<br />
<div style="text-align: center;">
<iframe allowfullscreen="" frameborder="0" height="240" src="https://www.youtube.com/embed/2sevic5Vy4E" width="427"></iframe>
</div>
<br />
<br />
It was a great event, meeting old friends, making new ones, and learning lots too ... all with a great grass-roots community vibe.<br />
<br />
Here's the <a href="https://www.youtube.com/channel/UCOjD18EJYcsBog4IozkF_7w">youtube PyData channel</a> for the rest of the talks .. a real treasure trove!MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-54496130943566093712016-05-24T15:57:00.000-07:002016-09-17T07:38:56.457-07:00Complex Valued Neural Networks - Experiments<script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js" type="text/javascript">
MathJax.Hub.Config({
HTML: ["input/TeX","output/HTML-CSS"],
TeX: { extensions: ["AMSmath.js","AMSsymbols.js"],
equationNumbers: { autoNumber: "AMS" } },
extensions: ["tex2jax.js"],
jax: ["input/TeX","output/HTML-CSS"],
tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true },
"HTML-CSS": { availableFonts: ["TeX"],
linebreaks: { automatic: true } }
});
</script>
<span style="color: #6aa84f;"><i>
Update: the link between the phase rotated by these neurons and frequency components of an image is not clear. Needs more work ...</i></span><br />
<br />
<hr />
<br />
There are many ways to change the popular model of neural networks to see if we can improve how they work.<br />
<br />
For example, we could change the activation function, or how nodes are connected to each other, or try different error functions. These things are being done fairly often and aren't considered that radical.<br />
<br />
The following talks about a much deeper, more fundamental change.<br />
<br />
<br />
<h3>
1. From Real Numbers to Complex Numbers</h3>
Complex numbers are a richer set of numbers than the normal real numbers that we predominantly use in neural networks.<br />
<br />
They have a higher dimensionality which should allow much more complicated relationships to be learned by a neural network.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSpozEpZ5BVIgosfMuVgmz2jDThbU_S8NMpv9OA-B9GKOcB19NQAs30ohIf-tElDRFkXT70aPBg4Rc6iw6zQYcBFbX0UoElifjuGpme55MXJeFE545gnaFOtrSpHr2Fj4rDNQ9zryzsZgy/s1600/complex-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="203" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSpozEpZ5BVIgosfMuVgmz2jDThbU_S8NMpv9OA-B9GKOcB19NQAs30ohIf-tElDRFkXT70aPBg4Rc6iw6zQYcBFbX0UoElifjuGpme55MXJeFE545gnaFOtrSpHr2Fj4rDNQ9zryzsZgy/s400/complex-2.png" width="400" /></a></div>
<br />
We know that we often have to recast a problem into a higher dimensional space in order for a learning method to work - such as projecting 1-dimensional XOR data into a higher dimensional space so that a linear, but higher-dimensional, threshold can partition the data.<br />
<br />
Is it enough to simply replace the use of real values with complex values, and keep everything else the same - the same activation function, the same error function, the same system of link weights, etc? Hmmmm ...<br />
<br />
<br />
<h3>
2. Complex Valued Link Weights</h3>
The first idea of a thing to upgrade from normal real numbers to complex values is the link weights.<br />
<br />
That itself is actually a massive change because now signals are not just amplified or diminished as they pass through a network, they can now be <b>rotated</b> too.<br />
<br />
This is a significant step change in the richness of what a neural network can do to the signals - some call this <b>higher functionality</b>. It should lead to a richer ability to learn more complex relationships in training data.<br />
<br />
Why rotation? Because multiplying by a complex number doesn't just change a value's magnitude, it can also change the direction ... a rotation. A simple example: multiplying by $(0+1j)$ rotates a value anti-clockwise by 90 degrees ($\pi /2$ radians).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvdWgzOm-1QTzGoGRlaP7wKLs7ILqcvppIfJFLeN2FoyvJ5frRN5VXZ9fRTHkAsMGuBOULlrXXcr1nkSoBtji2OmoYV5jGOsGDbdh0243lA7Ph_vTxAuGjWcZ7hE6cj4fkIb_G2fbvhK3d/s1600/complex_weights-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvdWgzOm-1QTzGoGRlaP7wKLs7ILqcvppIfJFLeN2FoyvJ5frRN5VXZ9fRTHkAsMGuBOULlrXXcr1nkSoBtji2OmoYV5jGOsGDbdh0243lA7Ph_vTxAuGjWcZ7hE6cj4fkIb_G2fbvhK3d/s400/complex_weights-2.png" width="400" /></a></div>
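A quick check of that rotation in Python, which has complex numbers built in:

```python
import numpy as np

# multiplying by (0+1j) rotates a complex value anti-clockwise by 90 degrees
z = 2.0 + 1.0j
rotated = (0 + 1j) * z

print(rotated)                                  # (-1+2j)
print(np.angle(rotated) - np.angle(z))          # pi/2 radians, a quarter turn
```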
<br />
If we're now processing complex values, we need to think again about the nodes too. Do they need to change as well?<br />
<br />
<br />
<h3>
3. Complex Neural Nodes</h3>
Traditional neural network nodes do two things. They sum up the incoming signals, moderated by the link weights, and they then use an activation function to produce an output signal. That activation function has historically been S-shaped or step-shaped to reflect how we thought biological neurons worked.<br />
<br />
We could keep that as it is. And some researchers have tried that - trying to use the logistic function $1/(1+e^{-x})$, or the $tanh()$ function, with complex inputs. Some problems arise from this, though. The calculations are not easy. The gradient needed for gradient descent isn't trivial to calculate, if it can be calculated at all <span style="color: #999999; font-size: x-small;">(not an expert but iirc you can't differentiate it wrt complex values)</span>. There isn't a great fit between rotating signals and an activation threshold function that assumed a simple incoming signal whose magnitude was the only thing of importance.<br />
<br />
So let's be radical and try something very different.<br />
<br />
Let's force the signal to have a magnitude of 1, but allow it to rotate. That means we are working with signals that only sit on the unit circle in the complex domain. In essence we've discarded the magnitude and kept the phase.<br />
<br />
This may seem radical, but isn't really when you think about it. We hope the benefits of the complex domain are in the ability for signals to rotate around. Absolute magnitude was never that important anyway in traditional neural networks as we routinely normalised signals, and it was the relative magnitude at the output layer that was used for eventually deciding on a category or class. Remember the logistic and $tanh()$ activations squish the signal's magnitude into a specific range, irrespective of the incoming magnitude.<br />
<br />
So what does a complex node's activation function do? It doesn't really need to do anything more than what we've just described. The incoming signals have already been rotated by the complex weights, so all that remains is to make sure the sum is rescaled back to a unit circle.<br />
<br />
That's it!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEictksRCHTM_u_sSUdc7U7KVa-LK8Chnf7cmIWOyV7xdHdUmqGI8iB8fQnL1CQ3EyqCqIiB02_uwL0xM9UqBZrce9VET2_KRraoPxq2d_G35pvuogU_ZtyJrDpuVFsgzg8hBQRBL8yp2DyI/s1600/activation_function-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="258" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEictksRCHTM_u_sSUdc7U7KVa-LK8Chnf7cmIWOyV7xdHdUmqGI8iB8fQnL1CQ3EyqCqIiB02_uwL0xM9UqBZrce9VET2_KRraoPxq2d_G35pvuogU_ZtyJrDpuVFsgzg8hBQRBL8yp2DyI/s400/activation_function-2.png" width="400" /></a></div>
<br />
Here's what we've just said, in mathematical notation:<br />
<br />
1. Combine incoming signals to a node $$ z = \sum_{n}{w_n \cdot x_n} $$
<br />
2. Rescale back to the unit circle $$ P(z) = \frac{z}{|z|} = e^{j \cdot arg(z)} $$<br />
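A minimal numpy sketch of a single such node, with made-up complex weights and unit-circle inputs:

```python
import numpy as np

def activate(w, x):
    """Complex node: weighted sum, then rescale onto the unit circle."""
    z = np.sum(w * x)        # z = sum of w_n * x_n
    return z / np.abs(z)     # P(z) = z / |z|

# two made-up inputs on the unit circle, and made-up complex weights
x = np.exp(1j * np.array([0.0, np.pi / 2]))
w = np.array([0.5 + 0.5j, 1.0 + 0.0j])

out = activate(w, x)
print(np.abs(out))           # magnitude is 1 (up to floating point)
```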
<br />
<br />
<h3>
Preparing Inputs, Mapping Outputs</h3>
Ok so in our mind we've designed a machine which kinda has lots of cogs in it, rotating the signals as they pass through. And we've said that we're constraining these signals to a magnitude of 1, that is, they're always on a unit circle.<br />
<br />
How does this work with inputs that might be from the real world, and not complex numbers on a unit circle? And what about the answers we want from a neural network? We might want a network to give us answers that are larger real numbers, or maybe even classification categories with names like "dog, cat, mouse".<br />
<br />
From our work on traditional neural networks, we've already seen the need to prepare inputs, and also to map the outputs.<br />
<br />
<b><span style="color: #990000;">Inputs</span></b><br />
<div>
Inputs need to map to the complex unit circle. Why? Imagine we had a super-simple network of only one node. That node takes the input and needs to map that to an answer. To do this, it needs to do something to that data, a transformation, the application of a function. To give the node the best chance of learning to do this, the incoming data should be spread out as well as possible over that transformation's domain space. We know that relatively low variance can hinder learning.<br />
<br />
Input values that run only along the real axis (imaginary part is zero) have only 2 possible phases - 0 or $\pi$ radians. For a network, or indeed a node, that tries to work with phases in order to make a decision, this is extremely limiting. So we need to map the inputs to the unit circle and cover a good range of phases. How we do this depends on the specifics of a particular problem, but a good starting point is to linearly remap the minimum of the input values to $e^{j \cdot 0}$, and the maximum to $e^{j \cdot 2\pi}$.</div>
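A small numpy sketch of that linear remapping, using made-up input values:

```python
import numpy as np

def map_to_unit_circle(values):
    """Linearly remap real inputs so the minimum lands at angle 0
    and the maximum at angle 2*pi on the complex unit circle."""
    values = np.asarray(values, dtype=float)
    theta = 2 * np.pi * (values - values.min()) / (values.max() - values.min())
    return np.exp(1j * theta)

signals = map_to_unit_circle([3.0, 5.0, 9.0])
print(np.abs(signals))   # magnitudes are all 1 - every signal is on the circle
```

Note that the maximum lands on the same point as the minimum, because $e^{j \cdot 2\pi} = e^{j \cdot 0}$ - this is exactly the wrap-around problem that a mapping gap is there to avoid.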
<div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY55JizM_DazDyXm1vNeCR9mFPtUAfCKJtN_0LyfGGh4ZFWe91r4VD8KdDU-4Xz5nQ3TWoc4kOba-nnPNKGYgEWUn42zFrlnlAboM0NhykDCSnLX20vfqMA7tb7G2jTUBPIk3hp_czvjG6/s1600/map_inputs-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY55JizM_DazDyXm1vNeCR9mFPtUAfCKJtN_0LyfGGh4ZFWe91r4VD8KdDU-4Xz5nQ3TWoc4kOba-nnPNKGYgEWUn42zFrlnlAboM0NhykDCSnLX20vfqMA7tb7G2jTUBPIk3hp_czvjG6/s400/map_inputs-2.png" width="400" /></a></div>
<br /></div>
<div>
In some scenarios the input values shouldn't wrap around. That is, if we had categories like the letters A to Z, then A is not semantically close to Z, but would be when mapped to the unit circle. We can insert a <b>mapping gap</b> - to ensure that a small change in A doesn't lead to Z. </div>
<div>
<br /></div>
<div>
If we have categories, rather than a continuum of input values, then it makes sense to divide the unit circle into sectors. For example, insect body lengths are a continuum, and so can map to the unit circle fairly easily. Insect names, however, are words, not continuous real numbers, so we need to take firm slices of the circle, as illustrated further down.</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<b><span style="color: #990000;">Outputs</span></b></div>
<div>
In a similar way, we can map a node's output back to meaningful values, labels or classes.<br />
<br /></div>
<div>
If we had a continuum of real numbers, we can simply reverse the previous mapping back from the unit circle.<br />
<br />
If we have nominal categories, we slice up the circle into sectors, as shown in the diagram below. You can also see why a sector's bisector should be the target value for training. Easy!</div>
<div>
<br /></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgytTejss8vahXeSrU4ALKcWsce6XCrXH46HcZwsdAhNc8QCaHXIYLuvEMC_ETfh-ZQ1mDlYMKcu1WtrOXdC-JOFbmKUHrsktPb3R-90RPCzNi7N1rrZGwlGZf01Ess1dnT-ugx1VzvYXOd/s1600/map_outputs-3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgytTejss8vahXeSrU4ALKcWsce6XCrXH46HcZwsdAhNc8QCaHXIYLuvEMC_ETfh-ZQ1mDlYMKcu1WtrOXdC-JOFbmKUHrsktPb3R-90RPCzNi7N1rrZGwlGZf01Ess1dnT-ugx1VzvYXOd/s400/map_outputs-3.png" width="400" /></a></div>
<br />
<br />
<div>
<h3>
Aside: Phase is Important</h3>
Phase is incredibly important in signals from the real world. Let's illustrate how important.<br />
<br />
Take an image of something from the real world, and decompose the signal into its frequency phase and magnitude. You might recognise this as taking a <a href="https://en.wikipedia.org/wiki/Fourier_transform">Fourier transform</a>. If we reconstruct the image as follows:<br />
<br />
<ul>
<li>ignore the phase (set it all to 0), just use the magnitude</li>
<li>ignore the magnitude (set it all to 1), just use the phase</li>
</ul>
<br />
we find that the phase contains most of the information we use to understand an image - not the magnitude.<br />
<br />
Code to demonstrate this working is on <a href="https://github.com/makeyourownneuralnetwork/complex_valued_neuralnetwork/blob/master/image_phase_vs_magnitude.ipynb">github</a> - it shows clearly that there is much more recognisable information in the phase part of an image's Fourier transform.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeIw42Qcg9_bOY04XETl6jPBn6HCSTRVsTBC131IUPvkT875ZvYX-qUTHiZUCjgu0LeN140VsAdo9vUVmDEb0ShuS5JOOUFoJxmMJ-2PqzPqxRCYRzkWsUt1qlramFFd8oOzCAhMEJvFmu/s1600/phase_mag.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="253" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeIw42Qcg9_bOY04XETl6jPBn6HCSTRVsTBC131IUPvkT875ZvYX-qUTHiZUCjgu0LeN140VsAdo9vUVmDEb0ShuS5JOOUFoJxmMJ-2PqzPqxRCYRzkWsUt1qlramFFd8oOzCAhMEJvFmu/s400/phase_mag.png" width="400" /></a></div>
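The essence of that demonstration can be sketched with a small made-up 1-dimensional signal; the github notebook does the same thing with 2-dimensional images.

```python
import numpy as np

# a made-up 1-d signal with a clear feature in the middle
signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
spectrum = np.fft.fft(signal)

# rebuild keeping only the phase (all magnitudes forced to 1) ...
phase_only = np.fft.ifft(np.exp(1j * np.angle(spectrum))).real

# ... and keeping only the magnitude (all phases forced to 0)
magnitude_only = np.fft.ifft(np.abs(spectrum)).real

print(phase_only)
print(magnitude_only)
```

Keeping both phase and magnitude reconstructs the signal exactly; discarding the phase produces a symmetric result that has lost where the feature sat.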
<br />
This exercise illustrates quite powerfully the importance of phase, and why a neural network based on processing and learning phase might be a good idea, particularly for image recognition tasks.<br />
<br />
<br />
<h3>
Learning Algorithm</h3>
We changed the activation function from a sigmoid-shaped real function to a mapping of a signal to the unit circle. What does this mean for our learning algorithm?<br />
<br />
The first thing to realise is that the mapping $P(z) = \frac{z}{|z|} = e^{j \cdot arg(z)}$ is not differentiable in the complex domain <span style="color: #999999; font-size: x-small;">(doesn't satisfy the <a href="http://mathworld.wolfram.com/Cauchy-RiemannEquations.html">Cauchy-Riemann conditions</a>)</span>. Remember, we needed to differentiate the activation function so we could do gradient descent down the error function to find better and better weights.<br />
<br />
We don't need to do that here.<br />
<br />
That's right. We don't need to differentiate. Or do gradient descent.<br />
<br />
Have a look at the following diagram which shows the actual output from a complex node and what the target output should be. Also illustrated is the correction that's needed - the error.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqjXOEmPrzsJCewx0wwGslsntX_9iztEmiDHfxrCn-f9VDLHkfAknERBqfSylVzZXfF3dpJa5dutd6aKMcYnq3cq3pMglykS3fXtfKHqP3NbKyWosnyhh35EF2P1MkgMwUnWJ0KGH-QvFN/s1600/learning.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="242" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqjXOEmPrzsJCewx0wwGslsntX_9iztEmiDHfxrCn-f9VDLHkfAknERBqfSylVzZXfF3dpJa5dutd6aKMcYnq3cq3pMglykS3fXtfKHqP3NbKyWosnyhh35EF2P1MkgMwUnWJ0KGH-QvFN/s400/learning.png" width="400" /></a></div>
<br />
Let's write out what's happening to the signal in symbols. From above, we already have:<br />
<br />
$$ z = \sum_{n}{w_n \cdot x_n} $$<br />
<br />
$$ P(z) = \frac{z}{|z|} = e^{j \cdot arg(z)} $$<br />
<br />
Now suppose we had the correct weight adjustments $\Delta w_n$ to make the $z$ point in the same direction as the desired training target $t$. Actually we can make it not just point in the same direction, we can imagine we've got the magnitude spot on too.<br />
<br />
$$ t = \sum_{n}{(w_n + \Delta w_n) \cdot x_n} $$<br />
<br />
The error, which is the required correction, $e = t - z$, is<br />
<br />
$$ e = \sum_{n}{(w_n + \Delta w_n) \cdot x_n} - \sum_{n}{(w_n) \cdot x_n} $$<br />
<br />
Simplifying,<br />
<br />
$$ e = \sum_{n}{\Delta w_n \cdot x_n} $$<br />
<br />
Ok - that's a nice simple relationship. It tells us the error $e$ is made up of many contributions from the various $\Delta w_n \cdot x_n$. But we don't have any more clues as to which values of $\Delta w_n$ precisely. Right now, it could be one of many different combinations ... just like $1+4=5$, $2+3=5$, $5+0=5$, and so on.<br />
<br />
We could break this deadlock by assuming that each of the $\Delta w_n \cdot x_n$ contributes equally to the error $e$. That is, for each of the $N$ contributing nodes, $\Delta w_n \cdot x_n = \frac{e}{N}$.<br />
<br />
Is this <b>too naughty</b>?! It might be, but we've done similar <b>cheeky</b> things before with traditional neural networks, where the back propagation of the error is done by dividing it up heuristically, not by some sophisticated analysis.<br />
<br />
We can now use this simpler expression and get $\Delta w_n$ on its own, which is what we want.<br />
<br />
$$ \Delta w_n \cdot x_n = \frac{e}{N} $$<br />
<br />
Normally we'd multiply both sides by $x^{-1}$. But remember, for complex numbers $x^{-1} = \frac{\bar{x}}{|x|^2}$. Remember also that the signals $x$ are on the unit circle so $|x|=1$, leaving $x^{-1}=\bar{x}$. That gives us,<br />
<br />
$$ \Delta w_n = \frac{e}{N} \cdot \bar{x} $$<br />
<br />
That's it! The learning rule ... no gradient descent, no differentiation ... just a simple <b>error correction rule</b>.<br />
<br />
Let's try it!<br />
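Here's a minimal numpy sketch of one training step with this rule, for a single node with made-up weights, inputs and target. Because the inputs sit on the unit circle, a single step makes the node's summed signal $z$ hit the target exactly.

```python
import numpy as np

def train_step(w, x, t):
    """One error-correction step: delta w_n = (e / N) * conj(x_n)."""
    z = np.sum(w * x)                  # combined signal
    e = t - z                          # required correction
    dw = (e / len(x)) * np.conj(x)     # share the error equally
    return w + dw

x = np.exp(1j * np.array([0.3, 1.2]))   # made-up inputs on the unit circle
w = np.array([0.4 + 0.2j, 0.1 - 0.7j])  # made-up initial complex weights
t = np.exp(1j * 2.0)                    # made-up target on the unit circle

w = train_step(w, x, t)
print(np.sum(w * x))                    # equals the target t, up to rounding
```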
<br />
<br />
<h3>
Experiment 1: Boolean OR, AND</h3>
The code for a single neuron, and a small complex valued neural network, learning the Boolean relations OR and AND are on <a href="https://github.com/makeyourownneuralnetwork/complex_valued_neuralnetwork/blob/master/network_boolean.ipynb">github</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5EmJ6g28SOCUdgzV2BgRVfGwEoupkCGYUjxPSt0Q_RZ6QSHr3upEL3i27ph7LF2e2IXF05uplpAjuUNF2tGWveBFLBsLmVHmNlEXddLDTIJuEFG5b9LUrd70A2J8injkw-oeyYVOVP6_V/s1600/boolean.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="288" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5EmJ6g28SOCUdgzV2BgRVfGwEoupkCGYUjxPSt0Q_RZ6QSHr3upEL3i27ph7LF2e2IXF05uplpAjuUNF2tGWveBFLBsLmVHmNlEXddLDTIJuEFG5b9LUrd70A2J8injkw-oeyYVOVP6_V/s400/boolean.png" width="400" /></a></div>
<br />
So that's encouraging .. it works. We've shown that the incredibly simple error correction learning method works - no need for derivatives or gradient descent.<br />
<br />
But that test wasn't so challenging. Let's try a tougher test.<br />
<br />
<br />
<h3>
Periodic Activation Function</h3>
The complex valued neural node was easy to build, and experiments seem to confirm that it learns pretty quickly too.<br />
<br />
But a single node still has problems with the classic challenges like learning XOR, which a traditional neural network needs multiple nodes to solve.<br />
<br />
<a href="http://www.freewebs.com/igora/">Aizenberg</a> proposes that we can solve XOR with a single node, but to do that we need to make the activation function <b>periodic</b>. What does this mean? It means we enrich the mapping from the unit circle to an output class. Instead of dividing the circle into sectors, one sector for each class, we have multiple sectors for each class. Each sector is then smaller. Also, sectors for the same class can't be next to each other - that would defeat the idea! A picture shows this best:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqLo1LoH4CF_HPlbof7_35iPVZzsWb2gz6cEV1VKUDbId8GBZjQag_YJ5RuPt9o2KK9XhBLLaxC68iJGoXYOHaN0unu8o2eISvFEXeki84um1u0dkkBOBghS2Uusubwh1FcEPun1Koi2yv/s1600/period_af-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="261" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqLo1LoH4CF_HPlbof7_35iPVZzsWb2gz6cEV1VKUDbId8GBZjQag_YJ5RuPt9o2KK9XhBLLaxC68iJGoXYOHaN0unu8o2eISvFEXeki84um1u0dkkBOBghS2Uusubwh1FcEPun1Koi2yv/s400/period_af-2.png" width="400" /></a></div>
<br />
The learning process is the same as before, but this time we have a choice of target sectors for each actual output during training. Look at the diagram above - if the training target was "grass" which one of the two do we choose? Remember, the error depends on how far the actual output is from the training target, and here we seem to have two "grass" targets.<br />
<br />
What do we do? We pick the nearest one.<br />
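Here's a sketch of what that sector logic might look like in Python. The function names are illustrative, not taken from the notebook, and it assumes equal-width sectors laid out 0, 1, ..., 0, 1, ... around the circle:

```python
import math, cmath

def sector_class(z, n_classes, periodicity):
    # the circle is cut into n_classes * periodicity equal sectors, and
    # the class labels repeat around it: 0, 1, ..., 0, 1, ...
    n_sectors = n_classes * periodicity
    angle = cmath.phase(z) % (2 * math.pi)
    sector = int(angle // (2 * math.pi / n_sectors))
    return sector % n_classes

def nearest_target(z, target_class, n_classes, periodicity):
    # of the `periodicity` sectors carrying target_class, return the
    # bisector (as a unit complex number) angularly nearest to output z
    width = 2 * math.pi / (n_classes * periodicity)
    candidates = [(target_class + k * n_classes + 0.5) * width
                  for k in range(periodicity)]
    y = cmath.phase(z) % (2 * math.pi)
    def wrapped(c):                      # angular distance around the circle
        d = abs(y - c)
        return min(d, 2 * math.pi - d)
    return cmath.exp(1j * min(candidates, key=wrapped))

# periodicity 2 with two classes: four sectors labelled 0, 1, 0, 1
print(sector_class(cmath.exp(0.1j), 2, 2))   # first quarter  -> 0
print(sector_class(cmath.exp(2.0j), 2, 2))   # second quarter -> 1
print(sector_class(cmath.exp(3.3j), 2, 2))   # third quarter  -> 0
```

During training the error is then measured against the sector bisector that `nearest_target` picks, rather than against some fixed sector for the class.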
<br />
Let's try it!<br />
<br />
<br />
<h3>
Experiment 2: Boolean XOR</h3>
The code for a single complex neuron with a periodic activation function is on <a href="https://github.com/makeyourownneuralnetwork/complex_valued_neuralnetwork/blob/master/single_neuron-periodic.ipynb">github</a>. We've used a periodicity of 2 for two classes (0, 1) .. which means a unit circle has four sectors corresponding to classes 0, 1, 0, 1.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRZZ6xodqnPkuS1sVz2xQzw6bltyi4n8bVw4I9Z-87vHiVLTCWdaN55i82rnPIOWZEBZU2Au5ANnS5epvBDzEwwbCemmNxpYRtTCnLItf7u4a-SuqY1gdwXCDeyO9Z90tIoTMG1Xv4rIky/s1600/xor.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRZZ6xodqnPkuS1sVz2xQzw6bltyi4n8bVw4I9Z-87vHiVLTCWdaN55i82rnPIOWZEBZU2Au5ANnS5epvBDzEwwbCemmNxpYRtTCnLItf7u4a-SuqY1gdwXCDeyO9Z90tIoTMG1Xv4rIky/s400/xor.png" width="400" /></a></div>
<br />
It works! So we have a <b>single complex node, learning XOR</b> .. something that wasn't possible with traditional neural nodes.<br />
<br />
Let's now try it on the famous Iris dataset, which is more of a challenge for most learning algorithms.<br />
<br />
<br />
<h3>
Experiment 3: Fisher's Iris Dataset</h3>
The <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set">Iris data set</a> has a long history of challenging machine learning researchers. The data is simple enough - sepal and petal measurements of three species of Iris flower. The following scatterplot shows why it is a challenge - two of the three species have measurements which seem to intermingle into the same cluster.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd2eJYrvRxilC0AHbG44qShNXZoKAKph_0U2ISd6-AjL-D0P6SID8Vlvw9Dw-XW_cmJIHa1iaeQnkz9PPdJyrtjMPJqjFBsDWzEwPmgVvncXbx5mbfuVfn0jIVF9jY6SnOhzqJTzqwB7kN/s1600/Iris_dataset_scatterplot.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgd2eJYrvRxilC0AHbG44qShNXZoKAKph_0U2ISd6-AjL-D0P6SID8Vlvw9Dw-XW_cmJIHa1iaeQnkz9PPdJyrtjMPJqjFBsDWzEwPmgVvncXbx5mbfuVfn0jIVF9jY6SnOhzqJTzqwB7kN/s400/Iris_dataset_scatterplot.svg.png" width="400" /></a></div>
<br />
Let's see how well a single complex neuron with a periodic activation function does.<br />
<br />
The code is on <a href="https://github.com/makeyourownneuralnetwork/complex_valued_neuralnetwork/blob/master/single_neuron-periodic-iris.ipynb">github</a>, and the following shows that with a periodicity of 3, we get <b>94.7%</b> performance against a randomly partitioned test dataset (25% of the data set).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizy7Sz-QIxwuAoCY51_1uAW8wp80e1Z0aR5E_2pjDEPmgwNi2ikB-cYLYnpMUQNM5aTwupVcxXxG4mBUzxYu7aSMKkZyK-HlJfQUERMWrHcc_y8Ukb5buH-VqjhsJknBiWcBAfV5huJgkW/s1600/Screen+Shot+2016-05-24+at+17.45.55.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="93" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizy7Sz-QIxwuAoCY51_1uAW8wp80e1Z0aR5E_2pjDEPmgwNi2ikB-cYLYnpMUQNM5aTwupVcxXxG4mBUzxYu7aSMKkZyK-HlJfQUERMWrHcc_y8Ukb5buH-VqjhsJknBiWcBAfV5huJgkW/s400/Screen+Shot+2016-05-24+at+17.45.55.png" width="400" /></a></div>
<br />
That is not bad at all! In fact, that's <b>really rather impressive!</b><br />
<br />
Microsoft's more complex <a href="http://gallery.cortanaintelligence.com/Experiment/Clustering-Group-iris-data-2">example</a> achieves 97% .. so we're not doing badly at all with very simple ideas and code .. and remember, only a single neuron!<br />
<br />
Again ... to emphasise this ... an academic paper [<a href="http://lab.fs.uni-lj.si/lasin/wp/IMIT_files/neural/doc/seminar8.pdf">PDF</a>, table 2] shows similar scores of 95-97% using more complex networks.<br />
<br />
How do we choose the right periodicity? I don't know - at this stage it is trial and error.<br />
<br />
<br />
<h3>
Conclusion</h3>
We've achieved:<br />
<br />
<ul>
<li>a very simple approach to neural networks that reflects the importance of phase</li>
<li>a super simple learning algorithm that avoids differentiation and gradient descent</li>
<li>a powerful approach that can learn functions like XOR, which a single traditional threshold node can't</li>
<li>performance amongst the best from just a single complex neuron</li>
</ul>
<br />
<br />
Next time we'll see how these complex neurons can be combined into networks, and also see if we get good results from the MNIST handwriting challenge.<br />
<br />
<hr />
<br />
<i>The ideas here are inspired by Igor Aizenberg, see more at his <a href="http://www.eagle.tamut.edu/faculty/igor/CVNN-MVN.htm">university course page</a>.</i><br />
<i><br /></i></div>
</div>
MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-81952889956775770752016-05-09T07:59:00.000-07:002016-05-09T08:01:48.148-07:00Great Question from France: Training OrderI had a great question from Hamid from France, which led onto more interesting thoughts.<br />
<br />
He assumed each training example was fed forward through the network many times, each time reducing the error, and wanted to know when to stop and move onto the next training example. That is:<br />
<br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 1: FW, BP, FW, BP, FW, BP, ....</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 2: FW, BP, FW, BP, FW, BP, ....</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 3: FW, BP, FW, BP, FW, BP, ....</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 4: FW, BP, FW, BP, FW, BP, ....</span><br />
<br />
<i>( FW=Feed Forward, BP=Back Propagate )</i><br />
<br />
<div style="text-align: center;">
---</div>
<br />
My immediate reply was to say that this wasn't how it was normally done, but instead each training example was used in turn. Some call this <b>on-line learning</b>. That is:<br />
<br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 1: FW, BP</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 2: FW, BP</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 3: FW, BP</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 4: FW, BP</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">...</span><br />
<br />
And I said that it is often a good idea to repeat this several times, that is, training for several <b>epochs</b>.<br />
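In code, on-line learning with epochs is just two nested loops. Here's a toy sketch with a hypothetical one-weight "network" standing in for the real thing - the FW and BP steps are collapsed into a single train() call:

```python
class TinyNet:
    """A hypothetical one-weight 'network' - just to show the loop shape,
    not code from the book."""
    def __init__(self):
        self.w = 0.0

    def train(self, x, target, lr=0.1):
        y = self.w * x                     # FW: feed forward
        self.w += lr * (target - y) * x    # BP: nudge the weight by the error

def fit(net, data, epochs):
    # on-line learning: one FW + BP per example, then move to the next;
    # the whole sweep over the data is repeated for several epochs
    for _ in range(epochs):
        for x, t in data:
            net.train(x, t)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # samples of y = 2x
net = TinyNet()
fit(net, data, epochs=50)
print(round(net.w, 3))   # 2.0 - the slope has been learned
```

The outer loop is the epochs; the inner loop visits each training example exactly once per epoch.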
<br />
<div style="text-align: center;">
---</div>
<br />
Some will in fact <b>batch </b>together a few training examples and sum up the error to be used for back propagation. That is:<br />
<br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 1: FW (accumulate error)</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 2: FW</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;"> </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">(accumulate error)</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 3: FW</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;"> </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">(accumulate error)</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 4</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">: </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">FW, BP accumulated error</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 5: FW</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;"> </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">(accumulate error)</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 6: FW</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;"> </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">(accumulate error)</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 7: FW</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;"> </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">(accumulate error)</span><br />
<span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">Training Example 8</span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">: </span><span style="background-color: #fff2cc; color: #073763; font-family: "courier new" , "courier" , monospace;">FW, BP accumulated error</span><br />
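The batched scheme above can be sketched the same way - feed each example forward, accumulate the error, and back propagate once per batch. Again this uses a hypothetical one-weight stand-in for a real network:

```python
class TinyNet:
    """A hypothetical one-weight 'network' - only here to show the shape
    of the batched loop, not code from the book."""
    def __init__(self):
        self.w = 0.0

def fit_batched(net, data, epochs, batch_size, lr=0.1):
    for _ in range(epochs):
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            error_sum = 0.0
            for x, t in batch:                  # FW only: accumulate error
                error_sum += (t - net.w * x) * x
            # one BP step using the accumulated (averaged) error
            net.w += lr * error_sum / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # samples of y = 2x
net = TinyNet()
fit_batched(net, data, epochs=100, batch_size=4)
print(round(net.w, 3))   # 2.0 - same answer, but far fewer weight updates
```

With a batch size of 4, examples 1-4 share one back propagation, examples 5-8 the next, and so on - exactly the FW, FW, FW, FW-then-BP pattern listed above.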
<br />
<div style="text-align: center;">
---</div>
<br />
Then I thought about it more and concluded that Hamid's approach wasn't wrong at all - just different. He was asking what the stopping criteria would be for applying the same training data example many times. The real answer is .. I don't know - but I would experiment to find out.<br />
<br />
<div style="text-align: center;">
---</div>
<br />
Hamid's question is a good one, because it is not often made very clear which order or scheme is used. It is too easy for authors and teachers to assume new readers will know which scheme is being considered, or even which ones are a good idea.<br />
<br />
<b>That's why I love feedback from readers - they ask the best questions!</b><br />
<br />
Thanks Hamid!MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-52743325791069293022016-05-04T16:02:00.004-07:002016-05-04T16:02:48.028-07:00Slides for PyData London and EuroPython Bilbao 2016I've been lucky enough to be chosen to talk at <a href="http://pydata.org/london2016/">PyData London</a> and <a href="https://ep2016.europython.eu/en/">EuroPython Bilbao 2016</a>.<br />
<br />
These are the slides - they're still in development and could change at any time - but I thought I'd share in case someone found them useful.<br />
<br />
<h3 style="text-align: center;">
<b><a href="https://goo.gl/JKsb62">https://goo.gl/JKsb62</a></b></h3>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://goo.gl/JKsb62" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYD41pQ2qP2ttsIFDjSOtoNBhspsu7tEE4At1vYNRcjL-IzxkG2nv3FdLpJIZ5ViJRe8p5vVa_rT71dc2UgL_LvIU12k7fScFvYg3vx2_ECMcnODkOzrBn_ztVWYYGzwP8Tq2BsCYS8POd/s400/A+Gentle+Introduction+to+Neural+Networks+%2528with+Python%2529.png" width="400" /></a></div>
<br />
I'll also be running a gentle intro session at the <a href="http://www.meetup.com/The-London-Python-Group-TLPG/events/230326173/">London Python Meetup Group</a> in May too.MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-34198193172314735862016-04-18T15:19:00.002-07:002016-04-18T15:19:58.437-07:00Republished as Kindle TextbookAfter much deep thought I pulled my ebooks and republished them as <a href="https://kdp.amazon.com/edu">Kindle Textbooks</a>.<br />
<br />
Why?<br />
<br />
Well - the ebook format(s) are a pain.<br />
<br />
<ul>
<li>They're <b>not</b> sufficiently <b>standardised</b> for all parties to agree on.</li>
<li>The quality of <b>tools</b> to create them is rubbish.</li>
<li>And <b>interoperability</b> is a pain.</li>
</ul>
<br />
It's like the web in the early days - different interests trying to subvert html, trying to "embrace, extend, extinguish". It took almost 20 years for the web to settle down around well understood, common and open interoperable standards.<br />
<br />
The epub format is open but still in flux. Amazon doesn't actually support it properly or directly - support is lacking in places, inconsistent elsewhere, and mixed with additional proprietary features. They have their own mobi, kf8 and azw formats, and now kpf too. Even their preview tools work differently from each other - showing different results.<br />
<br />
This is depressing - especially as it should be possible for all Kindles, old and new, to show text, follow basic links, and show images. Except they can't. Some do, some don't.<br />
<br />
<b>I felt bad about some users not being able to read the book properly .. so I had to act</b>.<br />
<br />
The new <a href="https://www.amazon.com/gp/feature.html?docId=1002998671">Kindle Textbook Creator</a> works more like PDF - a format that preserves layout but at the cost of not being able to reflow text.<br />
<br />
I made the decision that it was more important for the book to <b>always</b> work for readers - even if that meant fewer readers could buy the digital book. Owners of older Kindles can't buy the book now that it is in the new format. I am sad about that.<br />
<br />
One day, digital book publishing will be fixed - or at least as fixed as the web is today.MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-87801653431130311592016-04-18T14:55:00.000-07:002016-04-18T14:55:44.932-07:00Error #2<b>Jon S</b> pointed out that Deep Blue beat Garry Kasparov in 1997, not the year stated in the introduction.<br />
<br />
This will be fixed in the next updated content.<br />
<br />MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-34684024459016790142016-04-15T15:35:00.003-07:002016-04-16T13:58:46.771-07:00Error #1<b>Michael B</b> found an error on page 32 of the book. That's the section where the idea of moderating the learning is introduced - a learning rate. The second training example uses a target value of 0.9. That's wrong, it should have said 2.9. The calculations which then update the slope A are therefore wrong too.<br />
<br />
Below is the updated section, and the diagram has also been updated too.<br />
<br />
<hr />
<span style="color: #999999;"><br /></span>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;"><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Let’s press on to the second training data example at </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">x</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> = 1.0. Using </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> = 0.3083 we have y = 0.3083 * 1.0 = 0.3083. The desired value was <span style="background-color: yellow;">2.9</span> so the error is (<span style="background-color: yellow;">2.9 -</span> 0.3083) = <span style="background-color: yellow;">2.5917</span>. 
The Δ</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> = </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">L</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> (</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">E</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> / </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">x</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">) = 0.5 * <span style="background-color: yellow;">2.5917</span> / 1.0 
= <span style="background-color: yellow;">1.2958.</span> The even newer </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> is now 0.3083 + <span style="background-color: yellow;">1.2958</span> = <span style="background-color: yellow;">1.6042</span>.</span></b></span></div>
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;"><br /></b></span>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;"><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Let’s visualise again the initial, improved and final line to see if moderating updates leads to a better dividing line between ladybird and caterpillar regions.</span></b></span></div>
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;">
</b></span>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center;">
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;"><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><img alt="part1_classifier_refinements_moderated.png" height="317" src="https://lh3.googleusercontent.com/TBcT3vlEmVVLuVVWmBnJex9lgLrC-rBlumagHMxEKcGJi09Jh05JC3oMehX7NiNqW2ARSpsWWLzABAps_MyJI9jbNX5_TaJ7Ruath6rR7580YOYuHU5qMziDj4kyUsmDyq9MURGu" style="border: medium none; transform: rotate(0rad);" width="400" /></span></b></span></div>
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;">
</b></span>
<br />
<div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;">
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;"><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">This is really good!</span></b></span></div>
<span style="color: #999999;"><b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;">
<br /><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Even with these two simple training examples, and a relatively simple update method using a moderating </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">learning rate</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">, we have very rapidly arrived at a good dividing line </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">y</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> = </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Ax</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> where </span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: 
normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">A</span><span style="background-color: transparent; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> is <span style="background-color: yellow;">1.6042</span>.</span></b></span><br />
<br />
<hr />
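The corrected arithmetic is easy to verify with a few lines of Python (variable names are mine, not from the book):

```python
# check the corrected worked example: previous A = 0.3083, input x = 1.0,
# target = 2.9, learning rate L = 0.5
A, x, target, L = 0.3083, 1.0, 2.9, 0.5

y = A * x                  # actual output: 0.3083
E = target - y             # error: 2.9 - 0.3083 = 2.5917
delta_A = L * (E / x)      # moderated update: 0.5 * 2.5917 = 1.29585
A_new = A + delta_A        # about 1.604, matching the corrected text

print(y, E, delta_A, A_new)
```

The printed values agree with the highlighted numbers above (to the book's rounding).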
<br />
The ebook has been updated and you should get an automatic update, or ask Amazon to trigger an update if it is slow to get to you. The print book has also been updated.<b id="docs-internal-guid-030496a3-1bfd-ec23-6791-148d882799be" style="font-weight: normal;"><span style="background-color: transparent; color: black; font-family: "arial"; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"><br /></span></b>MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-83277737326262483772016-04-14T15:10:00.001-07:002016-04-14T15:11:03.653-07:00Busting Past 98% AccuracyI was working on a document and expected to take a few hours ... so I thought, why not try a longer neural network training session to see if I could break past 98% performance.<br />
<br />
<br />
The neural network architecture and training was boosted to:<br />
<ul>
<li>300 hidden layer nodes</li>
<li>30 training epochs</li>
<li>rotate training images +/- 10 degrees. </li>
</ul>
That took about <b>3 hours </b>on my laptop!<br />
<br />
The resultant performance did indeed break the previous record .. at <b>98.03%</b><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqhGOE6B0K4Qu7qPRo9a0T5lqoS7ViEvR_F1z4bk4ZvumzJpdYZf7Q1k-6_6oHlT_xA7oyE0XzN1lBBONTOUg9Y3CwB9bwKCoTli3K_XYCGfd8oMSm7rLBGcVPFet-wRH8e5Dit1kCtoBQ/s1600/Screen+Shot+2016-04-14+at+23.00.10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="92" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqhGOE6B0K4Qu7qPRo9a0T5lqoS7ViEvR_F1z4bk4ZvumzJpdYZf7Q1k-6_6oHlT_xA7oyE0XzN1lBBONTOUg9Y3CwB9bwKCoTli3K_XYCGfd8oMSm7rLBGcVPFet-wRH8e5Dit1kCtoBQ/s400/Screen+Shot+2016-04-14+at+23.00.10.png" width="400" /></a></div>
<b> </b> MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.comtag:blogger.com,1999:blog-636493922605943518.post-37109588918728863912016-03-31T07:22:00.002-07:002016-04-18T15:00:29.807-07:00The Book is Out!Finally, the book is out!<br />
<br />
<a href="http://www.amazon.co.uk/Make-Your-Neural-Network-ebook/dp/B01EER4Z4G">Make Your Own Neural Network</a> - a gentle introduction to the mathematics of neural networks, and making your own with Python.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.amazon.co.uk/Make-Your-Own-Neural-Network-ebook/dp/B01DLHCW72/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhLUUvhJVCsMfieiZ5w9VelRMVVLWN23L2PPkDPpaxaUvf-i_Sm5cgo7YqY13b3KHsN1c92Gv8oD6UJKuK_ZjM70qw9Sdn-NuvPEEc8SxDkP7G2TSf-8MsAFBicUqzQ1AU2zhxESdU8rUk5/s400/tmp_26732-neural_network_cover1444163222.png" width="266" /></a></div>
<br />
You can get it on Amazon Kindle, and a <a href="http://www.amazon.co.uk/Make-Your-Own-Neural-Network/dp/1530826608">paper print</a> version is also available. MYO NeuralNethttp://www.blogger.com/profile/03594841041972630279noreply@blogger.com