Introducing Ceph-O-Vision

I’ve been interested in cephalopod vision ever since I learned that, despite their superb appreciation for chroma (as evidenced by their ability to match the color of their surroundings as well as texture and pattern), cuttlefish eyes contain only one light-sensitive pigment. Unlike ourselves and other multichromatic animals that perceive color as a mix of activations of different-colored light receptors, cuttlefish must have another way. So while the images coming into the brain of a cuttlefish might look something like this . . .

dfs

. . . they manage to interpret the images to precisely match their surroundings and communicate colorful displays to other cuttlefish. Some time ago Stubbs and Stubbs put forth the possibility that they might use chromatic aberrations to interpret color (I discussed and simulated what that might look like in this post). What looks like random flickering in the gif above is actually simulated focusing across chromatic aberrations [original video]. Contrary to what one might think, defocus and aberration in images aren’t “wrong.” If you know how to interpret them, they provide a wealth of information that might allow a cuttlefish to see the world in all its chromatic glory.

Top: learned color image based on chromatic aberration stack. Middle: neural network color reconstitution. Bottom: ground truth color image.

We shouldn’t expect the cuttlefish to experience their world in fuzzy grayscale any more than we should expect humans to perceive their world in an animal version of a Bayer array, each photoreceptor individually distinguished (not to mention distracting saccades, blind spot at the optic nerve, vasculature shadowing, etc.). Instead, just like us humans, they would learn to perceive the visual data produced by their optical system in whatever way makes the most sense and is most useful.

I piped simulated cuttlefish vision images into a convolutional neural network with corresponding color images as reference. The cuttle-vision images flow through the 7-layer network and are compared to the RGB targets on the other side. I started by building a dataset of simulated images consisting of randomly placed pixel-sized colored dots. This was supposed to be the easy “toy example” I started with before moving on to real images.
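To make the input/target pairing concrete, here is a minimal sketch of how a toy training pair like this might be simulated. It is not the code from the project repository: the names `aberration_stack`, `channel_focus`, and `blur_per_unit` are hypothetical, and it assumes that defocus blur grows linearly with the distance between the lens focus position and each color's best-focus position, and that a single pigment simply averages the three channels.

```python
import numpy as np

def gaussian_kernel(sigma, radius=6):
    """1-D Gaussian kernel normalised to sum to 1 (sigma <= 0 gives an identity kernel)."""
    x = np.arange(-radius, radius + 1)
    if sigma <= 0:
        return (x == 0).astype(float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(channel, sigma):
    """Separable Gaussian blur of a 2-D array (image must be wider than the kernel)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, channel, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def aberration_stack(rgb, focus_positions=(0.0, 1.0, 2.0),
                     channel_focus=(2.0, 1.0, 0.0), blur_per_unit=1.5):
    """One grayscale frame per focus position (assumed model, see lead-in).

    Each color channel is blurred in proportion to how far the current focus
    position is from that channel's best focus, then the channels are averaged
    as a single-pigment sensor might do.
    """
    frames = []
    for f in focus_positions:
        frame = np.zeros(rgb.shape[:2])
        for c, best in enumerate(channel_focus):
            frame += blur(rgb[..., c], blur_per_unit * abs(f - best))
        frames.append(frame / 3.0)
    return np.stack(frames, axis=-1)

# Toy target: sparse, randomly placed, randomly colored single-pixel dots
rng = np.random.default_rng(0)
target = np.zeros((32, 32, 3))
ys, xs = rng.integers(8, 24, size=(2, 10))
target[ys, xs] = rng.random((10, 3))

stack = aberration_stack(target)   # network input: one frame per focus position
print(stack.shape, target.shape)   # network target: the RGB image
```

In this sketch a red dot is sharp in one frame and smeared in the others, which is the kind of cue a network would have to learn to decode.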


Left: training input. Middle: network’s attempt at reconstitution. Right: target. For pixel-sized color features, the convolutional kernels of the network learn to blur the target pixels into ring shapes.

Bizarrely, the network learned to interpret these images as colored donuts, usually centered around the correct location but incapable of reconstituting the original layout. Contrary to what you might expect, the simple dataset performed poorly even with many training examples, and color image reconstitution improved dramatically when I switched to real images. Training on a selection of landscape images looks something like this:


Center: Ceph-O-Vision color perception. Bottom: Ground truth RGB. Top: Chromatic aberration training images (stacked as a color image for viewing)

As we saw in the first example, reconstituting sparse single pixels from chromatic aberration images trains very poorly. However, the network was able to learn random patterns of larger features (offering better local context) much more effectively:

Interestingly enough, the network learns to be most sensitive to edges. You can see in the training gif above that after 1024 epochs of training, the network mainly reconstitutes pattern edges. It never learns to exactly replicate the RGB pattern, but gets pretty close. It would be interesting to use a network like this to predict what sort of optical illusions a cuttlefish might be susceptible to. This could provide a way to test the chromatic aberration hypothesis in cephalopod vision. Wikipedia image by Hans Hillewaert used as a mask for randomly generated color patterns.

Finally, I trained the network on some footage of a hunting cuttlefish (CC BY-SA, John Turnbull). Training on the full video, here’s what a single frame looks like as the network updates over about a thousand training epochs:

This project is far from a finished piece, but it’s already vastly improved my intuition for how convolutional neural networks interpret images. It also provides an interesting starting point for thinking about how cuttlefish visually communicate and perceive. If you want more of the technical and unpolished details, you can follow this project’s Github repository. I have a lot of ideas on what to try next: naturally some control training with a round pupil (and thus less chromatic aberration), but also to compare the simple network I’ve built so far to the neuroanatomy of cephalopods and to implement a “smart camera” version for learning in real-time. If you found this project interesting, or have your own cool ideas mixing CNNs and animal vision, be sure to let me know @theScinder or in the comments.

I too built a rather decent deep learning rig for 900 quid


Robert Heinlein’s 1957 Door into Summer returns throughout to a theme of knowing when it’s “time to railroad.” Loosely speaking this is the idea that one’s success comes as much from historical context as it does from innate ability, hard work, and luck (though much of the latter can be attributed to historical context).

Many of the concepts driving our modern AI renaissance are decades old at least, but the field lost steam as the computers were too slow and the Amazookles of the world were yet to use them to power their recommendation engines and so on. In the meantime computers have gotten much faster and much better at beating humans at very fancy games. Modern computers are now fast enough to make deep learning feasible, and it works for many problems as well as providing insight into how our own minds might work.

I too have seen the writing on the wall in recent years. I can say with some confidence that now is the time to railroad, and by “railroad” I mean revolutionise the world with artificial intelligence. A lot of things changed in big ways during the original “time to railroad,” the industrial revolution. For some this meant fortune and progress and for others, ruin. I would like to think that we are all a bit brighter than our old-timey counterparts were back then and we have the benefit of our history to learn from, so I’m rooting for an egalitarian utopia rather than an AI apocalypse. In any case, collective stewardship of the sea changes underway is important and this means the more people learn about AI the less likely the future will be decided solely by the technocratic elites of today.

I’ve completed a few MOOCs on machine learning in general and neural networks in particular, coded up some of the basic functions from scratch and I’m beginning to use some of the major libraries to investigate more interesting ideas. As I moved on from toy examples like MNIST and housing price prediction one thing became increasingly clear:

It took me a week of work to realize I was totally on the wrong track training a vision model meant to mimic cuttlefish perception on my laptop. This sort of wasted time really adds up, so I decided to go deeper and build my own GPU-enhanced deep learning rig.

Luckily there are lots of great guides out there as everyone and their grandmother is building their own DL rig at the moment. Most of the build guides have something along the lines of “. . . for xxxx monies” in the title, which makes it easier to match budgets. Build budgets run the gamut from the surprisingly capable $800 machine by Nick Condo to the serious $1700 machine by Slav Ivanov all the way up to the low low price of “under $5000” by Kunal Jain. I did not even read the last one because I am not made of money.

I am currently living in the UK, so that means I have to buy everything in pounds. The prices for components in pounds sterling are. . . pretty much the same as they are in greenbacks. The exchange rate to the British pound can be a bit misleading, even now that Brexit has crushed the pound sterling as well as our hopes and dreams. In my experience it seems like you can buy about the same for a pound at the store as for a dollar in the US or a euro on the continent. It seems like the only thing they use the exchange rate for is calculating salaries.

I’d recommend first visiting Tim Dettmers’ guide to choosing the right GPU for you. I’m in a stage of life where buying the “second cheapest” appropriate option is usually best. With a little additional background reading and following Tim’s guide, I selected the Nvidia GTX 1060 GPU with 6GB of memory. This was from Tim’s “I have little money” category, one up from the “I have almost no money” category, and in keeping with my life philosophy of the second-cheapest. Going to the next tier up is often close to twice as costly, but not close to twice as good. This holds for my choice of GPUs as well: a single 1070 is about twice the cost and around 50% or so faster than a 1060. However, two 1060s do get you pretty close to twice the performance, and that’s what I went with. As we’ll see in the benchmarks, Tensorflow does make it pretty easy to take advantage of both GPUs, but doubling the capacity of my DLR by doubling the GPUs in the future won’t be plausible.

My upgradeability is somewhat limited by the number of threads (4) and PCIe lanes (16) of the modest i3 CPU I chose; if a near-term upgrade were a higher priority, I should have left out the second 1060 GPU and diverted that part of the budget to a better CPU (e.g. the Intel Xeon E5-1620 V4 recommended by Slav Ivanov). But if you’re shelling out so much for a higher-end system you’ll probably want a bigger GPU to start with, and it’s easy to see how one can go from a budget of $800 to $1700 rather quickly.

The rest of the computer’s job is to quickly dump data into the GPU memory without messing things up. I ended up using almost all the same components as those in Nick’s guide because, again, my physical makeup is meat rather than monetary in nature.

Here’s the full list of components. I sourced what I could from Amazon Warehouse Deals to try and keep the cost down.


GPU (x2): Gigabyte Nvidia GTX 1060 6GB (£205.78 each)
Motherboard: MSI Intel Z170 KRAIT-GAMING (£99.95)
CPU: Intel Core i3 6100 Skylake Dual-Core 3.7 GHz Processor (£94.58)
Memory: Corsair CMK16GX4M2A2400C14 Vengeance 2x8GB (£105.78)
PSU: Corsair CP-9020078-UK Builder Series 750W CS750M ATX/EPS Semi-Modular 80 Plus Gold Power Supply Unit (£77.25)
Storage: SanDisk Ultra II SSD 240 GB SATA III (£72.18)
Case: Thermaltake Versa H23 (£27.10)

Total: £888.40
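
As a quick sanity check on the arithmetic, the list can be summed directly. This sketch reads the memory price as £105.78 and the case price as £27.10 (in pounds like the rest), and counts the GPU twice; the dictionary layout is just for illustration.

```python
from decimal import Decimal

# (unit price in pounds, quantity), taken from the components list above
parts = {
    "GPU":         (Decimal("205.78"), 2),
    "Motherboard": (Decimal("99.95"), 1),
    "CPU":         (Decimal("94.58"), 1),
    "Memory":      (Decimal("105.78"), 1),
    "PSU":         (Decimal("77.25"), 1),
    "Storage":     (Decimal("72.18"), 1),
    "Case":        (Decimal("27.10"), 1),
}

total = sum(price * qty for price, qty in parts.values())
print(f"Total: £{total}")  # Total: £888.40
```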

I had never built a PC before and didn’t have any idea what I was doing. Luckily, Youtube did, and I didn’t even break anything when I slotted all the pieces together. I had an install thumb drive for Ubuntu 16.04 hanging around ready to go and consequently I was up and running relatively quickly.

The next step was installing the drivers and CUDA developer’s toolkit for the GPUs. I’ve been working mainly with Tensorflow lately, so I followed their guide to get everything ready to take advantage of the new setup. I am using Anaconda to manage Python environments for now, so I made one with tensorflow and another with tensorflow_gpu packages.

I decided to train on the CIFAR10 image classification dataset using this tutorial to test out the GPUs. I also wanted to see how fast training progresses on a project of mine, a two-category classifier for quantitative phase microscope images.

The CIFAR10 image classification tutorial from tensorflow.org was perfect because you can flag for the training to take place on one or two GPUs, or train on the CPU alone. It takes ~1.25 hours to train the first 10000 steps on the CPU, but only 4 minutes for the same training on one 1060. That’s a weeks-to-days/days-to-hours/hours-to-minutes level of speedup.

# CPU 10000 steps
2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
2017-06-18 12:56:42.815268: step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)

2017-06-18 14:12:50.121319: step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)

# One GPU
2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
2017-06-18 15:50:17.678610: step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
2017-06-18 15:50:17.886419: step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)

2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)

# Two GPUs
2017-06-25 14:48:43.918359: step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
2017-06-25 14:48:45.058762: step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)

2017-06-25 14:52:28.510590: step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)

2017-06-25 14:54:56.087587: step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)
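
The timestamps in the log excerpts above are enough to estimate the speedup without trusting the per-batch figures. This is a rough sketch: it uses only the first and last CPU and single-GPU lines shown, and the names `PATTERN`, `parse`, and `summarize` are made up for this example rather than part of the tutorial. The CPU excerpt ends at step 9990 and the GPU excerpt at step 10000, so it compares steps per second rather than raw elapsed time.

```python
import re
from datetime import datetime

# First and last lines of the CPU and single-GPU excerpts above
LOG = {
    "cpu": [
        "2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)",
        "2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)",
    ],
    "one_gpu": [
        "2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)",
        "2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)",
    ],
}

PATTERN = re.compile(
    r"^(\S+ \S+): step (\d+), loss = [\d.]+ \(([\d.]+) examples/sec; [\d.]+ sec/batch\)"
)

def parse(line):
    """Return (timestamp, step, examples_per_sec) for one log line."""
    stamp, step, eps = PATTERN.match(line).groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), int(step), float(eps)

def summarize(lines):
    """Elapsed seconds and average steps per second between first and last line."""
    (t0, s0, _), (t1, s1, _) = parse(lines[0]), parse(lines[-1])
    seconds = (t1 - t0).total_seconds()
    return {"seconds": seconds, "steps_per_sec": (s1 - s0) / seconds}

results = {name: summarize(lines) for name, lines in LOG.items()}
speedup = results["one_gpu"]["steps_per_sec"] / results["cpu"]["steps_per_sec"]

for name, r in results.items():
    print(f"{name}: {r['seconds'] / 60:.1f} min, {r['steps_per_sec']:.2f} steps/sec")
print(f"single-GPU speedup over CPU: {speedup:.1f}x")
```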

That’s about a 21-32x speedup on the GPUs. It’s not quite double the speed on two GPUs because the model isn’t big enough to fully utilize both of them, as we can see in the output from nvidia-smi:

# Training on one GPU

# Training on two GPUs

My own model had a similar speedup, going from training about one 79-image minibatch per second to training more than 30 per second. Trying to train this model on my laptop, a Microsoft Surface Book, I was getting about 0.75 steps a second. [Aside: the laptop does have a discrete GPU, a variant of the GeForce 940M, but no Linux driver that I’m aware of :/].

# Training on CPU only
INFO:tensorflow:global_step/sec: 0.981465
INFO:tensorflow:loss = 0.673449, step = 173 (101.889 sec)
INFO:tensorflow:global_step/sec: 0.994314
INFO:tensorflow:loss = 0.64968, step = 273 (100.572 sec)

# Dual GPUs
INFO:tensorflow:global_step/sec: 30.3432
INFO:tensorflow:loss = 0.317435, step = 90801 (3.296 sec)
INFO:tensorflow:global_step/sec: 30.6238
INFO:tensorflow:loss = 0.272398, step = 90901 (3.265 sec)
INFO:tensorflow:global_step/sec: 30.5632
INFO:tensorflow:loss = 0.327474, step = 91001 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.5643
INFO:tensorflow:loss = 0.43074, step = 91101 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.6085

Overall I’m pretty satisfied with the setup, and I’ve got a lot of cool projects to try out on it. Getting the basics for machine learning is pretty easy with all the great MOOCs and tutorials out there, but the learning curve slopes sharply upward after that. Working directly on real projects with a machine that can train big models before the heat-death of the universe is essential for gaining intuition and tackling cool problems.

Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try and trick @ColorizeBot to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that CB is indeed an artificial neural network with many layers, some of which are convolutional layers. These share weights and give deep learning the ability to discover features from images rather than relying on the conventional machine vision approach of manually extracting image features to train an algorithm. This gives CB the ability to discover important indicators of color that its handler wouldn’t necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively tries to reproduce the input at the output in an efficient manner. In this case, instead of producing a single grayscale image at the output, the network would need to produce three versions, one image each for the red, green, and blue color channels. Of course, it doesn’t make sense to totally throw away the structure of the black and white image, and the way the authors include this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB’s Twitter bio claims it trained on millions of photos, and I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
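
The grayscale conversion is the step where the color information gets thrown away, and it is worth seeing how lossy it is. The sketch below assumes the common Rec. 601 luma weights (the conversion actually used for these photos isn’t stated) and two made-up RGB values chosen for illustration; `to_grayscale` is a hypothetical helper name.

```python
import numpy as np

# Rec. 601 luma weights for R, G, B (an assumption, see above)
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Collapse the last (color) axis of an RGB array into a single luminance value."""
    return np.asarray(rgb, dtype=float) @ LUMA_WEIGHTS

# Two illustrative colors: a saturated red and a mid green
some_red = [220, 40, 30]
some_green = [60, 120, 40]

print(to_grayscale(some_red), to_grayscale(some_green))

# The same function works on a whole H x W x 3 image
image = np.random.default_rng(0).integers(0, 256, size=(4, 5, 3))
print(to_grayscale(image).shape)  # (4, 5)
```

The two colors land within a fraction of a gray level of each other, so a colorizer has to recover the hue from shape and context alone.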

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If one wishes to pursue the glamorous life of a hipster salad photographer, it’s important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which can perceive color with only one type of photoreceptor. Our own visual system can only see contrast between the two types of object by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:

c2sel44vqaagemw-jpg-large

This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people’s social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remain green. Maybe this has something to do with how salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context and CB is used to recognizing more isolated round objects as tomatoes. In any case it does know enough to guess that spinach is green and some cherry tomatoes are reddish.

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.

c2se-teveae2_ay-jpg-large

That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net may have seen an imagef like this before, it doesn’t expect it every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of of colored pencils, which could conceivably be a box of brown sketching pencils.

c2seviwviaa87xo-jpg-large

What about a more typical type of photograph? Here’s an old truck in some snow:

c2scawfveaallw4-jpg-large

CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look on the right hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, on some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
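
To make the “moving filter” picture concrete, here is a minimal sketch of a single hand-written kernel sliding over a tiny image. The 3x3 kernel and the 6x6 test image are made up for illustration (a trained network learns its own kernel values), and, like most deep learning libraries, the loop computes a cross-correlation without flipping the kernel.

```python
import numpy as np

def slide_kernel(image, kernel):
    """Apply a kernel as a moving filter ('valid' region only, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a kh x kw patch of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Test image: dark left half, bright right half (a single vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked kernel that responds to left-to-right brightness increases
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

response = slide_kernel(image, vertical_edge)
print(response)
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]
#  [0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The response is non-zero only where the 3x3 window straddles the edge, which is also why one layer on its own cannot relate the left side of a photo to the right side.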

In the snowy truck and in the tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or I see an image that looks like it was taken in winter, we include in our expectations “This photo looks like it was taken in winter, so it is likely the whole scene takes place in winter because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo that wouldn’t be my first guess on how it came to be, based on the way most of the salads I’ve seen throughout my life have been constructed. In general we don’t see deep neural networks like @ColorizeBot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative networks like Google’s “Inception” or style transfer systems like Deepart.io, which perfuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?

c29xshxviaa2_g3

That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic

@ColorizeBot’s Twitter feed

@theScinder’s Twitter feed

All the photographs used in this essay were taken by yours truly (at http://www.thescinder.com), and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:

tomatotrickery

Teaching a Machine to Love XOR

xorsketch

The XOR function outputs true if exactly one of the two inputs is true

The exclusive or function, also known as XOR (but never going by both names simultaneously), has a special relationship to artificial intelligence in general, and neural networks in particular. This is thanks to a prominent book from 1969 by Marvin Minsky and Seymour Papert entitled “Perceptrons: an Introduction to Computational Geometry.” Depending on who you ask, this text was single-handedly responsible for the AI winter due to its critiques of the state of the art neural network of the time. In an alternative view, few people ever actually read the book but everyone heard about it, and the tendency was to generalize a special-case limitation of local and single-layer perceptrons to the point where interest and funding for neural networks evaporated. In any case, thanks to back-propagation, neural networks are now in widespread use and we can easily train a three-layer neural network to replicate the XOR function.

In words, the XOR function is true for two inputs if one of them, but not both, is true. When you plot the XOR as a graph, it becomes obvious why the early perceptron would have trouble getting it right more than half the time.

sketch2dxor

There’s not a way to draw a straight 2D line on the graph and separate the true and false outputs for XOR, red and green in the sketch above. Go ahead and try. The same is going to be true trying to use a plane to separate a 3D version and so on to higher dimensions.

sketch3dxor

That’s a problem because a single layer perceptron can only classify points linearly. But if we allow ourselves a curved boundary, we can separate the true and false outputs easily, which is exactly what we get by adding a hidden layer to a neural network.

xorwhiddenlayer
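
The “go ahead and try” challenge can also be handed to a computer. This sketch brute-forces a coarse grid of straight-line decision boundaries (a single threshold unit with weights `w1`, `w2` and bias `b`) and reports the best number of points it gets right; the grid range and resolution are arbitrary choices, and the OR function is included only as a linearly separable comparison.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])
y_or = np.array([0, 1, 1, 1])

def best_linear_score(X, y, grid=np.linspace(-2, 2, 21)):
    """Most points (out of 4) any line w1*x1 + w2*x2 + b > 0 on the grid classifies correctly."""
    best = 0
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
        best = max(best, int((pred == y).sum()))
    return best

print("OR :", best_linear_score(X, y_or), "of 4")   # OR : 4 of 4
print("XOR:", best_linear_score(X, y_xor), "of 4")  # XOR: 3 of 4
```

No amount of grid refinement helps with XOR: three out of four is the ceiling for any straight line.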

The truth-table for XOR is as follows:

Input Output
00 0
01 1
10 1
11 0
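
Before training anything, it can help to see that a hidden layer really is enough to represent this table. The sketch below wires up the weights by hand instead of learning them: two hidden threshold units compute OR and NAND, and the output unit ANDs them together. It differs from the network trained later in this post (which has three sigmoid hidden units and no bias terms); the weights and names here are illustrative choices only.

```python
import numpy as np

def step(z):
    """Hard threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

# Hidden layer: column 0 computes OR(x1, x2), column 1 computes NAND(x1, x2)
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output unit: AND of the two hidden units
W_out = np.array([[1.0],
                  [1.0]])
b_out = np.array([-1.5])

def xor_net(x):
    hidden = step(x @ W_hidden + b_hidden)
    return step(hidden @ W_out + b_out)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = xor_net(inputs).ravel()
for (x1, x2), out in zip(inputs, outputs):
    print(f"{x1}{x2} {out}")
# 00 0
# 01 1
# 10 1
# 11 0
```

Backpropagation’s job is to find weights that play the same role without anyone choosing them by hand.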

If we want to train a neural network to replicate the table above, we use backpropagation to flow the output errors backward through the network based on the neuron activations of each node. Based on the gradient of these activations and the error in the layer immediately above, the network error can be optimized by something like gradient descent. As a result, our network can now be taught to represent a non-linear function. For a network with two inputs, three hidden units, and one output the training might go something like this:

trainingxor

Update (2017/03/02) Here’s the gist for making the gif above:

# Training a neural XOR circuit
# In Marvin Minsky and Seymour Papert's in/famous critique of perceptrons, published in 1969, they argued that neural networks
# had extremely limited utility, proving that the perceptrons of the time could not even learn
# the exclusive OR function. This played some role
# Now we can easily teach a neural network an XOR function by incorporating more layers.
# Truth table:
# Input | Output
# 00 | 0
# 01 | 1
# 10 | 1
# 11 | 0
#
# I used craffel's draw_neural_net.py at https://gist.github.com/craffel/2d727968c3aaebd10359
#
#
# Date 2017/01/22
# http://www.thescinder.com
# Blog post https://thescinder.com/2017/01/24/teaching-a-machine-to-love-xor/
# Imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
# Input vector based on the truth table above
a0 = np.array([[0, 0],[0, 1],[1, 0],[1,1]])
#print(np.shape(a0))
# Target output
y = np.array([[0],[1],[1],[0]])
#print(np.shape(y))
def draw_neural_net(ax, left, right, bottom, top, layer_sizes, Theta0, Theta1):
    '''
    Public Gist from craffel
    https://gist.github.com/craffel/2d727968c3aaebd10359
    Draw a neural network cartoon using matplotlib.
    :usage:
        >>> fig = plt.figure(figsize=(12, 12))
        >>> draw_neural_net(fig.gca(), .1, .9, .1, .9, [2, 3, 1], Theta0, Theta1)
    :parameters:
        - ax : matplotlib.axes.AxesSubplot
            The axes on which to plot the cartoon (get e.g. by plt.gca())
        - left : float
            The center of the leftmost node(s) will be placed here
        - right : float
            The center of the rightmost node(s) will be placed here
        - bottom : float
            The center of the bottommost node(s) will be placed here
        - top : float
            The center of the topmost node(s) will be placed here
        - layer_sizes : list of int
            List of layer sizes, including input and output dimensionality
        - Theta0, Theta1 : 2D arrays
            Weights of the first and second layer, used as line widths
            (added to craffel's original function for this post)
    '''
    n_layers = len(layer_sizes)
    v_spacing = (top - bottom)/float(max(layer_sizes))
    h_spacing = (right - left)/float(len(layer_sizes) - 1)
    # Nodes
    for n, layer_size in enumerate(layer_sizes):
        layer_top = v_spacing*(layer_size - 1)/2. + (top + bottom)/2.
        for m in range(layer_size):
            circle = plt.Circle((n*h_spacing + left, layer_top - m*v_spacing), v_spacing/4.,
                                color='#999999', ec='k', zorder=4)
            ax.add_artist(circle)
    # Edges
    for n, (layer_size_a, layer_size_b) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        layer_top_a = v_spacing*(layer_size_a - 1)/2. + (top + bottom)/2.
        layer_top_b = v_spacing*(layer_size_b - 1)/2. + (top + bottom)/2.
        for m in range(layer_size_a):
            for o in range(layer_size_b):
                # Line width follows the weight connecting node m to node o
                if (n == 0):
                    line = plt.Line2D([n*h_spacing + left, (n + 1)*h_spacing + left],
                                      [layer_top_a - m*v_spacing, layer_top_b - o*v_spacing], c='#8888dd',lw=Theta0[m,o])
                elif (n == 1):
                    line = plt.Line2D([n*h_spacing + left, (n + 1)*h_spacing + left],
                                      [layer_top_a - m*v_spacing, layer_top_b - o*v_spacing], c='#8888cc',lw=Theta1[m,o])
                else:
                    line = plt.Line2D([n*h_spacing + left, (n + 1)*h_spacing + left],
                                      [layer_top_a - m*v_spacing, layer_top_b - o*v_spacing], c='r')
                ax.add_artist(line)
# Neuron functions
def sigmoid(x):
    # The sigmoid function is 0.5 at 0, ~1 at infinity and ~0 at -infinity
    # This is a good activation function for neural networks
    mySig = 1/(1+np.exp(-x))
    return mySig
def sigmoidGradient(x):
    # Used for calculating the gradient at NN nodes to back-propagate during NN learning.
    myDer = sigmoid(x)*(1-sigmoid(x))
    return myDer
# Initialize neural network connections with random values. This breaks symmetry so the network can learn.
np.random.seed(3)
Theta0 = 2*np.random.random((2,3))-1
Theta1 = 2*np.random.random((3,1))-1
# Train the network
myEpochs = 25000
m = np.shape(a0)[0]
# J is a vector we'll use to keep track of the error function as we learn
J = []
#set the learning rate
lR = 1e-1
#This is a weight penalty that keeps the
myLambda = 0#3e-2
fig, ax = plt.subplots(1,1,figsize=(12,12))
#plt.close(fig2)
fig2,ax2 = plt.subplots(1,2,figsize=(12,4))
for j in range(myEpochs):
    # Forward propagation
    z1 = np.dot(a0,Theta0)
    a1 = sigmoid(z1)
    #print(np.shape(a1))
    z2 = np.dot(a1,Theta1)
    a2 = sigmoid(z2)
    # The error
    E = (y-a2)
    J.append(np.mean(np.abs(E)))
    # Back propagation
    d2 = E.T
    d1 = np.dot(Theta1,d2) * sigmoidGradient(z1.T)
    Delta1 = 0*Theta1
    Delta0 = 0*Theta0
    # Accumulate the weight updates over the m training examples
    for c in range(m):
        Delta1 = Delta1 + np.dot(np.array([a1[c,:]]).T,np.array([d2[:,c]]))
        Delta0 = Delta0 + np.dot(np.array([a0[c,:]]).T,np.array([d1[:,c]]))
    # Note: because E = y - a2 the updates below are added, so a non-zero
    # myLambda would grow the weights rather than shrink them; it is left at 0.
    w8Loss1 = myLambda * Theta1
    w8Loss0 = myLambda * Theta0
    #print(np.mean(Theta3))
    Theta1Grad = Delta1/m + w8Loss1
    Theta0Grad = Delta0/m + w8Loss0
    Theta1 = Theta1 + Theta1Grad * lR #+ stoch1 * stochMultiplier
    Theta0 = Theta0 + Theta0Grad * lR #+ stoch0 * stochMultiplier
    if (j % 250 == 0):
        # Save frames from the learning session
        # (assumes the ./trainingTLXOR/ directory already exists)
        matplotlib.rcParams.update({'figure.titlesize': 42})
        matplotlib.rcParams.update({'axes.titlesize': 24})
        draw_neural_net(fig.gca(), .1,.9,.1,.9,[np.shape(a0)[1],np.shape(a1)[1],np.shape(a2)[1]],Theta0,Theta1)
        #plt.figure(2)
        plt.show()
        fig.suptitle('Neural Network Iteration'+str(j))
        fig.savefig('./trainingTLXOR/XORNN'+str(j))
        fig.clf()
        #fig2.subplot(121)
        #plt.hold(True)
        #plt.close(fig2)
        # Left panel: mean error so far, with a marker at the current epoch
        ax2[0].plot(J,ls='-')#,color='#2222ee')
        #plt
        #plt.hold(True)
        ax2[0].plot(j,J[j],'o')#,color='#1111ff')
        #plt
        ax2[0].axis([0,25000,0,0.6])
        #fig2.suptitle('Mean Error')
        #fig2.subplot(122)
        #plt
        # Right panel: one marker per input pattern, sized by the network output
        ax2[1].plot([1],[1],'o',ms=32*a2[0,0])#,color='b',ms=a2[0,0])
        #plt
        ax2[1].plot([2],[1],'o',ms=32*a2[1,0])#,color='b',ms=a2[0,1])
        #plt
        ax2[1].plot([3],[1],'o',ms=32*a2[2,0])#,color='b',ms=a2[0,2])
        #plt
        ax2[1].plot([4],[1],'o',ms=32*a2[3,0])#,color='b',ms=a2[0,3])
        ax2[1].axis([0,5,0,2])
        ax2[0].set_title('Mean Error')
        ax2[1].set_title('Outputs')
        #suptitle('Mean Error and Output Vector')
        plt.show()
        fig2.savefig('./trainingTLXOR/XORNer'+str(j))
        ax2[0].cla()
        ax2[1].cla()
plt.close(fig2)
#ax2.cla()
