I too built a rather decent deep learning rig for 900 quid

Robert Heinlein’s 1957 novel The Door into Summer returns throughout to the theme of knowing when it’s “time to railroad.” Loosely speaking, this is the idea that one’s success comes as much from historical context as it does from innate ability, hard work, and luck (though much of the latter can be attributed to historical context).

Many of the concepts driving our modern AI renaissance are decades old at least, but the field lost steam because the computers were too slow and the Amazookles of the world were yet to use them to power their recommendation engines and so on. In the meantime computers have gotten much faster and much better at beating humans at very fancy games. Modern computers are now fast enough to make deep learning feasible, and it works for many problems as well as providing insight into how our own minds might work.

I too have seen the writing on the wall in recent years. I can say with some confidence that now is the time to railroad, and by “railroad” I mean revolutionise the world with artificial intelligence. A lot of things changed in big ways during the original “time to railroad,” the industrial revolution. For some this meant fortune and progress and for others, ruin. I would like to think that we are all a bit brighter than our old-timey counterparts were back then and we have the benefit of our history to learn from, so I’m rooting for an egalitarian utopia rather than an AI apocalypse. In any case, collective stewardship of the sea changes underway is important and this means the more people learn about AI the less likely the future will be decided solely by the technocratic elites of today.

I’ve completed a few MOOCs on machine learning in general and neural networks in particular, coded up some of the basic functions from scratch and I’m beginning to use some of the major libraries to investigate more interesting ideas. As I moved on from toy examples like MNIST and housing price prediction one thing became increasingly clear:

It took me a week of work to realize I was totally on the wrong track training a vision model meant to mimic cuttlefish perception on my laptop. This sort of wasted time really adds up, so I decided to go deeper and build my own GPU-enhanced deep learning rig.

Luckily there are lots of great guides out there as everyone and their grandmother is building their own DL rig at the moment. Most of the build guides have something along the lines of “. . . for xxxx monies” in the title, which makes it easier to match budgets. Build budgets run the gamut from the surprisingly capable $800 machine by Nick Condo to the serious $1700 machine by Slav Ivanov all the way up to the low low price of “under $5000” by Kunal Jain. I did not even read the last one because I am not made of money.

I am currently living in the UK, so that means I have to buy everything in pounds. The prices for components in pounds sterling are. . . pretty much the same as they are in greenbacks. The exchange rate to the British pound can be a bit misleading, even now that Brexit has crushed the pound sterling as well as our hopes and dreams. In my experience it seems like you can buy about the same for a pound at the store as for a dollar in the US or a euro on the continent. It seems like the only thing they use the exchange rate for is calculating salaries.

I’d recommend first visiting Tim Dettmers’ guide to choosing the right GPU for you. I’m in a stage of life where buying the “second cheapest” appropriate option is usually best. With a little additional background reading and following Tim’s guide, I selected the Nvidia GTX 1060 GPU with 6GB of memory. This was from Tim’s “I have little money” category, one up from the “I have almost no money” category, and in keeping with my life philosophy of the second-cheapest. Going to the next tier up is often close to twice as costly, but not close to twice as good. This holds for my choice of GPUs as well: a single 1070 is about twice the cost and around 50% or so faster than a 1060. However, two 1060s do get you pretty close to twice the performance, and that’s what I went with. As we’ll see in the benchmarks, TensorFlow does make it pretty easy to take advantage of both GPUs, but doubling the capacity of my DLR by doubling the GPUs in the future won’t be plausible.
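The second-cheapest reasoning is easy to put into numbers. Here is a rough sketch using only the relative figures from the paragraph above, with one 1060 as the unit of both cost and speed. Treating “pretty close to twice” as exactly 2.0 is my own optimistic assumption, so read the last entry as an upper bound.

```python
# Relative cost and speed, normalized to a single GTX 1060.
# Figures are the rough ones quoted in the text, not measurements.
options = {
    "1x GTX 1060": {"cost": 1.0, "speed": 1.0},
    "1x GTX 1070": {"cost": 2.0, "speed": 1.5},
    # "pretty close to twice" is rounded up to 2.0 here (assumption).
    "2x GTX 1060": {"cost": 2.0, "speed": 2.0},
}

# Speed obtained per unit of money spent.
speed_per_cost = {
    name: spec["speed"] / spec["cost"] for name, spec in options.items()
}

for name, value in speed_per_cost.items():
    print(f"{name}: {value:.2f}x speed per unit cost")
```

On these numbers the 1070 returns about three quarters of the speed per pound that the 1060 does, while a second 1060 roughly holds the ratio steady.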

My upgradeability is somewhat limited by the number of threads (4) and PCIe lanes (16) of the modest i3 CPU I chose; if a near-term upgrade were a higher priority, I should have left out the second 1060 GPU and diverted that part of the budget to a better CPU (e.g. the Intel Xeon E5-1620 V4 recommended by Slav Ivanov). But if you’re shelling out so much for a higher-end system you’ll probably want a bigger GPU to start with, and it’s easy to see how one can go from a budget of $800 to $1700 rather quickly.

The rest of the computer’s job is to quickly dump data into the GPU memory without messing things up. I ended up using almost all the same components as those in Nick’s guide because, again, my physical makeup is meat rather than monetary in nature.

Here’s the full list of components. I sourced what I could from Amazon Warehouse Deals to try and keep the cost down.


GPU (x2): Gigabyte Nvidia GTX 1060 6GB (£205.78 each)
Motherboard: MSI Intel Z170 KRAIT-GAMING (£99.95)
CPU: Intel Core i3 6100 Skylake Dual-Core 3.7 GHz Processor (£94.58)
Memory: Corsair CMK16GX4M2A2400C14 Vengeance 2x8GB (£105.78)
PSU: Corsair CP-9020078-UK Builder Series 750W CS750M ATX/EPS Semi-Modular 80 Plus Gold Power Supply Unit (£77.25)
Storage: SanDisk Ultra II SSD 240 GB SATA III (£72.18)
Case: Thermaltake Versa H23 (£27.10)

Total: £888.40
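As a sanity check on the arithmetic, the list above can be summed in a few lines of Python. The prices are the ones listed, and the shortened component labels are mine. `Decimal` keeps the pennies exact, which floats would not.

```python
from decimal import Decimal

# (component, unit price in pounds, quantity), prices as listed above.
components = [
    ("GPU: Gigabyte Nvidia GTX 1060 6GB", Decimal("205.78"), 2),
    ("Motherboard: MSI Intel Z170 KRAIT-GAMING", Decimal("99.95"), 1),
    ("CPU: Intel Core i3 6100", Decimal("94.58"), 1),
    ("Memory: Corsair Vengeance 2x8GB", Decimal("105.78"), 1),
    ("PSU: Corsair CS750M", Decimal("77.25"), 1),
    ("Storage: SanDisk Ultra II SSD 240 GB", Decimal("72.18"), 1),
    ("Case: Thermaltake Versa H23", Decimal("27.10"), 1),
]

total = sum(price * quantity for _, price, quantity in components)
gpu_share = components[0][1] * components[0][2] / total

print(f"Total: £{total}")
print(f"Share of budget spent on GPUs: {gpu_share:.0%}")
```

A bit under half the budget goes on the two GPUs, which seems about right for a machine whose main job is feeding them.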

I had never built a PC before and didn’t have any idea what I was doing. Luckily, YouTube did, and I didn’t even break anything when I slotted all the pieces together. I had an install thumb drive for Ubuntu 16.04 hanging around ready to go and consequently I was up and running relatively quickly.

The next step was installing the drivers and CUDA developer’s toolkit for the GPUs. I’ve been working mainly with TensorFlow lately, so I followed their guide to get everything ready to take advantage of the new setup. I am using Anaconda to manage Python environments for now, so I made one with tensorflow and another with tensorflow_gpu packages.

I decided to train on the CIFAR10 image classification dataset using this tutorial to test out the GPUs. I also wanted to see how fast training progresses on a project of mine, a two-category classifier for quantitative phase microscope images.

The CIFAR10 image classification tutorial from tensorflow.org was perfect because you can flag for the training to take place on one or two GPUs, or train on the CPU alone. It takes ~1.25 hours to train the first 10000 steps on the CPU, but only 4 minutes for the same training on one 1060. That’s a weeks-to-days/days-to-hours/hours-to-minutes level of speedup.

# CPU 10000 steps
2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
2017-06-18 12:56:42.815268: step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)

2017-06-18 14:12:50.121319: step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)

# One GPU
2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
2017-06-18 15:50:17.678610: step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
2017-06-18 15:50:17.886419: step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)

2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)

# Two GPUs
2017-06-25 14:48:43.918359: step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
2017-06-25 14:48:45.058762: step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)

2017-06-25 14:52:28.510590: step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)

2017-06-25 14:54:56.087587: step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)
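The throughput figures in these logs can be compared directly. Below is a small sketch that pulls the examples/sec values out of the lines shown above and averages them per run. It skips step 0, since that first step includes start-up time, and it only sees the handful of lines excerpted here, so treat the result as a rough estimate. The regular expression and helper names are my own.

```python
import re
from statistics import mean

# Log lines copied from the three runs above, timestamps dropped.
RUNS = {
    "CPU": """
        step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
        step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)
        step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
        step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)
    """,
    "one GPU": """
        step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
        step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
        step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)
        step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)
    """,
    "two GPUs": """
        step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
        step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)
        step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)
        step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)
    """,
}

# Captures the step number and the examples/sec figure from one log line.
LINE = re.compile(r"step (\d+), .*\(([\d.]+) examples/sec")


def mean_throughput(log_text):
    """Average examples/sec over the logged steps, ignoring step 0."""
    rates = [
        float(rate)
        for step, rate in LINE.findall(log_text)
        if int(step) > 0
    ]
    return mean(rates)


throughput = {name: mean_throughput(text) for name, text in RUNS.items()}
speedup = {name: rate / throughput["CPU"] for name, rate in throughput.items()}

for name in RUNS:
    print(f"{name}: {throughput[name]:.0f} examples/sec, "
          f"{speedup[name]:.1f}x vs CPU")
```

Averaged over these few lines, one GPU comes out at roughly 22x the CPU throughput and two GPUs at roughly 29x. The two-GPU rate drifts quite a bit over the run.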

That’s about a 21-32x speedup on the GPUs. It’s not quite double the speed on two GPUs because the model isn’t big enough to utilize all of both GPUs, as we can see in the output from nvidia-smi:

# Training on one GPU

# Training on two GPUs

My own model had a similar speedup, going from training about one 79-image minibatch per second to training more than 30 per second. Trying to train this model on my laptop, a Microsoft Surface Book, I was getting about 0.75 steps a second. [Aside: the laptop does have a discrete GPU, a variant of the GeForce 940M, but no Linux driver that I’m aware of :/].

# Training on CPU only
INFO:tensorflow:global_step/sec: 0.981465
INFO:tensorflow:loss = 0.673449, step = 173 (101.889 sec)
INFO:tensorflow:global_step/sec: 0.994314
INFO:tensorflow:loss = 0.64968, step = 273 (100.572 sec)

# Dual GPUs
INFO:tensorflow:global_step/sec: 30.3432
INFO:tensorflow:loss = 0.317435, step = 90801 (3.296 sec)
INFO:tensorflow:global_step/sec: 30.6238
INFO:tensorflow:loss = 0.272398, step = 90901 (3.265 sec)
INFO:tensorflow:global_step/sec: 30.5632
INFO:tensorflow:loss = 0.327474, step = 91001 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.5643
INFO:tensorflow:loss = 0.43074, step = 91101 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.6085
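To put those step rates in wall-clock terms, here is a back-of-the-envelope sketch using only numbers from the logs above and the 79-image minibatch. It assumes the step rate stays constant for the whole run, which is a simplification, and it uses step 91101 (the last one shown in the dual-GPU log) as an arbitrary target.

```python
from statistics import mean

# global_step/sec values copied from the logs above.
cpu_rates = [0.981465, 0.994314]
gpu_rates = [30.3432, 30.6238, 30.5632, 30.5643, 30.6085]

BATCH_SIZE = 79       # images per minibatch, as stated in the post
TARGET_STEP = 91101   # last step shown in the dual-GPU log


def summarize(rates):
    """Convert logged step rates into images/sec and hours to TARGET_STEP."""
    steps_per_sec = mean(rates)
    return {
        "steps_per_sec": steps_per_sec,
        "images_per_sec": steps_per_sec * BATCH_SIZE,
        # Assumes a constant step rate over the whole run.
        "hours_to_target": TARGET_STEP / steps_per_sec / 3600,
    }


cpu = summarize(cpu_rates)
gpu = summarize(gpu_rates)

for name, stats in (("CPU only", cpu), ("Dual GPUs", gpu)):
    print(f"{name}: {stats['images_per_sec']:.0f} images/sec, "
          f"{stats['hours_to_target']:.1f} h to step {TARGET_STEP}")
print(f"Speedup: {gpu['steps_per_sec'] / cpu['steps_per_sec']:.0f}x")
```

On those assumptions, getting to that step would take a bit over a day on the CPU and a bit under an hour on the two GPUs.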

Overall I’m pretty satisfied with the setup, and I’ve got a lot of cool projects to try out on it. Getting the basics for machine learning is pretty easy with all the great MOOCs and tutorials out there, but the learning curve slopes sharply upward after that. Working directly on real projects with a machine that can train big models before the heat-death of the universe is essential for gaining intuition and tackling cool problems.

Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try and trick @ColorizeBot to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that CB is indeed an artificial neural network with many layers, some of which are convolutional layers. These share weights and give deep learning the ability to discover features from images, rather than relying on the conventional machine vision approach of manually extracting image features to train an algorithm. This gives CB the ability to discover important indicators of color that their handler wouldn’t necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively tries to reproduce the input at the output in an efficient manner. In this case, instead of producing a single grayscale image at the output, the network would need to produce three versions, one image each for the red, green, and blue color channels. Of course, it doesn’t make sense to totally throw away the structure of the black and white image, and the way the authors include this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB’s Twitter bio claims to have trained on millions of photos, and I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
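The grayscale conversion is where the color information gets thrown away, and a tiny sketch shows how lossy it is. This one assumes the common Rec. 601 luma weights, which may not be what my photo software actually uses, and the two RGB values are made up for illustration rather than sampled from any of the photos.

```python
def luma(rgb):
    """Approximate brightness of an 8-bit RGB color using Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


# Hypothetical colors chosen for illustration, not taken from the photos.
saturated_red = (200, 40, 30)
leafy_green = (60, 110, 40)

gray_red = round(luma(saturated_red))
gray_green = round(luma(leafy_green))

print(f"red   {saturated_red} -> gray level {gray_red}")
print(f"green {leafy_green} -> gray level {gray_green}")
```

Two colors that look nothing alike can land on the same gray level, so whatever the colorizer paints back in has to come from shape, texture, and context rather than from the pixel values themselves.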

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If one wishes to pursue the glamorous life of a hipster salad photographer, it’s important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which can perceive color with only one type of photoreceptor. Our own visual system can only see contrast between the two types of object by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:

[Image: @ColorizeBot’s colorization of the B&W cherry tomato and spinach photo]

This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people’s social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remain green. Maybe this has something to do with how salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context and CB is used to recognizing more isolated round objects as tomatoes. In any case it does know enough to guess that spinach is green and some cherry tomatoes are reddish.

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.

[Image: @ColorizeBot’s colorization of the B&W Ishihara test photo]

That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net may have seen an image like this before, it doesn’t expect it every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of colored pencils, which could conceivably be a box of brown sketching pencils.

[Image: @ColorizeBot’s colorization of the B&W box of colored pencils]

What about a more typical type of photograph? Here’s an old truck in some snow:

[Image: @ColorizeBot’s colorization of the B&W photo of an old truck in snow]

CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look on the right-hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, on some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
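To make the moving-filter picture concrete, here is a minimal pure-Python sketch. It slides a hand-written 3x3 edge kernel (a Sobel filter, which a trained network would learn for itself rather than be given) over a toy image, and then computes how far a pixel’s influence can reach through a stack of layers. The toy image and the helper names are my own, and `receptive_field` assumes stride-1 convolutions with no pooling or dilation. None of this describes @ColorizeBot’s actual architecture, which I don’t know.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) without padding.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is applied as-is, not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def receptive_field(kernel_sizes):
    """Width of input seen by one output pixel after stacked stride-1 convs."""
    width = 1
    for k in kernel_sizes:
        width += k - 1
    return width


# Toy 5x6 image: dark on the left half, bright on the right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

# Hand-written kernel that responds to left-to-right brightness changes.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edges = conv2d_valid(image, sobel_x)
for row in edges:
    print(row)

print("Receptive field of three 3x3 layers:", receptive_field([3, 3, 3]))
```

The filter responds only where the dark-to-bright boundary falls inside its 3x3 window and stays at zero over the flat regions. Stacking three such layers lets each output pixel see a 7-pixel-wide patch of the input, so the context available to any one pixel grows with depth but stays local unless the network does something else to widen it.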

In the snowy truck and in the tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or I see an image that looks like it was taken in winter, we include in our expectations “This photo looks like it was taken in winter, so it is likely the whole scene takes place in winter because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo that wouldn’t be my first guess on how it came to be, based on the way most of the salads I’ve seen throughout my life have been constructed. In general we don’t see deep neural networks like @ColorizeBot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative networks like Google’s “Inception” or style transfer systems like Deepart.io, which perfuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?

[Image: @ColorizeBot’s colorization of theScinder’s logo image]

That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic

@ColorizeBot’s Twitter feed

@CtheScinder’s Twitter feed

All the photographs used in this essay were taken by yours truly (at http://www.thescinder.com), and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:

[Image: color-to-B&W-to-color transformation of the tomato spinach photo]