I too built a rather decent deep learning rig for 900 quid

Skip to the components list
Skip to the benchmarks

Robert Heinlein’s 1957 novel The Door into Summer returns throughout to a theme of knowing when it’s “time to railroad.” Loosely speaking, this is the idea that one’s success comes as much from historical context as it does from innate ability, hard work, and luck (though much of the latter can be attributed to historical context).

Many of the concepts driving our modern AI renaissance are decades old at least, but the field lost steam because computers were too slow and the Amazookles of the world were yet to use them to power their recommendation engines and so on. In the meantime computers have gotten much faster and much better at beating humans at very fancy games. Modern computers are now fast enough to make deep learning feasible, and it works for many problems as well as providing insight into how our own minds might work.

I too have seen the writing on the wall in recent years. I can say with some confidence that now is the time to railroad, and by “railroad” I mean revolutionise the world with artificial intelligence. A lot of things changed in big ways during the original “time to railroad,” the industrial revolution. For some this meant fortune and progress and for others, ruin. I would like to think that we are all a bit brighter than our old-timey counterparts were back then and we have the benefit of our history to learn from, so I’m rooting for an egalitarian utopia rather than an AI apocalypse. In any case, collective stewardship of the sea changes underway is important, and that means the more people learn about AI, the less likely it is that the future will be decided solely by the technocratic elites of today.

I’ve completed a few MOOCs on machine learning in general and neural networks in particular, coded up some of the basic functions from scratch, and I’m beginning to use some of the major libraries to investigate more interesting ideas. As I moved on from toy examples like MNIST and housing price prediction, one thing became increasingly clear: my laptop was not going to cut it.

It took me a week of work to realize I was totally on the wrong track training a vision model meant to mimic cuttlefish perception on my laptop. This sort of wasted time really adds up, so I decided to go deeper and build my own GPU-enhanced deep learning rig.

Luckily there are lots of great guides out there as everyone and their grandmother is building their own DL rig at the moment. Most of the build guides have something along the lines of “... for xxxx monies” in the title, which makes it easier to match budgets. Build budgets run the gamut from the surprisingly capable $800 machine by Nick Condo to the serious $1700 machine by Slav Ivanov all the way up to the low low price of “under $5000” by Kunal Jain. I did not even read the last one because I am not made of money.

I am currently living in the UK, so that means I have to buy everything in pounds. The prices for components in pounds sterling are... pretty much the same as they are in greenbacks. The exchange rate to the British pound can be a bit misleading, even now that Brexit has crushed the pound sterling as well as our hopes and dreams. In my experience it seems like you can buy about the same for a pound at the store as for a dollar in the US or a euro on the continent. It seems like the only thing they use the exchange rate for is calculating salaries.

I’d recommend first visiting Tim Dettmers’ guide to choosing the right GPU for you. I’m in a stage of life where buying the “second cheapest” appropriate option is usually best. With a little additional background reading and following Tim’s guide, I selected the Nvidia GTX 1060 GPU with 6GB of memory. This was from Tim’s “I have little money” category, one up from the “I have almost no money” category, and in keeping with my life philosophy of the second-cheapest. Going to the next tier up is often close to twice as costly, but not close to twice as good. This holds for my choice of GPUs as well: a single 1070 is about twice the cost and around 50% or so faster than a 1060. However, two 1060s do get you pretty close to twice the performance, and that’s what I went with. As we’ll see in the benchmarks, Tensorflow does make it pretty easy to take advantage of both GPUs, but doubling the capacity of this rig again by adding more GPUs in the future won’t be plausible.

My upgradeability is somewhat limited by the number of threads (4) and PCIe lanes (16) of the modest i3 CPU I chose; if a near-term upgrade were a higher priority, I should have left out the second 1060 GPU and diverted that part of the budget to a better CPU (e.g. the Intel Xeon E5-1620 V4 recommended by Slav Ivanov). But if you’re shelling out so much for a higher-end system you’ll probably want a bigger GPU to start with, and it’s easy to see how one can go from a budget of $800 to $1700 rather quickly.

The rest of the computer’s job is to quickly dump data into the GPU memory without messing things up. I ended up using almost all the same components as those in Nick’s guide because, again, my physical makeup is meat rather than monetary in nature.

Here’s the full list of components. I sourced what I could from Amazon Warehouse Deals to try and keep the cost down.


GPU (x2): Gigabyte Nvidia GTX 1060 6GB (£205.78 each)
Motherboard: MSI Intel Z170 KRAIT-GAMING (£99.95)
CPU: Intel Core i3 6100 Skylake Dual-Core 3.7 GHz Processor (£94.58)
Memory: Corsair CMK16GX4M2A2400C14 Vengeance 2x8GB (£105.78)
PSU: Corsair CP-9020078-UK Builder Series 750W CS750M ATX/EPS Semi-Modular 80 Plus Gold Power Supply Unit (£77.25)
Storage: SanDisk Ultra II SSD 240 GB SATA III (£72.18)
Case: Thermaltake Versa H23 (£27.10)

Total: £888.40

I had never built a PC before and didn’t have any idea what I was doing. Luckily, YouTube did, and I didn’t even break anything when I slotted all the pieces together. I had an install thumb drive for Ubuntu 16.04 hanging around ready to go, so I was up and running relatively quickly.

The next step was installing the drivers and the CUDA Toolkit for the GPUs. I’ve been working mainly with Tensorflow lately, so I followed their guide to get everything ready to take advantage of the new setup. I am using Anaconda to manage Python environments for now, so I made one environment with the tensorflow package and another with tensorflow_gpu.
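As a quick sanity check that both cards are actually visible from the tensorflow_gpu environment, listing the local devices works in the TensorFlow 1.x releases of that era (a minimal sketch; device names and ordering will vary by machine):

# List the devices TensorFlow can see; both GTX 1060s should show up
# as /gpu:0 and /gpu:1 if the driver and CUDA install went well.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)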

I decided to train on the CIFAR10 image classification dataset using this tutorial to test out the GPUs. I also wanted to see how fast training progresses on a project of mine, a two-category classifier for quantitative phase microscope images.

The CIFAR10 image classification tutorial from tensorflow.org was perfect because you can flag for the training to take place on one or two GPUs, or train on the CPU alone. It takes ~1.25 hours to train the first 10000 steps on the CPU, but only 4 minutes for the same training on one 1060. That’s a weeks-to-days/days-to-hours/hours-to-minutes level of speedup.
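The tutorial’s multi-GPU script handles the bookkeeping for you, but the underlying idea is roughly the sketch below (my own simplified illustration, not the tutorial’s code; the tiny stand-in model and fixed batch of 128 are assumptions for brevity): each GPU computes gradients for its own slice of the batch as a separate “tower” sharing the same weights, and the averaged gradients are applied once.

# Data-parallel "towers" in TensorFlow 1.x: one copy of the model per GPU
# with shared weights; per-tower gradients are averaged and applied once.
import tensorflow as tf

def tower_loss(x, y):
    # Hypothetical tiny linear model standing in for the CIFAR10 network.
    flat = tf.reshape(x, [-1, 24 * 24 * 3])
    w = tf.get_variable('w', [24 * 24 * 3, 10])
    b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    logits = tf.matmul(flat, w) + b
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

images = tf.placeholder(tf.float32, [128, 24, 24, 3])
labels = tf.placeholder(tf.int64, [128])
optimizer = tf.train.GradientDescentOptimizer(0.1)

tower_grads = []
for i, (x_i, y_i) in enumerate(zip(tf.split(images, 2), tf.split(labels, 2))):
    # One tower per GPU; reuse the variables so both towers share weights.
    with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=(i > 0)):
        tower_grads.append(optimizer.compute_gradients(tower_loss(x_i, y_i)))

# Average each variable's gradient across the two towers, then apply once.
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = tf.stack([g for g, _ in grads_and_vars])
    averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
train_op = optimizer.apply_gradients(averaged)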

# CPU 10000 steps
2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
2017-06-18 12:56:42.815268: step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)

2017-06-18 14:12:50.121319: step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)

# One GPU
2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
2017-06-18 15:50:17.678610: step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
2017-06-18 15:50:17.886419: step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)

2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)

# Two GPUs
2017-06-25 14:48:43.918359: step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
2017-06-25 14:48:45.058762: step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)

2017-06-25 14:52:28.510590: step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)

2017-06-25 14:54:56.087587: step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)

That’s about a 21-32x speedup on the GPUs. It’s not quite double the speed on two GPUs because the model isn’t big enough to fully utilize both of them, as we can see in the output from nvidia-smi:

# Training on one GPU

# Training on two GPUs

My own model had a similar speedup, going from training about one 79-image minibatch per second to training more than 30 per second. Trying to train this model on my laptop, a Microsoft Surface Book, I was getting about 0.75 steps a second. [Aside: the laptop does have a discrete GPU, a variant of the GeForce 940M, but no Linux driver that I’m aware of :/].

# Training on CPU only
INFO:tensorflow:global_step/sec: 0.981465
INFO:tensorflow:loss = 0.673449, step = 173 (101.889 sec)
INFO:tensorflow:global_step/sec: 0.994314
INFO:tensorflow:loss = 0.64968, step = 273 (100.572 sec)

# Dual GPUs
INFO:tensorflow:global_step/sec: 30.3432
INFO:tensorflow:loss = 0.317435, step = 90801 (3.296 sec)
INFO:tensorflow:global_step/sec: 30.6238
INFO:tensorflow:loss = 0.272398, step = 90901 (3.265 sec)
INFO:tensorflow:global_step/sec: 30.5632
INFO:tensorflow:loss = 0.327474, step = 91001 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.5643
INFO:tensorflow:loss = 0.43074, step = 91101 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.6085

Overall I’m pretty satisfied with the setup, and I’ve got a lot of cool projects to try out on it. Getting the basics for machine learning is pretty easy with all the great MOOCs and tutorials out there, but the learning curve slopes sharply upward after that. Working directly on real projects with a machine that can train big models before the heat-death of the universe is essential for gaining intuition and tackling cool problems.

A Skeptic Over Coffee: Young Blood Part Duh

Does this cloudy liquid hold the secret to vitality in your first 100 years and beyond? I can’t say for sure that it doesn’t. What I can say is that I would happily sell it to you for $8,000.

Next time someone tries to charge you a premium to intravenously imbibe someone else’s blood plasma, you have my permission to tell them no thanks. Unless there’s a chance that it is fake, then it might be worth doing.

Californian company Ambrosia LLC has been making the rounds in publications like the New Scientist hype-machine to promote claims that their plasma transfusions show efficacy at treating symptomatic biomarkers of aging. Set up primarily to fleece rich people by exploiting younger, poorer people on the off chance that the Precious Bodily Fluids of the latter will invigorate the former, the small biotech firm performed a tiny study of over-35s receiving blood plasma transfusions from younger people. It’s listed on clinicaltrials.gov and everything.

First of all, to determine the efficacy of a treatment it’s important that both the doctors and the patients are blinded to whether they are administering/being administered the active therapeutic. That goes all the way up the line from the responsible physician to the phlebotomist to the statistician analyzing the data. But to blind patients and researchers the study must include a control group receiving a placebo treatment, which this study did not have. So it’s got that going for it.

To be fair, this isn’t actually bad science. For that to be true, it would have to be actual science. Not only does a study like this require a control to account for any placebo effect*, but the changes reported for the various biomarkers may be well within common fluctuations.

Finally, remember that if you assess 20 biomarkers with the common significance cutoff of p=0.05, chances are one of the twenty will show a statistical difference from baseline. That’s roughly what a cutoff at that level means: a 1 in 20 chance of seeing a difference that large by random chance alone, even when there is no real effect. Quartz reports the Ambrosia study looked at about 100 different biomarkers and mentions positive changes in 3 of them. I don’t know if they performed statistical tests at a cutoff level of 0.05, but if so you should expect on average 5 of 100 biomarkers in a screen to show a statistical difference by chance alone. This isn’t the first case of questionable statistics selling fountain of youth concepts.
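A quick back-of-the-envelope check of that multiple-comparisons point (assuming, purely for illustration, 100 independent tests at a 0.05 cutoff and no real effects):

# Expected false positives, and the chance of at least one, for 100
# independent tests at alpha = 0.05 when nothing real is going on.
alpha, n_tests = 0.05, 100
print(alpha * n_tests)             # expected false positives: 5.0
print(1 - (1 - alpha) ** n_tests)  # P(at least one false positive): ~0.994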

All of this is not to say that the experiments disprove the positive effects of shooting up teenage PBFs. It also generated zero conclusive evidence against the presence of a large population of English teapots in erratic orbits around Saturn.

You could conclude by saying “more scientific investigation is warranted” but that would imply the work so far was science.

* The placebo effect can even apply to as seemingly objective a treatment as surgery. Take this 2013 study that found no statistical difference in the outcomes of patients with knee problems treated with either arthroscopic surgery or a surgeon pretending to perform the surgery.


What the cornerstone of any futuristic transportation mix should be.

The future has always promised exciting new forms of transport for the bustling hither and thither of our undoubtedly jumpsuit-wearing, cyborg selves. From the outlandish (flying cars) to the decidedly practical (electric cars), a better way of getting about is always just around the corner. Workers in the United States spend about 26 minutes twice a day on their commutes, and for most people this means driving. What’s worse, the negative effect of a long commute on life satisfaction is consistently underestimated. Premature deaths in the United States due to automobile accidents and air pollution from vehicles are about 33,000 and an estimated 58,000 yearly, respectively. Add in all the costs associated with car ownership and road maintenance (not to mention the incalculable cost of automobiles’ contribution to the potentially existential threat of climate change) and the picture becomes clear: cars aren’t so much a convenient means of conveyance serving the humans they carry as a demanding taskmaster that may be the doom of us all. There must be something better awaiting us in the transportation wonders of tomorrow.

What if we came up with a transportation mode that is faster than taking the bus, costs less than driving, and improves lifespan? What if it also happened to be the most efficient means of transport known? Anything offering up that long list of pros should be a centerpiece of any transportation blend. What wonder of future technology could I possibly be talking about?

I’m writing, of course, about the humble bicycle.

Prioritizing exotic transportation projects like Elon Musk’s hyperloop is like inventing a new type of ladder to reach the highest branches, all the while surrounded on all sides by drooping boughs laden with low-hanging fruit. In a great example of working harder, not smarter, city planners in the U.S. strive tirelessly to please our automobile overlords. Everyone needs a car to get to work and the supermarket, because everything is far apart, and everything is so far apart because everyone drives everywhere anyway. All the parking spaces and wide lanes push everything even further apart in a commuting nightmare feedback loop.

It doesn’t have to be that way, and it’s not too late to change. Consider the famously bikeable infrastructure of the Netherlands, where the bicycle is king. Many people take the purpose-built bike lanes for granted and assume they’ve always been there, but in fact they are the result of deliberate activism leading to a broad change in transportation policy beginning in the seventies. Likewise, the servile relationship many U.S. cities maintain with cars is not set in stone, and, contrary to popular belief, fuel taxes and registration fees don’t cover the costs of building and maintaining the roads.

Even if every conventional automobile was replaced tomorrow with a self-driving electric car, a bicycle would still be a more efficient choice. The reason comes down to simple physics: a typical bike’s ~10 kg is a fraction of the mass of the average rider, so most of the energy delivered to the pedals goes toward moving the human cargo. A car (even a Tesla) has to waste most of its energy moving the car itself. The only vehicle that has a chance of besting the bicycle in terms of efficiency is an electric-assist bicycle, once you factor in the total energy costs of producing and shipping the human fuel (food), but even that depends on where you buy your groceries [pdf].
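To put rough numbers on that payload argument (illustrative masses only: ~10 kg bike, ~75 kg rider, ~1,500 kg car):

# Fraction of the total moving mass that is actually the person being moved.
rider, bike, car = 75.0, 10.0, 1500.0
print(rider / (rider + bike))  # bicycle: ~0.88 of the moving mass is payload
print(rider / (rider + car))   # car:     ~0.05 of the moving mass is payload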

Bicycles have been around in more or less modern form for over a hundred years, but the right tool isn’t necessarily the newest. The law of parsimony posits that the simplest solution that suffices is generally the best, and for many of our basic transport needs that means a bicycle. It’s about time we started affording cycling the respect it deserves as a central piece of our future cities and towns. Your future transportation experience may mean you’ll go to the office in virtual reality, meet important clients by hybrid dirigible, and ship supplies to Mars by electric rocket, but you’ll pick up the groceries by bicycle on the way home from the station.

Image sources used for illustrations:

Fat bike CC SA BY Sylenius

Public Domain:

Tire tracks

Lunar lander module

Apollo footprint

Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try to trick @ColorizeBot to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that CB is indeed an artificial neural network with many layers, some of them convolutional. Convolutional layers share weights and give deep learning the ability to discover features from images, rather than relying on the conventional machine vision approach of manually extracting image features to train an algorithm. This gives CB the ability to discover important indicators of color that its handlers wouldn’t necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively tries to reproduce the input at the output in an efficient manner. In this case, instead of producing a single grayscale image at the output, the network would need to produce three versions, one image each for the red, green, and blue color channels. Of course, it doesn’t make sense to totally throw away the structure of the black and white image, and the way the authors include this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB’s Twitter bio claims it was trained on millions of photos, and I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
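For a concrete (if toy) picture of what such a network might look like, here is a minimal colorization-style model in TensorFlow 1.x: grayscale in, three color channels out, with the input luminance fed back in near the output so its structure isn’t thrown away. This is my own sketch under those assumptions, not @ColorizeBot’s actual architecture.

# A toy colorization "autoencoder": encode the grayscale image, decode to RGB,
# and reuse the input luminance directly via a skip connection.
import tensorflow as tf

gray = tf.placeholder(tf.float32, [None, 256, 256, 1])
rgb_target = tf.placeholder(tf.float32, [None, 256, 256, 3])

h = tf.layers.conv2d(gray, 32, 3, padding='same', activation=tf.nn.relu)
h = tf.layers.conv2d(h, 64, 3, strides=2, padding='same', activation=tf.nn.relu)
h = tf.layers.conv2d(h, 64, 3, padding='same', activation=tf.nn.relu)
h = tf.layers.conv2d_transpose(h, 32, 3, strides=2, padding='same',
                               activation=tf.nn.relu)
# Concatenate the original grayscale channel so the network only has to
# hallucinate color, not re-draw the scene from scratch.
color = tf.layers.conv2d(tf.concat([h, gray], axis=-1), 3, 3,
                         padding='same', activation=tf.nn.sigmoid)
loss = tf.reduce_mean(tf.square(color - rgb_target))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)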

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If one wishes to pursue the glamorous life of a hipster salad photographer, it’s important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which can perceive color with only one type of color photoreceptor. Our own visual system mainly distinguishes the two types of object by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:

c2sel44vqaagemw-jpg-large

This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people’s social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remain green. Maybe this has something to do with how the salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context and CB is used to recognizing more isolated round objects as tomatoes. In any case it does know enough to guess that spinach is green and some cherry tomatoes are reddish.

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.

c2se-teveae2_ay-jpg-large

That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net may have seen an image like this before, it doesn’t expect one every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of colored pencils, which could conceivably be a box of brown sketching pencils.

c2seviwviaa87xo-jpg-large

What about a more typical type of photograph? Here’s an old truck in some snow:

c2scawfveaallw4-jpg-large

CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look on the right hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, in some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
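As a tiny illustration of that “moving filter” idea, here’s a 3x3 kernel slid across a toy image; it responds strongly along a vertical edge without being told what an edge is (a sketch using scipy for the convolution):

# Convolve a toy image containing a dark-to-bright boundary with a
# vertical-edge kernel; the output is large in magnitude only near the edge.
import numpy as np
from scipy.signal import convolve2d

image = np.zeros((6, 6))
image[:, 3:] = 1.0  # left half dark, right half bright

edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)

print(convolve2d(image, edge_kernel, mode='same'))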

In the snowy truck and in the tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or myself see an image that looks like it was taken in winter, we include in our expectations “This photo looks like it was taken in winter, so it is likely the whole scene takes place in winter, because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo that wouldn’t be my first guess on how it came to be, based on the way most of the salads I’ve seen throughout my life have been constructed. In general we don’t see deep neural networks like @Colorizebot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative networks like Google’s “Inception” or style transfer systems like Deepart.io, which perfuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?

c29xshxviaa2_g3

That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic

@ColorizeBot’s Twitter feed

@CtheScinder’s Twitter feed

All the photographs used in this essay were taken by yours truly (at http://www.thescinder.com), and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:

tomatotrickery

Teaching a Machine to Love XOR

xorsketch

The XOR function outputs true if exactly one of the two inputs is true

The exclusive or function, also known as XOR (but never going by both names simultaneously), has a special relationship to artificial intelligence in general, and neural networks in particular. This is thanks to a prominent book from 1969 by Marvin Minsky and Seymour Papert entitled “Perceptrons: An Introduction to Computational Geometry.” Depending on who you ask, this text was single-handedly responsible for the AI winter due to its critiques of the state-of-the-art neural networks of the time. In an alternative view, few people ever actually read the book but everyone heard about it, and the tendency was to generalize a special-case limitation of local and single-layer perceptrons to the point where interest and funding for neural networks evaporated. In any case, thanks to back-propagation, neural networks are now in widespread use and we can easily train a three-layer neural network to replicate the XOR function.

In words, the XOR function is true for two inputs if one of them, but not both, is true. When you plot XOR as a graph, it becomes obvious why an early perceptron could never get it completely right.

sketch2dxor

There’s not a way to draw a straight 2D line on the graph and separate the true and false outputs for XOR, red and green in the sketch above. Go ahead and try. The same is going to be true trying to use a plane to separate a 3D version and so on to higher dimensions.

sketch3dxor

That’s a problem because a single-layer perceptron can only classify points linearly. But if we allow ourselves a curved boundary, we can separate the true and false outputs easily, which is exactly what we get by adding a hidden layer to a neural network.

xorwhiddenlayer

The truth-table for XOR is as follows:

Input A  Input B  Output
0        0        0
0        1        1
1        0        1
1        1        0

If we want to train a neural network to replicate the table above, we use backpropagation to flow the output errors backward through the network according to the neuron activations at each node. Using the gradient of these activations and the error in the layer immediately above, the network error can be minimized by something like gradient descent. As a result, our network can now be taught to represent a non-linear function. For a network with two inputs, three hidden units, and one output the training might go something like this:

trainingxor
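The loop behind the animation is simple enough to sketch in a few lines of numpy (a minimal illustration with three hidden sigmoid units and full-batch gradient descent; the learning rate and iteration count are arbitrary):

# Train a 2-3-1 network to reproduce XOR with plain backpropagation.
import numpy as np

rng = np.random.RandomState(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.randn(2, 3), np.zeros(3)  # input -> hidden
W2, b2 = rng.randn(3, 1), np.zeros(1)  # hidden -> output
lr = 2.0

for step in range(20000):
    hidden = sigmoid(X @ W1 + b1)        # forward pass
    out = sigmoid(hidden @ W2 + b2)
    d_out = (out - y) * out * (1 - out)  # backprop the output error
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

print(out.round(3))  # should approach [[0], [1], [1], [0]]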

Update (2017/03/02) Here’s the gist for making the gif above:

Things to Think About From 2016

thescinder2016wordle

A word cloud of theScinder’s output for 2016, made with wordle.net

CRISPR/Cas9

This subject includes throwbacks to 2015, when I did most of my writing about CRISPR/Cas9. That’s not to say 2016 didn’t contain any major genetic engineering news. In particular, scientists continue to move ahead with the genetic modification of human embryos.

If you feel like I did before I engaged in some deeper background reading, you can catch up with my notes on the basics. I used the protein structures for existing gene-editing techniques to highlight the differences between old-school gene editing techniques and editing with Cas9. I also compared the effort it takes to modify a genome with Cas9 to how difficult it was using zinc-finger nucleases, the previous state of the art (spoiler: it amounts to days of difference).

TLDR: The advantage of genetic engineering with Cas9 over previous methods is the difference between writing out a sequence of letters and solving complex molecular binding problems.

aLIGO and the detection of gravitational waves

Among the most impressive scientific breakthroughs of the previous hundred years or so, a bunch of clever people with very sensitive machines announced they’ve detected the squidge-squodging of space. A lot of the LIGO data is available from the LIGO Open Science Center, and this is a great way to learn signal processing techniques in Python. I synchronized the sound of gravitational wave chirp GW150914 to a simulated visualization (from SXS) of a corresponding black hole inspiral and the result is the following video. You can read my notes about the process here. I also modified the chirp to play the first few notes of the “Super Mario Brothers” theme.

Machine Learning

I’ve just started an intensive study of the subject, but machine learning continues to dip its toes into everything to do with modern human life. We have a lot of experience with meat-based learning programs, which should give us some insight into how to avoid common pitfalls. The related renewed interest in artificial intelligence should make the next few years interesting. If we do end up with a “hard” general artificial intelligence sometime soon, it might make competition a bit tough, if you could call it competition at all.

Devote a few seconds of thought to the twin issues of privacy and data ownership.

Mars

2016 also marked a renewed interest in manned space exploration, largely because of the announcement from space enthusiast Elon Musk that he’s really stoked to send a few people to Mars. NASA is still interested in Mars as well, and might be a good partner to temper Musk’s enthusiasm. In the Q&A at about 1:21 in the video below, Musk seems to suggest a willingness to die as the primary prerequisite for his first batch of settlers. There are some known, unavoidable dangers and some unknown, unknowable ones in the venture, but de-prioritizing survivability as a mission constraint seems more likely to delay manned exploration than to advance it, at least as long as the venture remains as expensive as even Musk’s optimistic estimates suggest.

Here’s some stuff that’s a little a lot less serious about living on Mars.

It doesn’t grab the headlines with such vigor, but Jeff Bezos’ Blue Origin had an impressive year: retiring their first rocket after five flights and exceeding the mission design in a final test of a launch escape system.
Blue Origin is also working on an orbital launch system called New Glenn, in honor of the first astronaut from the USA to orbit the Earth.

In that case, where are we headed?

The previous year provided some exciting moments to really trip the synapses, but we had some worrying turns as well. The biggest challenges of the next few decades will all have technical components, and understanding them doesn’t come for free. Humanity is learning more about biology at more fundamental levels, and medicine won’t look the same in ten years. A lot of people seem unconcerned that we probably won’t stay under the 2 degrees Celsius threshold for limiting climate change, although not worrying about something doesn’t mean it won’t kill anyone. Scientists and engineers have been clever enough to develop machine learners to assist our curiosity, and it’s exciting to think that resurgent interest in AI might give us someone to talk to soon. Hopefully they’ll be better conversationalists than the currently available chatbots, and a second opinion on the nature of the universe could be useful. It’s not going to be easy to keep up with improving automation, and humans will have to think about what working means to them.

Take some time to really dig into these subjects. You probably already have some thoughts and opinions on some of them, so try to read a contrary take. If you can’t think of evidence that might change your mind, you don’t deserve your conclusions.

Remember that science, technological development, and innovation have a much larger long-term effect on humans and our place in the universe than the petty machinations of human fractionation. So keep learning, figure out something new, and remember that if you possess general intelligence you can approach any subject. On the other hand, autogenous annihilation is one of the most plausible solutions to the Fermi Paradox. This is no time to get Kehoed.

Introduction to ML

mlbrainmap

You can’t swing a cat through a newsfeed these days without hitting a story about the extent of learning algorithms in everyday life. As many readers are no doubt aware, there’s a type of computation that’s brought new perspective to a wide swath of common and previously intractable problems, and unlike a more deterministic programmatic approach, this new approach is based on learning. That’s right, instead of a linear sequence of instructions these algorithms figure out what to do for themselves.

There’s probably not a single aspect of our modern world that isn’t affected by or directly based on the new field of learning-based methods. But this isn’t just any type of learning: Meat Learning, or ML for short, affects almost every essential experience in the modern world. Choosing what book to read next? Meat learning. What’s for lunch? ML. Driving cars? ML conquered that task years ago (despite a high frequency of catastrophic failure). Whether you’ve been aware of it or not, and for better or for worse, we’ve truly entered the age of the learning algorithm.

Many of the basic ideas and underlying structures of modern ML have been around for millennia, if not epochs, but it’s only been in the last few million years that ML has really puffed up its breast-feathers. Almost all ML algorithms in common usage today run on a variant of the same hardware: a fatty grey mass known to the professionals that work with them as a “meat computer.”

Rather than an improvement in the basic design or clever new architectures, the enabling breakthrough for modern ML has been largely based on the sheer mass of computational meat (archaically known as grey matter) incorporated in bleeding edge modern platforms. This new type of learner, of which a wide swath of variants all share the same designation of “human,” hasn’t collectively been around for long (geologically speaking), but has already had a huge impact. It is nearly impossible to go anywhere on the planet without running into some sign of human meat-learners’ work, play, or self-destructive irony.

Previous generations of meat-learners had about half the mass of computational meat allocated for each kg of body mass compared to the current state-of-the-art human models, at least after engaging regularization factors to make the human variants feel better about themselves. This vast increase in computational mass in a meat computer is almost entirely comprised of warm, squishy subunits intended to mimic the activity of artificial neural networks. Like an iceberg, the majority of these networks are hidden from view, which makes them very mysterious (some would say nonsensical) to work with. More on that later on.

hiddenlayers

The official definition of a meat learning platform, according to ML pioneer Sammy Arthur is “…something that can learn, at least a little bit, but it has to be made of meat. The soft squishy stuff.” Some practitioners would say that the modern definition has grown since Arthur’s definitive statements. Although ML was at first considered a subset of the field of organic intelligence (OI for short), it is now widely acknowledged that ML has outgrown its humble beginnings as a specialization of OI, becoming a fully-fledged field in its own right. Also, it is quite clear that many of the modern systems running the cutting-edge human variants don’t possess all the attributes for qualification as bona fide general organic intelligence.

Unlike traditional programming, wherein the outcome depends on a succession of conditional statements, meat learners adapt to a given problem by being “trained” on large sets of data, aka learning. To train a meat learning platform to solve a problem requires a large amount of input data, and in some cases, acquiring the right data can be quite the challenge. After learning on a sufficient number of training sets, an ML program becomes capable of independently completing a wide variety of tasks, including both stuff and things.

This unique learning approach means that meat learners often find solutions to problems that are unusual and sometimes downright surprising (or stupid). Due to the nature of how ML systems are trained, completing a task may include steps that appear to make very little sense until, finally, the program somehow converges to an interesting solution. Sometimes. Despite numerous advantages, these capabilities come at a cost.

Training an ML algorithm requires a vast input of training data, often taking twenty years or so before an ML system is ready for even the simplest of tasks. Due to the speed issue, most developers train an ML platform for multiple tasks in parallel, which seems to work out more or less OK. One potential disadvantage is that in most cases the vast majority of the squishy neural network performing ML computations is made up of hidden layers, the submerged part of the iceberg alluded to above. As a consequence, nobody is quite sure how they do what they do. In practice this isn’t much of a problem when a given ML is working, but it makes problems almost impossible to debug. Unlike a proper artificial neural network, not even the architecture of the system is available for design and adjustment. The only way to adjust a given ML system is by modifying the training inputs, often with unpredictable results. To make matters worse, in most cases learners heavily weight early training inputs, assigning little to no weight to those encountered as a more mature ML system. In all cases, once a particular ML circuit becomes entrenched in a wider ML architecture, it becomes subject to a “learned inertia” that makes it difficult for a learner to adapt previously learned strategies and can lead to typing in all caps.

Over-reliance on hidden layers and learned inertia aren’t the only problems associated with ML. Although meat-learners tend to be quite flexible, there are certain types of computation that ML just isn’t suited for. If your task involves Bayesian inference, for instance, you can just about forget about training an ML human platform to suit. Additionally, ML performance can be unacceptably erratic. A meat learner that previously performed reasonably well on a given task might perform terribly the next time around just because they are tired, drunk, or bored, and they usually are at least one of those three things.

ML has long been plagued by hardware problems as well. ML platforms tend to run at speeds of just a few tens of Hertz, and the upkeep of such a system can be expensive. Both dynamic and static memory suffer from volatility and infidelity, and these problems only get worse as the system matures. ML is inherently crufty: many aspects of the hardware have been retained from the very early days of the first chordate models, in turn borrowing much of their fundamentals from even earlier systems. The basis for switching action potentials, the discrete building block of meat learning activity, in microalgae is so similar to that in state-of-the-art meat learners that they can be functionally swapped. This highlights the archaic mechanics at the foundation of meat learning, but sheds some insight by making meat neurons (even in hidden layers) amenable to switching on or off with light exposure.

Despite myriad problems, ML seems to be here to stay and is liable to play a large role for at least another ten years or so. ML is slow, but gets around the problem by utilizing massive parallelization. ML takes years to train, but once it has learned a fair amount from training inputs it can perform a wide variety of tasks. ML is erratic and highly prone to various emotional and drive-reduction states, but most ML systems get around this issue by pretending everything is fine. Hidden layers make it difficult to determine the inner workings of a modern ML platform, but many agree that’s for the best anyway.

Whether ML developers overcome the existing issues or not may ultimately be irrelevant: for the nonce, meat learning in its many forms is the best and only irrevocable tool we’ve got.

Update 2016/12/23: Comment on action potentials added to paragraph 11