Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try to trick @ColorizeBot in order to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that @ColorizeBot (CB for short) is indeed an artificial neural network with many layers, some of which are convolutional. Convolutional layers share weights across the image, which lets deep learning discover features directly from images rather than relying on the conventional machine vision approach of extracting image features by hand to train an algorithm. This gives CB the ability to discover important indicators of color that its handlers wouldn’t necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively learns to reproduce the input at the output in an efficient manner. In this case, instead of producing a single grayscale image at the output, the network would need to produce three, one image each for the red, green, and blue color channels. Of course, it doesn’t make sense to totally throw away the structure of the black and white image, and the way the authors include this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB’s Twitter bio claims it was trained on millions of photos, and I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
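One common way to keep that a priori structure, used in published colorization work (though I have no way of knowing whether CB does it), is to pass the grayscale input straight through as the luminance channel and have the network predict only the two chroma channels. Below is a minimal sketch in plain Python, assuming BT.601 YUV as the color space (published systems often use CIE Lab instead). `predict_chroma` is a hypothetical placeholder for the trained network, not anything CB exposes.

```python
# Sketch: keep the input's luminance and let a model supply only the chroma.
# Uses BT.601 luma weights and analog YUV chroma scaling; values are in 0..1.

def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: what a B&W photo records
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v

def yuv_to_rgb(y, u, v):
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587  # solve the luma equation for green
    return r, g, b

def colorize(gray, predict_chroma):
    """Recombine a grayscale image with model-predicted chroma.

    gray: 2D list of luma values. predict_chroma: hypothetical stand-in for
    the trained network, returning a same-shaped 2D list of (u, v) pairs.
    Output pixels may need clipping to 0..1; that step is omitted here.
    """
    chroma = predict_chroma(gray)
    return [
        [yuv_to_rgb(y, u, v) for y, (u, v) in zip(row_y, row_uv)]
        for row_y, row_uv in zip(gray, chroma)
    ]

def no_color(gray):
    """A 'model' that predicts zero chroma everywhere."""
    return [[(0.0, 0.0) for _ in row] for row in gray]

print(colorize([[0.5, 0.25]], no_color))  # gray pixels come back as equal R, G, B
```

The appeal of this design is that the network never has to reproduce the edges and shading it was handed: whatever chroma it guesses, the recombined image has exactly the input's luma.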

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If you wish to pursue the glamorous life of a hipster salad photographer, it’s important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which may be able to perceive color with only one type of photoreceptor. Our own visual system can only see contrast between the two types of object by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:

[Image: @ColorizeBot’s colorization of cherry tomatoes on spinach leaves]

This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people’s social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remained green. Maybe this has something to do with how the salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context, and CB is used to recognizing more isolated round objects as tomatoes. In any case, it does know enough to guess that spinach is green and some cherry tomatoes are reddish.
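To put rough numbers on the claim that red tomatoes and green leaves share a luminance, here is a sketch using the ITU-R BT.601 luma weights as a digital stand-in for how black and white film responds (real film stocks vary). The RGB values for the tomato and leaf are hypothetical, picked to look plausible rather than measured from my photo, and the red filter is modeled crudely as passing only the red channel.

```python
# BT.601 luma as a rough stand-in for how B&W film records brightness.
def luma(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

# Crude red filter: only the red channel reaches the (panchromatic) film.
def red_filter(rgb):
    return rgb[0]

TOMATO = (200, 40, 30)  # hypothetical 8-bit RGB for a ripe cherry tomato
LEAF = (40, 120, 40)    # hypothetical 8-bit RGB for a spinach leaf

print(round(luma(TOMATO), 1), round(luma(LEAF), 1))  # 86.7 87.0
print(red_filter(TOMATO), red_filter(LEAF))          # 200 40
```

Without a filter the two land within a fraction of a gray level of each other; behind the red filter the tomato prints far lighter than the leaf, which is the contrast the darkroom trick is after.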

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.

[Image: @ColorizeBot’s colorization of a flyer with an Ishihara color-blindness test]

That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net has seen an image like this before, it doesn’t expect one every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of colored pencils, which could conceivably be a box of brown sketching pencils.

[Image: @ColorizeBot’s colorization of a box of colored pencils]

What about a more typical type of photograph? Here’s an old truck in some snow:

[Image: @ColorizeBot’s colorization of an old truck in the snow]

CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look at the right hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, in some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
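A toy sketch of that locality in plain Python, using a hypothetical 7x12 image and 3x3 box kernels (real networks use learned kernels plus pooling and strides, all of which widen each pixel’s reach). With stride-1 layers, each output pixel sees a patch of the input 1 + L(k - 1) pixels wide for L stacked layers of size k, so after two 3x3 layers a bright pixel near the right edge simply cannot influence the output on the left.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D sliding-window filter (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]

def receptive_field(num_layers, kernel_size=3):
    """Width of input seen by one output pixel after stacked stride-1 layers."""
    return 1 + num_layers * (kernel_size - 1)

# Hypothetical image: all dark except one bright pixel near the right edge.
image = [[0] * 12 for _ in range(7)]
image[3][10] = 1
box = [[1] * 3 for _ in range(3)]

out = conv2d_valid(conv2d_valid(image, box), box)  # two stacked 3x3 layers
print(receptive_field(2))    # 5
print(out[1][0], out[1][7])  # 0 6
```

The left-hand output pixel only ever looks at input columns 0 to 4, which is a small-scale version of why the left and right sides of the truck photo can disagree about the season.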

In the snowy truck and tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or I see an image that looks like it was taken in winter, we include in our expectations: “This photo looks like it was taken in winter, so it is likely the whole scene takes place in winter, because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo, that wouldn’t be my first guess about how it came to be, based on the way most of the salads I’ve seen throughout my life have been constructed. In general, we don’t see deep neural networks like @ColorizeBot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative systems like Google’s DeepDream (built on the Inception image classifier) or style transfer services like Deepart.io, which suffuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?

[Image: @ColorizeBot’s colorization of theScinder’s logo, a miniature magnetoplasmadynamic thruster]

That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic

@ColorizeBot’s Twitter feed

@CtheScinder’s Twitter feed

All the photographs used in this essay were taken by yours truly (at http://www.thescinder.com), and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:

[Image: the tomato and spinach photo in color, converted to B&W, and recolorized by @ColorizeBot]

Teaching a Machine to Love XOR

The XOR function outputs true if exactly one of the two inputs is true

The exclusive or function, also known as XOR (but never going by both names simultaneously), has a special relationship to artificial intelligence in general, and neural networks in particular. This is thanks to a prominent book from 1969 by Marvin Minsky and Seymour Papert entitled “Perceptrons: an Introduction to Computational Geometry.” Depending on who you ask, this text was single-handedly responsible for the AI winter due to its critiques of the state-of-the-art neural networks of the time. In an alternative view, few people ever actually read the book but everyone heard about it, and the tendency was to generalize a special-case limitation of local and single-layer perceptrons to the point where interest and funding for neural networks evaporated. In any case, thanks to back-propagation, neural networks are now in widespread use and we can easily train a three-layer neural network to replicate the XOR function.

In words, the XOR function is true for two inputs if one of them, but not both, is true. When you plot the XOR as a graph, it becomes obvious why the early perceptron would have trouble getting all four cases right.

[Figure: sketch of XOR plotted in 2D]

There’s no way to draw a straight line on the 2D graph that separates the true and false outputs for XOR (red and green in the sketch above). Go ahead and try. The same holds if you try to separate a 3D version with a plane, and so on to higher dimensions.

[Figure: sketch of a 3D version of XOR]
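Taking up the “go ahead and try” invitation by brute force: the sketch below checks every single perceptron (a step unit with weights w1, w2 and bias b) on a hypothetical grid of my own choosing against AND and XOR. AND separates cleanly; XOR tops out at three of the four cases. A grid search only illustrates the point, but the proof is short: XOR needs b ≤ 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b ≤ 0, yet adding the middle two gives w1 + w2 + b > -b ≥ 0, a contradiction.

```python
from itertools import product

POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
GRID = [k / 4 for k in range(-8, 9)]  # hypothetical grid: -2.0 to 2.0 in steps of 0.25

def perceptron(w1, w2, b, x):
    """Single-layer perceptron: fire if the weighted sum clears zero."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def best_linear_accuracy(targets):
    """Best fraction of the four cases any single line on GRID gets right."""
    best = 0.0
    for w1, w2, b in product(GRID, repeat=3):
        hits = sum(perceptron(w1, w2, b, x) == t for x, t in zip(POINTS, targets))
        best = max(best, hits / len(POINTS))
    return best

print(best_linear_accuracy(AND))  # 1.0
print(best_linear_accuracy(XOR))  # 0.75
```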

That’s a problem because a single-layer perceptron can only classify points linearly. But if we allow ourselves a curved boundary, we can separate the true and false outputs easily, which is exactly what we get by adding a hidden layer to a neural network.

[Figure: an XOR network with a hidden layer]

The truth-table for XOR is as follows:

Input A  Input B  Output
0        0        0
0        1        1
1        0        1
1        1        0
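Before any training, it helps to see that the table above can be matched exactly once a hidden layer is allowed. The sketch below uses hand-picked weights (my own illustrative choice, not learned and not from the original post): one hidden step unit computes OR, the other computes NAND, and the output unit ANDs them together.

```python
def step(z):
    """Threshold activation of a classic perceptron."""
    return 1 if z > 0 else 0

def xor_net(a, b):
    """Two inputs, two hidden units, one output, with hand-set weights."""
    h_or = step(a + b - 0.5)            # hidden unit 1: fires if either input is on
    h_nand = step(-a - b + 1.5)         # hidden unit 2: fires unless both are on
    return step(h_or + h_nand - 1.5)    # output: fires only if both hidden units fire

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Each hidden unit on its own is still a straight line; the output layer keeps only the band between the two lines, which is how the hidden layer buys the non-linear boundary.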

If we want to train a neural network to replicate the table above, we use backpropagation to flow the output errors backward through the network based on the neuron activations of each node. Based on the gradient of these activations and the error in the layer immediately above, the network error can be optimized by something like gradient descent. As a result, our network can now be taught to represent a non-linear function. For a network with two inputs, three hidden units, and one output the training might go something like this:

[Animation: a network with two inputs, three hidden units, and one output training on XOR]
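A minimal sketch of that kind of training loop in plain Python, for the same 2-3-1 layout. It is not the code behind the animation: the sigmoid activations, squared-error loss, full-batch gradient descent, learning rate of 0.5, epoch count, and random seed are all my own assumptions. The two `delta` lines are the backpropagation step: the output error is scaled by the sigmoid’s slope, then passed back through each hidden-to-output weight.

```python
import math
import random

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(hidden=3, lr=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    # w_ih[j]: weights from the two inputs to hidden unit j; b_h[j]: its bias
    w_ih = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(hidden)]
    w_ho = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b_o = rng.uniform(-1, 1)

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_ih, b_h)]
        y = sigmoid(sum(w * hj for w, hj in zip(w_ho, h)) + b_o)
        return h, y

    def loss():
        return 0.5 * sum((forward(x)[1] - t) ** 2 for x, t in XOR_DATA)

    history = [loss()]
    for _ in range(epochs):
        g_ih = [[0.0, 0.0] for _ in range(hidden)]
        g_bh = [0.0] * hidden
        g_ho = [0.0] * hidden
        g_bo = 0.0
        for x, t in XOR_DATA:
            h, y = forward(x)
            delta_o = (y - t) * y * (1 - y)  # error at the output unit's input
            g_bo += delta_o
            for j in range(hidden):
                g_ho[j] += delta_o * h[j]
                # flow the output error back through weight j, scaled by the hidden slope
                delta_h = delta_o * w_ho[j] * h[j] * (1 - h[j])
                g_bh[j] += delta_h
                g_ih[j][0] += delta_h * x[0]
                g_ih[j][1] += delta_h * x[1]
        # one gradient-descent step using the whole batch
        b_o -= lr * g_bo
        for j in range(hidden):
            w_ho[j] -= lr * g_ho[j]
            b_h[j] -= lr * g_bh[j]
            w_ih[j][0] -= lr * g_ih[j][0]
            w_ih[j][1] -= lr * g_ih[j][1]
        history.append(loss())

    def predict(x):
        return forward(x)[1]

    return predict, history

predict, history = train_xor()
for x, target in XOR_DATA:
    print(x, target, round(predict(x), 2))
```

With most seeds the outputs settle near 0, 1, 1, 0, but some initializations stall in a local minimum, which is one reason to keep an eye on `history` rather than trusting a single run.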

Update (2017/03/02) Here’s the gist for making the gif above:

Things to Think About From 2016

A word cloud of theScinder’s output for 2016, made with wordle.net

CRISPR/Cas9

This subject includes throwbacks to 2015, when I did most of my writing about CRISPR/Cas9. That’s not to say 2016 didn’t contain any major genetic engineering news. In particular, scientists continue to move ahead with the genetic modification of human embryos.

If you feel like I did before I engaged in some deeper background reading, you can catch up with my notes on the basics. I used the protein structures for existing gene-editing techniques to highlight the differences between old-school gene editing techniques and editing with Cas9. I also compared the effort it takes to modify a genome with Cas9 to how difficult it was using zinc-finger nucleases, the previous state-of-the-art (spoiler: it amounts to days of difference).

TLDR: The advantage of genetic engineering with Cas9 over previous methods is the difference between writing out a sequence of letters and solving complex molecular binding problems.

aLIGO and the detection of gravitational waves

In one of the most impressive scientific breakthroughs of the last hundred years or so, a bunch of clever people with very sensitive machines announced that they had detected the squidge-squodging of space. A lot of the LIGO data is available from the LIGO Open Science Center, and this is a great way to learn signal processing techniques in Python. I synchronized the sound of gravitational wave chirp GW150914 to a simulated visualization (from SXS) of a corresponding black hole inspiral, and the result is the following video. You can read my notes about the process here. I also modified the chirp to play the first few notes of the “Super Mario Brothers” theme.
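For anyone who wants to hear that kind of sweep without downloading LIGO data, here is a toy synthetic chirp written out as a WAV file using only the Python standard library. It is not the GW150914 waveform: it uses the leading-order inspiral scaling, with frequency rising as (t_c - t)^(-3/8) and amplitude as f^(2/3), and the 35 to 250 Hz range, 1 s duration, and 8 kHz sample rate are my own assumptions chosen to loosely echo the reported GW150914 sweep.

```python
import io
import math
import struct
import wave

def toy_chirp(duration=1.0, f_start=35.0, f_end=250.0, rate=8000):
    """Toy inspiral sweep f(t) = f_start * (1 - t/t_c)**(-3/8).

    t_c is chosen so the frequency reaches f_end at t = duration.
    Amplitude grows as f**(2/3), normalized so it peaks near 1.
    """
    t_c = duration / (1 - (f_start / f_end) ** (8 / 3))
    n = int(duration * rate)
    phase, samples, freqs = 0.0, [], []
    for i in range(n):
        t = i / rate
        f = f_start * (1 - t / t_c) ** (-3 / 8)
        phase += 2 * math.pi * f / rate  # integrate frequency into phase
        samples.append((f / f_end) ** (2 / 3) * math.sin(phase))
        freqs.append(f)
    return samples, freqs

def to_wav_bytes(samples, rate=8000):
    """Pack samples in [-1, 1] as 16-bit mono PCM WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<h", int(32767 * s)) for s in samples))
    return buf.getvalue()

samples, freqs = toy_chirp()
wav = to_wav_bytes(samples)  # write these bytes to a .wav file to listen
```

Shifting `f_start` and `f_end` up by an octave or two makes the sweep easier to hear on laptop speakers, at the cost of sounding less like the real detector output.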

Machine Learning

I’ve just started an intensive study of the subject, but machine learning continues to dip its toes into everything to do with modern human life. We have a lot of experience with meat-based learning programs, which should give us some insight into how to avoid common pitfalls. The related renewed interest in artificial intelligence should make the next few years interesting. If we do end up with a “hard” general artificial intelligence sometime soon, it might make competition a bit tough, if you could call it competition at all.

Devote a few seconds of thought to the twin issues of privacy and data ownership.

Mars

2016 also marked a renewed interest in manned space exploration, largely because of the announcement from space enthusiast Elon Musk that he’s really stoked to send a few people to Mars. NASA is still interested in Mars as well, and might be a good partner to temper Musk’s enthusiasm. In the Q&A at about 1:21 in the video below, Musk seems to suggest a willingness to die as the primary prerequisite for his first batch of settlers. There are some known, unavoidable dangers and some unknown, unknowable ones in the venture, but de-prioritizing survivability as a mission constraint runs a better chance of delaying manned exploration than hastening it, at least as long as it remains as expensive as Musk optimistically expects.

Here’s some stuff that’s a lot less serious about living on Mars.

It doesn’t grab the headlines with such vigor, but Jeff Bezos’s Blue Origin had an impressive year: retiring its first rocket after five flights and exceeding the mission design in a final test of a launch escape system.
Blue Origin is also working on an orbital launch system called New Glenn, named in honor of John Glenn, the first astronaut from the USA to orbit the Earth.

In that case, where are we headed?

The previous year provided some exciting moments to really trip the synapses, but we had some worrying turns as well. The biggest challenges of the next few decades will all have technical components, and understanding them doesn’t come for free. Humanity is learning more about biology at more fundamental levels, and medicine won’t look the same in ten years. A lot of people seem unconcerned that we probably won’t make the 2 degrees Celsius threshold for limiting climate change, although not worrying about something doesn’t mean it won’t kill anyone. Scientists and engineers have been clever enough to develop machine learners to assist our curiosity, and it’s exciting to think that resurgent interest in AI might give us someone to talk to soon. Hopefully they’ll be better conversationalists than the currently available chatbots, and a second opinion on the nature of the universe could be useful. It’s not going to be easy to keep up with improving automation, and humans will have to think about what working means to them.

Take some time to really dig into these subjects. You probably already have some thoughts and opinions on some of them, so try to read a contrary take. If you can’t think of evidence that might change your mind, you don’t deserve your conclusions.

Remember that science, technological development, and innovation have a much larger long-term effect on humans and our place in the universe than the petty machinations of human fractionation. So keep learning, figure out something new, and remember that if you possess general intelligence you can approach any subject. On the other hand, autogenous annihilation is one of the most plausible solutions to the Fermi Paradox. This is no time to get Kehoed.

Introduction to ML

You can’t swing a cat through a newsfeed these days without hitting a story about the extent of learning algorithms in everyday life. As many readers are no doubt aware, there’s a type of computation that’s brought new perspective to a wide swath of common and previously intractable problems, and unlike a more deterministic programmatic approach, this new approach is based on learning. That’s right, instead of a linear sequence of instructions these algorithms figure out what to do for themselves.

There’s probably not a single aspect of our modern world that isn’t affected by or directly based on the new field of learning-based methods. But this isn’t just any type of learning: Meat Learning, or ML for short, affects almost every essential experience in the modern world. Choosing what book to read next? Meat learning. What’s for lunch? ML. Driving cars? ML conquered that task years ago (despite a high frequency of catastrophic failure). Whether you’ve been aware of it or not, and for better or for worse, we’ve truly entered the age of the learning algorithm.

Many of the basic ideas and underlying structures of modern ML have been around for millennia, if not epochs, but it’s only been in the last few million years that ML has really puffed up its breast-feathers. Almost all ML algorithms in common usage today run on a variant of the same hardware: a fatty grey mass known to the professionals that work with them as a “meat computer.”

Rather than an improvement in the basic design or clever new architectures, the enabling breakthrough for modern ML has been largely based on the sheer mass of computational meat (archaically known as grey matter) incorporated in bleeding edge modern platforms. This new type of learner, of which a wide swath of variants all share the same designation of “human,” hasn’t collectively been around for long (geologically speaking), but has already had a huge impact. It is nearly impossible to go anywhere on the planet without running into some sign of human meat-learners’ work, play, or self-destructive irony.

Previous generations of meat-learners had about half the mass of computational meat allocated for each kg of body mass compared to the current state-of-the-art human models, at least after engaging regularization factors to make the human variants feel better about themselves. This vast increase in computational mass in a meat computer is almost entirely composed of warm, squishy subunits intended to mimic the activity of artificial neural networks. Like an iceberg, the majority of these networks are hidden from view, which makes them very mysterious (some would say nonsensical) to work with. More on that later on.

The official definition of a meat learning platform, according to ML pioneer Sammy Arthur, is “…something that can learn, at least a little bit, but it has to be made of meat. The soft squishy stuff.” Some practitioners would say that the modern definition has grown since Arthur’s definitive statements. Although ML was at first considered a subset of the field of organic intelligence (OI for short), it is now widely acknowledged that ML has outgrown its humble beginnings as a specialization of OI, becoming a fully-fledged field in its own right. Also, it is quite clear that many of the modern systems running the cutting-edge human variants don’t possess all the attributes for qualification as bona fide general organic intelligence.

Unlike traditional programming, wherein the outcome depends on a succession of conditional statements, meat learners adapt to a given problem by being “trained” on large sets of data, aka, learning. Training a meat learning platform to solve a problem requires a large amount of input data, and in some cases, acquiring the right data can be quite the challenge. After learning on a sufficient number of training sets, an ML program becomes capable of independently completing a wide variety of tasks, including both stuff and things.

This unique learning approach means that meat learners often find solutions to problems that are unusual and sometimes downright surprising (or stupid). Due to the nature of how ML systems are trained, completing a task may include steps that appear to make very little sense until, finally, the program somehow converges to an interesting solution. Sometimes. Despite numerous advantages, these capabilities come at a cost.

Training an ML algorithm requires a vast input of training data, often taking twenty years or so before an ML system is ready for even the simplest of tasks. Due to the speed issue, most developers train an ML platform for multiple tasks in parallel, which seems to work out more or less OK. One potential disadvantage is that in most cases the vast majority of the squishy neural network performing ML computations is made up of hidden layers, the submerged part of the iceberg alluded to above. As a consequence, nobody is quite sure how they do what they do. In practice this isn’t much of a problem when a given ML is working, but it makes problems almost impossible to debug. Unlike a proper artificial neural network, not even the architecture of the system is available for design and adjustment. The only way to adjust a given ML system is by modifying the training inputs, often with unpredictable results. To make matters worse, in most cases learners heavily weight early training inputs, assigning little to no weight to those encountered as a more mature ML system. In all cases, once a particular ML circuit becomes entrenched in a wider ML architecture, it becomes subject to a “learned inertia” that makes it difficult for a learner to adapt previously learned strategies and can lead to typing in all caps.

Over-reliance on hidden layers and learned inertia aren’t the only problems associated with ML. Although meat-learners tend to be quite flexible, there are certain types of computation that ML just isn’t suited for. If your task involves Bayesian inference, for instance, you can just about forget about training an ML human platform to suit. Additionally, ML performance can be unacceptably erratic. A meat learner that previously performed reasonably well on a given task might perform terribly the next time around just because they are tired, drunk, or bored, and they usually are at least one of those three things.

ML has long been plagued by hardware problems as well. ML platforms tend to run at speeds of just a few tens of Hertz, and the upkeep of such a system can be expensive. Both dynamic and static memory suffer from volatility and infidelity, and these problems only get worse as the system matures. ML is inherently crufty: many aspects of the hardware have been retained from the very early days of the first chordate models, in turn borrowing much of their fundamentals from even earlier systems. The basis for switching action potentials, the discrete building block of meat learning activity, in microalgae is so similar to that in state-of-the-art meat learners that they can be functionally swapped. This highlights the archaic mechanics at the foundation of meat learning, but it also offers some insight: it makes meat neurons (even in hidden layers) amenable to being switched on or off with light exposure.

Despite myriad problems, ML seems to be here to stay and is liable to play a large role for at least another ten years or so. ML is slow, but gets around the problem by utilizing massive parallelization. ML takes years to train, but once it has learned a fair amount from training inputs it can perform a wide variety of tasks. ML is erratic and highly prone to various emotional and drive-reduction states, but most ML systems get around this issue by pretending everything is fine. Hidden layers make it difficult to determine the inner workings of a modern ML platform, but many agree that’s for the best anyway.

Whether ML developers overcome the existing issues or not may ultimately be irrelevant: for the nonce, meat learning in its many forms is the best and only irrevocable tool we’ve got.

Update 2016/12/23: Comment on action potentials added to paragraph 11