Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try to trick @ColorizeBot to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that @ColorizeBot (CB for short) is indeed an artificial neural network with many layers, some of which are convolutional. Convolutional layers share weights across the image, and they give deep learning the ability to discover features from images rather than relying on the conventional machine vision approach of manually extracting image features to train an algorithm. This gives CB the ability to discover important indicators of color that its handler wouldn’t necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively tries to reproduce the input at the output in an efficient manner. In this case, instead of producing a single grayscale image at the output, the network would need to produce three versions, one image each for the red, green, and blue color channels. Of course, it doesn’t make sense to totally throw away the structure of the black and white image, and the way the authors include this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB’s Twitter bio claims it was trained on millions of photos, so I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
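To make the guessed training setup concrete, here is a minimal Python sketch of how one training example could be built: the input is a grayscale copy of a color photo, and the target is the original three-channel image. This is only an illustration of the idea and not @ColorizeBot's actual code, which this essay has no access to. The function names, the tiny 2x2 "photo", and the naive stand-in for the network are all made up for the example, and the grayscale conversion assumes the standard Rec. 601 luma weights.

```python
# Illustrative sketch only: NOT @ColorizeBot's real pipeline.
# Images are nested lists: image[row][col] = (r, g, b), values 0-255.

def to_grayscale(rgb_image):
    """Collapse an RGB image to one brightness channel (Rec. 601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def make_training_pair(rgb_image):
    """Input is the grayscale version; target is the original color image."""
    return to_grayscale(rgb_image), rgb_image

def mean_squared_error(predicted_rgb, target_rgb):
    """Average squared difference over every pixel and color channel."""
    total, count = 0.0, 0
    for pred_row, true_row in zip(predicted_rgb, target_rgb):
        for pred_px, true_px in zip(pred_row, true_row):
            for p, t in zip(pred_px, true_px):
                total += (p - t) ** 2
                count += 1
    return total / count

def naive_colorizer(gray_image):
    """Hypothetical stand-in for the network: copy gray into all 3 channels."""
    return [[(v, v, v) for v in row] for row in gray_image]

# A made-up 2x2 "photo": reddish, greenish, near-white, and near-black pixels.
photo = [[(200, 40, 30), (60, 110, 40)],
         [(250, 250, 250), (10, 10, 10)]]

gray, target = make_training_pair(photo)
loss = mean_squared_error(naive_colorizer(gray), target)
print("one gray value in, three color values out per pixel")
print("loss of the naive baseline:", round(loss, 1))
```

A real colorizer would replace `naive_colorizer` with a trained network and adjust its weights to push a loss like this one down over millions of such pairs.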

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If one wishes to pursue the glamorous life of a hipster salad photographer, it’s important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which can perceive color with only one type of photoreceptor. Our own visual system can only see contrast between the two types of object by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:
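The tomato-versus-leaf problem can be checked with a little arithmetic. The sketch below is a rough illustration, with assumptions: the two RGB values are invented stand-ins for a ripe tomato and a spinach leaf (not measured from my photo), brightness is approximated with the Rec. 601 luma weights rather than the spectral response of any real film, and the "red filter" is idealized as passing only the red channel.

```python
def luminance(rgb):
    """Approximate perceived brightness using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def through_red_filter(rgb):
    """Idealized red filter: only the red channel reaches the film."""
    return rgb[0]

# Assumed, illustrative colors (0-255 per channel), not measured values.
TOMATO_RED = (200, 40, 30)
LEAF_GREEN = (60, 110, 40)

unfiltered_gap = abs(luminance(TOMATO_RED) - luminance(LEAF_GREEN))
filtered_gap = abs(through_red_filter(TOMATO_RED) - through_red_filter(LEAF_GREEN))

print("brightness gap without a filter:", round(unfiltered_gap, 2))
print("brightness gap through a red filter:", filtered_gap)
```

With these made-up colors the two objects are nearly indistinguishable in plain grayscale, while the red filter pulls them far apart, which is the whole reason for carrying filters around.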


This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people’s social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remain green. Maybe this has something to do with how the salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context, and CB is used to recognizing more isolated round objects as tomatoes. In any case, it does know enough to guess that spinach is green and some cherry tomatoes are reddish.

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.


That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net may have seen an image like this before, it doesn’t expect it every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of colored pencils, which could conceivably be a box of brown sketching pencils.


What about a more typical type of photograph? Here’s an old truck in some snow:


CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look at the right-hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, on some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
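Here is a small Python sketch of the moving-filter idea, using a hand-written vertical-edge kernel on a tiny made-up image. It is not taken from @ColorizeBot: in a trained network the kernel values are learned rather than typed in, and the image, kernel, and function names here are chosen purely for illustration. The second function gives the simplest possible estimate of how far influence spreads, assuming a plain stack of stride-1 layers with no pooling or downsampling (real networks usually widen their view with both).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning code, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

def receptive_field(num_layers, kernel_size):
    """Width of input that can affect one output value after stacking
    `num_layers` stride-1 convolutions with square kernels."""
    return 1 + num_layers * (kernel_size - 1)

# Made-up 5x6 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

# Hand-written kernel that responds to left-to-right brightness increases.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

edges = convolve2d(image, vertical_edge)
for row in edges:
    print(row)
print("pixels seen by one unit after 3 layers of 3x3 kernels:",
      receptive_field(3, 3))
```

The output is nonzero only where the kernel straddles the dark-to-bright boundary, and each output value depends only on the small patch of input under the kernel, which is the locality the paragraph above describes.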

In the snowy truck and in the tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or I see an image that looks like it was taken in winter, we include in our expectations “This photo looks like it was taken in winter, so it is likely the whole scene takes place in winter because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo, that wouldn’t be my first guess at how it came to be, based on the way most of the salads I’ve seen throughout my life have been constructed. In general we don’t see deep neural networks like @ColorizeBot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative networks like Google’s “Inception” or style transfer systems, which suffuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?


That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic

@ColorizeBot’s Twitter feed

@CtheScinder’s Twitter feed

All the photographs used in this essay were taken by yours truly, and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:



If you want to find out if a digital nematode is alive, try asking it.

Fancy living in a computer? Contributors to the OpenWorm project aim to make life inside a computer a (virtual) reality. In recent years, various brain projects have focused funding on moonshot science initiatives to map, model and ultimately understand the human brain: the computer that helps humans to cognito that they sum. These are similar in feel to the human genome project of the late 1990s and early 2000s. Despite the inherent contradictions of the oft-trotted trope that the human brain is the “most complex thing in the universe,” it is indeed quite a complicated machine, decidedly more complex than the human genome. Understanding how it works will take more than mapping every connection, which is akin to knowing every node in a circuit but having no idea what each component is. A multivalent approach at the levels of cells, circuits, connections, and mind offers the most complete picture. OpenWorm coordinator Stephen Larson et al. aim to start by understanding something a little bit simpler: the determinate 302-neuron brain and accompanying body of Caenorhabditis elegans, a soil-dwelling nematode worm that has served as a workhorse in biology for decades.

Genome, Brain

The connectome, a neural wiring diagram of the worm’s brain, has been mapped. The simulation of the worm at the cellular level is an ongoing open-source software project. The first human genome was sequenced only 3 years after the first C. elegans genome; a similar pace for full biological simulation in silico would mean that digital humans, or a reasonable facsimile, are possible within our lifetimes. At the point when these simulations of people are able to fool observers, will these entities be alive and conscious? Have rights? Pay taxes? If a digital person claims the validity of their own consciousness, should we take their word for it, or determine some metric for ascertaining the consciousness of a simulated person based on our own inspection? For answers to questions of existence and sapience we can turn to our own experience (believing as we do that we are conscious entities), and to the venerable history of the questions as discussed in science fiction.

Conversation with the chatbot CleverBot (a conversational precursor to intelligent software) from 2014 December 24.

In the so-called golden age of science fiction, characters tended to be smart, talented, and capable. Aside from an unnerving lack of faults and weaknesses, the protagonists were fundamentally human. The main difference between the audience and the actors in these stories was access to better technology. But it may be that this vision of a human future is comically (tragically?) myopic. Even our biology has been changing more quickly as civilisation and technologies develop. If we add a rate of technological advance that challenges the best-educated humans to keep pace, a speed-up of the rate of change in average meteorological variables, and human-driven selective pressure, the next century should be interesting to say the least. When those unobtainyl transferase pills for longevity finally kick in, generational turnover can no longer be counted on to ease adaptation to a step-change in civilisation.

Greg Egan (who may or may not be a computer program) has been writing about software-based people for over two decades. When the mind of a human is not limited to run on a single instance of its native hardware, new concepts such as “local death” and traveling by transmission emerge intrinsically. Most of the characters in novels from writers such as Egan waste little time questioning whether they will still exist if they have to resort to a backup copy of themselves. As in flesh-and-blood humans, persistence of memory plays a key role in the sense of self, but is not nearly so limited. If a software person splits themselves to pursue two avenues of interest, they may combine their experiences upon their reunion, rejoining as a single instance with a transiently bifurcated path. If the two instances of a single person disagree as to their sameness, they may decide to go on as two different people. These simulated people would be unlikely to care (beyond their inevitable battle for civil rights) whether you consider them to be alive and sapient or not, any more so than the reader is likely to disbelieve their own sapience.

Many of the thought experiments associated with software-based person-hood are prompted by a human perception of dubiousness in duplicity: two instances of a person existing at the same time, but not sharing a single experience, don’t feel like the same person. Perhaps as the OpenWorm project develops we can watch carefully for signs of animosity and existential crises among a population of digital C. elegans twinned from the same starting material. We (or our impostorous digital doppelgängers, depending on your perspective) may find out for ourselves what this feels like sooner than we think.

2014-12-29 – Leading comic edited for improved comedic effect