Trolling a Neural Network to Learn About Color Cues

Neural networks are breaking into new fields and refining their roles in old ones on a day-to-day basis. The main enabling breakthrough in recent years is the ability to efficiently train networks consisting of many stacked layers of artificial neurons. These deep learning networks have been used for everything from tomographic phase microscopy to learning to generate speech from scratch.

A particularly fun example of a deep neural net comes in the form of one @ColorizeBot, a Twitter bot that generates color images from black and white photographs. For landscapes, portraits, and street photography the results are reasonably realistic, even if they do fall into an uncanny valley that is eerie, striking, and often quite beautiful. I decided to try to trick @ColorizeBot to learn something about how it was trained and regularized, and maybe gain some insights into general color cues. First, a little background on how @ColorizeBot might be put together.

According to the description on @ColorizeBot’s Twitter page:

I hallucinate colors into any monochrome image. I consist of several ConvNets and have been trained on millions of images to recognize things and color them.

This tells us that CB is indeed a many-layered artificial neural network, some of whose layers are convolutional. Convolutional layers share weights across the image, which gives deep learning the ability to discover features from images rather than relying on the conventional machine vision approach of manually extracting image features to train an algorithm. This gives CB the ability to discover important indicators of color that its handler wouldn't necessarily have thought of in the first place. I expect CB was trained as a special type of autoencoder. Normally, an autoencoding neural network has the same data on both the input and output side and iteratively tries to reproduce the input at the output in an efficient manner. In this case, instead of producing a single grayscale image at the output, the network needs to produce three versions, one image each for the red, green, and blue color channels. Of course, it doesn't make sense to totally throw away the structure of the black and white image, and the way the authors used this a priori knowledge to inform the output must have been important for getting the technique to work well and fast. CB's Twitter bio claims it trained on millions of photos, and I tried to trick it into making mistakes and revealing something about its inner workings and training data. To do this, I took some photos I thought might yield interesting results, converted them to grayscale, and sent them to @ColorizeBot.
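We can only guess at CB's internals, but one common trick in colorization networks (for example, Lab-colorspace approaches) is to reuse the input grayscale image as the luminance channel and have the network predict only the two chrominance channels, which bakes the B&W structure into the output by construction. A minimal sketch of that output stage, with hypothetical names:

```python
import numpy as np

def assemble_output(gray, predicted_chroma):
    """Hypothetical colorizer output stage: the network predicts only two
    chrominance channels; the input grayscale image is reused directly as
    the luminance channel, so the B&W structure is preserved exactly."""
    # gray: (H, W); predicted_chroma: (H, W, 2) -> Lab-like image (H, W, 3)
    return np.dstack([gray, predicted_chroma])
```

Whether @ColorizeBot actually works this way is an assumption on my part; it could equally predict all three RGB channels and rely on training to keep them consistent with the input.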

The first thing I wanted to try is a classic teaching example from black and white photography. If you have ever thought about dusting off a vintage medium format rangefinder and turning your closet into a darkroom, you probably know that a vibrant sun-kissed tomato on a bed of crisp greens looks decidedly bland on black and white film. If one wishes to pursue the glamorous life of a hipster salad photographer, it's important to invest in a few color filters to distinguish red and green. In general, red tomatoes and green salad leaves have approximately the same luminance (i.e. brightness) values. I wrote about how this example might look through the unique eyes of cephalopods, which can perceive color with only one type of photoreceptor. Our own visual system distinguishes the two types of object mainly by their color, but if a human viewer looks at a salad in a dark room (what? midnight is a perfectly reasonable time for salad), they can still tell what is and is not a tomato without distinguishing the colors. @ColorizeBot interprets a B&W photo of cherry tomatoes on spinach leaves as follows:

This scene is vaguely plausible. After all, some people may prefer salads with unripe tomatoes. Perhaps meal-time photos from these people's social media feeds made it into the training data for @ColorizeBot. What is potentially more interesting is that this test image revealed a spatial dependence: the tomatoes in the corner were correctly filled in with a reddish hue, while those in the center remain green. Maybe this has something to do with how the salad images used to train the bot were framed. Alternatively, it could be that the abundance of leaves surrounding the central tomatoes provides a confusing context, and CB is used to recognizing more isolated round objects as tomatoes. In any case, it does know enough to guess that spinach is green and some cherry tomatoes are reddish.

Next I decided to try and deliberately evoke evidence of overfitting with an Ishihara test. These are the mosaic images of dots with colored numbers written in the pattern. If @ColorizeBot scraped public images from the internet for some of its training images, it probably came across Ishihara tests. If the colorizer expects to see some sort of numbers (or any patterned color variation) in a circle of dots that looks like a color-blindness test, it’s probably overfitting; the black and white image by design doesn’t give any clues about color variation.

That one’s a pass. The bot filled in the flyer with a bland brown coloration, but didn’t overfit by dreaming up color variation in the Ishihara test. This tells us that even though there’s a fair chance the neural net has seen an image like this before, it doesn’t expect one every time it sees a flat pattern of circles. CB has also learned to hedge its bets when looking at a box of colored pencils, which could conceivably be a box of brown sketching pencils.

What about a more typical type of photograph? Here’s an old truck in some snow:

CB managed to correctly interpret the high-albedo snow as white (except where it was confused by shadows), and, although it made the day out to be a bit sunnier than it actually was, most of the winter grass was correctly interpreted as brown. But have a look at the right hand side of the photo, where apparently CB decided the seasons changed to a green spring in the time it takes to scan your eyes across the image. This is the sort of surreal, uncanny effect that CB is capable of. It’s more pronounced, and sometimes much more aesthetic, in some of the fancier photos on CB’s Twitter feed. The seasonal transformation from one side of the photo to the other tells us something about the limits of CB’s interpretation of context.

In a convolutional neural network, each part of an input image is convolved with kernels of a limited size, and the influence of one part of the image on its neighbors is limited to some degree by the size of the largest kernels. You can think of these convolutional kernels as smaller sub-images that are applied to the full image as a moving filter, and they are a foundational component of the ability of deep neural networks to discover features, like edges and orientations, without being explicitly told what to look for. The results of these convolutional layers propagate deeper through the network, where the algorithm can make increasingly complex connections between aspects of the image.
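As a toy illustration (not CB's actual kernels, which are learned during training), this is the basic operation a convolutional layer performs: a small kernel slides over the image, and a simple difference kernel, for instance, responds wherever brightness jumps between neighboring columns.

```python
def convolve2d(image, kernel):
    """'Valid' sliding-window correlation, the core op of a conv layer.
    (Deep-learning frameworks typically correlate rather than flip the kernel.)"""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

# A [-1, 1] kernel lights up at a vertical edge:
step = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
edges = convolve2d(step, [[-1, 1]])  # -> [[0, 1, 0], [0, 1, 0]]
```

In a trained network, many such kernels run in parallel, and their responses feed the next layer rather than being inspected directly.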

In the snowy truck and the tomato/spinach salad examples, we were able to observe @ColorizeBot’s ability to change its interpretation of the same sort of objects across a single field of view. If you, fellow human, or I see an image that looks like it was taken in winter, we include in our expectations “This photo looks like it was taken in winter, so the whole scene likely takes place in winter, because that’s how photographs and time tend to work.” Likewise, we might find it strange for someone to have a preference for unripe tomatoes, but we’d find it even stranger for someone to prefer a mixture of ripe-ish and unripe tomatoes on the same salad. Maybe the salad maker was an impatient type suffering from a tomato shortage, but given a black and white photo, that wouldn’t be my first guess, based on how most of the salads I’ve seen throughout my life have been constructed. In general, we don’t see deep neural networks like @ColorizeBot generalizing that far quite yet, and the resulting sense of context can be limited. This is different from generative networks like Google’s “Inception” or style transfer systems like Deepart.io, which perfuse an entire scene with a cohesive theme (even if that theme is “everything is made of duck’s eyes”).

Finally, what does CB think of theScinder’s logo image? It’s a miniature magnetoplasmadynamic thruster built out of a camera flash and magnet wire. Does CB have any prior experience with esoteric desktop plasma generators?

That’ll do CB, that’ll do.

Can’t get enough machine learning? Check out my other essays on the topic.

All the photographs used in this essay were taken by yours truly (at http://www.thescinder.com), and all images were colorized by @ColorizeBot.

And finally, here’s the color-to-B&W-to-color transformation for the tomato spinach photo:

Lens caps that screw on

Most lenses already have a standard, secure method for attaching accessories to the distal end. So why do we still put up with the infamy of squeeze-style caps that are so easily lost? Below are some iterations of my designs for threaded lens caps, designed with information on the lens filter thread standards from Wikipedia, and printed in various colors of Shapeways basic sintered plastic. They’re durable, can be colorful, and it’s possible to emboss custom text or an image on the front. Oh, and I never worry about them falling off in the bag.

3D Printable Lens Hood Design

A lens hood is a shade that blocks out-of-frame light from reflecting off of the internals within the lens. This minimizes lens flares, so you can add them later in post. Just kidding.

Another form of lens flare is less obvious (and I don’t think J.J. Abrams uses it). It manifests as a haze across the majority of the frame, making the image appear washed-out, and it never looks good. Unlike deliberate lens flares, it isn’t obvious from the image itself where it comes from, and it doesn’t look dramatic.

To get the most effect from a lens hood, it needs to block out as much unwanted light as possible without actually showing up in the frame. This means that for any given lens at a certain focal length and field of view there will be a best angle for your lens hood.

The Wikipedia article on angle of view gives an equation in terms of the focal length and sensor size:

$2 \cdot \tan^{-1}\left(\frac{d}{2f}\right)$

The variable $d$ is the sensor dimension of interest. For a lens hood with a simple circular cross section, the longest dimension should be used, e.g. the diagonal of a typical rectilinear sensor. The factor of two can be omitted if you want the angle relative to the optical axis rather than the total angle.
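The formula is easy to evaluate directly. A sketch in Python, using a full-frame diagonal of roughly 43.3 mm (my assumption for $d$) and the ~27 mm lens mentioned below:

```python
import math

def angle_of_view(d_mm, f_mm, half=False):
    """Angle of view in degrees for sensor dimension d and focal length f.

    half=True returns the angle relative to the optical axis
    (i.e. the doubling factor is omitted)."""
    full = 2.0 * math.degrees(math.atan(d_mm / (2.0 * f_mm)))
    return full / 2.0 if half else full

# Full-frame diagonal (~43.3 mm) on a 27 mm lens: about 77 degrees total.
print(round(angle_of_view(43.3, 27.0), 1))
```

Running the numbers for shorter focal lengths shows why the hood angle has to widen as the lens gets wider.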

The lens hood below is a general purpose lens hood (also 3D printed) for lenses with a 58mm filter thread diameter. It flares out a bit, and the angle is wide enough to use with a ~27mm focal length lens.

The images below show essentially the same 58mm diameter lens hood optimized for 16mm, 35mm, and 50mm, in order from left to right. The length of the hood in each case is 16mm. The shorter the focal length of the lens (and the larger the image sensor) the wider the angle, and the lens hood angle increases accordingly.

So far, I have printed the general purpose lens hood, which errs on the side of wide-angle caution. Once I have the additional test pieces in hand, we’ll give ’em the old Pepsi challenge.

Designing 3D printable Lieberkühn Reflectors for macro- and micro-photography

A Lieberkühn reflector gets its name from one Johann Nathaniel Lieberkühn, who invented the speculum that bears his name, which you may recognize from the reflective headband decorations on doctor costumes. The name is generally changed from “speculum” to “reflector” when referring to optical reflectors used in photography and microscopy, perhaps because the term has drifted from its original Latin root meaning “mirror” to refer to probing instruments for dilating orifices.

Lieberkühn reflectors were a way to bathe an opaque specimen in fill light, but they have unfortunately fallen by the wayside with the advent of modern conveniences like LEDs and fiber optic illumination. The above example, from the collection of the Royal Microscopical Society, displays a Lieberkühn on a simple microscope. In use, the reflector would be pointed towards the specimen and fed light by a second mirror like the one on the rightmost microscope. Both of the microscopes pictured were on display at the Museum of the History of Science in Oxford.

The working part of the Lieberkühn reflector is a parabolic mirror, which focuses light without the spherical aberration a spherical mirror would introduce. As an added benefit, mirrors don’t tend to add chromatic dispersion or the other aberrations associated with refraction (though they can affect polarisation). A parabola can be described as a particular slice through a cone, but for the purposes of my first prototype, the functional description in Cartesian coordinates will do:

$y = \alpha x^2$
where $\alpha$ depends on the focal length of the parabola:
$\alpha = \frac{1}{4f}$

To get a functional, three-dimensional mirror, I describe the parabola in terms of the focal length and a given radius as a 2D trace and spin it with rotate_extrude() in OpenSCAD. An aperture in the middle leaves room for light to reach the objective. The reflector shown below has a 4mm central aperture for the objective, a 16mm focal length, and a 32mm diameter.
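A sketch of that 2D trace in Python (the actual model is built in OpenSCAD; the dimensions here match the reflector described above, and the function name is mine):

```python
def lieberkuhn_profile(f_mm=16.0, radius_mm=16.0, aperture_mm=4.0, steps=32):
    """Sample the parabola y = x^2 / (4 f) from the edge of the central
    aperture out to the rim. Revolving this (x, y) trace about the y axis
    (what rotate_extrude() does in OpenSCAD) gives the mirror surface."""
    alpha = 1.0 / (4.0 * f_mm)
    x0 = aperture_mm / 2.0
    xs = [x0 + i * (radius_mm - x0) / steps for i in range(steps + 1)]
    return [(x, alpha * x * x) for x in xs]

# With f = 16 mm and a 16 mm radius, the rim sits 16^2 / (4 * 16) = 4 mm deep.
```

The same trace, emitted as a polygon, drops straight into an OpenSCAD rotate_extrude() call.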

I have sent a few prototypes (matched to particular lenses or objectives) to Shapeways for prototyping. After some characterisation these will appear on theBilder shoppe.

Plenoptic imaging on a macro bellows rail

After watching Dr. Marc Levoy’s talk on light fields and the Stanford camera array I thought I’d investigate for myself the uses for synthetic aperture photography. From the talk:

By changing the amount of shift, you can change the depth of objects that are in focus. And here’s what that looks like. So we’re focusing synthetically on the bushes, across the street through my grad students into the building and there’s a step stool in the building.

This is about the same as matching up the left/right channel on a stereo anaglyph, but with more perspectives as inputs. To get a feel for things I drilled out a camera mount to attach to the end of a macro bellows rail and took a series of horizontally displaced images, shifting and combining them in Octave to focus at different depths.
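The shift-and-add step itself is simple. A sketch in Python/NumPy (my actual processing was done in Octave): each view is shifted in proportion to its position along the rail, and the stack is averaged, so objects at the chosen depth reinforce while everything else smears into blur.

```python
import numpy as np

def refocus(views, baselines_mm, disparity_px_per_mm):
    """Synthetic-aperture refocusing: shift each view to cancel the parallax
    of objects at the chosen depth, then average. Objects at that depth add
    up sharply; objects at other depths blur out."""
    acc = np.zeros_like(views[0], dtype=float)
    for view, b in zip(views, baselines_mm):
        acc += np.roll(view, -int(round(disparity_px_per_mm * b)), axis=1)
    return acc / len(views)

# Toy check: a point whose parallax is 2 px per mm of camera travel snaps
# back into alignment when refocused at that disparity, and blurs at others.
```

Sweeping disparity_px_per_mm is exactly the "changing the amount of shift" from the talk: each value of the sweep focuses the synthetic aperture at a different depth.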