3 Ideas for Dealing with the Consequences of Overpopulation (from Science Fiction)

Photo by Rebekah Blocker on Unsplash

Despite overpopulation being a taboo topic these days, population pressure was a mainstream concern as recently as the latter half of the last century. Perhaps the earliest high-profile brand of population concern is Malthusianism: the result of a simple observation by Thomas Robert Malthus in 1798 that, while unchecked population growth is exponential, the availability of resources (namely food) increases at only a linear rate, leading to sporadic collapses in population due to war, famine, and pandemics (“Malthusian catastrophes”).

Equations like the Lotka-Volterra equations or the logistic map have been used to describe the chaotic growth and collapse of populations in nature, and for most of its existence Homo sapiens has been subject to similar natural checks on population size and the accompanying booms and busts. Since shortly before the 1800s, however, it’s been nothing but up! up! up!, with the global population growing nearly eight-fold in little more than two centuries. Despite dire predictions of population collapse from classics like Paul Ehrlich’s The Population Bomb and the widespread consumption of algae and yeast by characters from the golden age of science fiction, the Green Revolution in agriculture largely allowed people to ignore the issue.
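For anyone who wants to see those booms and busts for themselves, here’s a minimal sketch of the logistic map in Python; the growth-rate values are arbitrary picks to contrast a stable regime with a chaotic one, not anything fitted to real populations.

```python
import numpy as np

def logistic_map(r, x0=0.5, steps=20):
    """Iterate x_{n+1} = r * x_n * (1 - x_n): a toy model of growth
    limited by a carrying capacity (x is population as a fraction of it)."""
    x = np.empty(steps)
    x[0] = x0
    for n in range(steps - 1):
        x[n + 1] = r * x[n] * (1 - x[n])
    return x

# r = 2.8 settles down to a steady population; r = 3.9 booms and busts chaotically.
print(logistic_map(2.8).round(3))
print(logistic_map(3.9).round(3))
```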

In recent decades the opposite of Malthusianism, cornucopianism, has become increasingly popular. Cornucopians might point out that no one they know is starving right now, and believe that more people will naturally grow the carrying capacity for humans by being clever. This perspective is especially popular among people with substantial stock market holdings, as growing populations can buy more stuff. Many environmentalists decry any mention of a draw-down in human population as a way to effect environmental progress, pointing out that richer nations tend to have lower fertility but far higher per-capita consumption than poorer ones. There are many other issues and accusations that typically pop up in any modern debate over human population and environmental concerns, but that’s not the topic of today’s post.

Regardless of where you fall on the spectrum from Malthusianism to cornucopianism, overpopulation vs. over-consumption, the fact remains: we don’t seem to know where to put all the poop.

In the spirit of optimism with a touch of cornucopianism and just in time for World Population Day 2020, here are three solutions for human population pressure from science fiction.

1. Explore The Possibilities of Soylent Green

Photo by Oleg Sergeichik on Unsplash

I guess it’s a spoiler that in the movie Soylent Green, the eponymous food product is, indeed, made of people. Sorry if no one told you before. The movie has gone down as classic, campy, dystopian sci-fi, but it actually doesn’t have much in common with the book it is based on, Harry Harrison’s Make Room! Make Room!. Both book and movie are set in a miserable New York City overpopulated to the tune of some 35 to 40 million people in the far-off future of 1999. The movie revolves around a murderous cover-up to hide the cannibalistic protein source in “Soylent Green,” while the book examines food shortages, climate catastrophe, inequality, and the challenges of an aging population.

Despite how well it works in the movie, cannibalism is not actually a great response to population pressure. Due to infectious prions, it’s a terrible idea to source a large proportion of your diet from the flesh of your own, or a closely related, species. And before you get clever: cooking meat containing infectious mis-folded prions does not make it safe.

Instead of focusing on cannibalism, I’ll mention a few of the far-out ideas for producing sufficient food mentioned in the book. These include atomic whales scooping up vast quantities of plankton from the oceans, presumably artificially fertilized; draining swamps and wetlands and converting them to agricultural land; and irrigating deserts with desalinated seawater.

These suggestions are probably not even drastic enough to belong on this list. Draining wetlands for farmland and living space has historically been a common practice (polder much?), but it is often discouraged in modern times due to the environmental damage it can cause, dangers of building on floodplains, and recognition of ecosystem services provided by wetlands (e.g. CWA 404). Seeding the oceans by fertilizing them with iron or sand dust is sometimes discussed as a means to sequester carbon or provide more food for aquatic life. Family planning services are also mentioned as a way to help families while attenuating environmental catastrophe, but, as art imitates life, nobody in the book takes it seriously.

2. Make People 10X Smaller

Photo by Cris Tagupa on Unsplash

If everyone were about 10 times shorter, they would weigh about 1000 times less and consume correspondingly fewer resources. The discrepancy in those numbers comes from the square-cube scaling law described by Galileo in 1638. To demonstrate with a simple example, a square has an area equal to the square of its side length, while a cube has a volume (and thus proportional weight) of the side length cubed. Applied to animal size, this explains the increasing difficulty larger animals face in cooling themselves and avoiding collapse under their own weight. So, if people were about 17 cm tall instead of about 170 cm, a healthy body weight at a BMI of 21.75 would be about 0.63 kg instead of 63 kg (BMI scales with height squared, so that’s a 100-fold reduction; scaling weight with volume instead would take them all the way down to about 0.063 kg).

You can’t calculate the basal metabolic rate of a person that size using the Harris-Benedict equation without going into negative calories. If we instead follow the conclusion of White and Seymour (2003) that mammalian basal metabolic rate scales in proportion to body mass raised to the 2/3 power, and assume a normal basal metabolic rate of about 2000 kcal per day, miniaturization would decrease caloric needs by more than 20 times, to roughly 92 kcal a day. You could expect similar reductions in environmental footprints for transportation, housing, and waste outputs. Minsky estimated Earth’s carrying capacity could support about 100 billion humans if they were only a few inches tall, but this could be off by a factor of 10 in either direction. We should at least be able to assume the Earth could accommodate as many miniaturized humans as there are rats in the world currently, which is probably about as many as the ~16 billion humans at the upper end of UN estimates of world population by 2100.
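For anyone who wants to check the arithmetic, here’s the back-of-the-envelope version of the constant-BMI weight scaling and the 2/3-power metabolic scaling; the 2000 kcal baseline is the same assumption as above.

```python
# Back-of-the-envelope scaling for miniaturized humans.
bmi = 21.75                    # kg / m^2
h_full, h_mini = 1.70, 0.17    # heights in meters

w_full = bmi * h_full ** 2     # ~63 kg
w_mini = bmi * h_mini ** 2     # ~0.63 kg at the same BMI

bmr_full = 2000.0              # assumed "normal" basal metabolic rate, kcal/day
bmr_mini = bmr_full * (w_mini / w_full) ** (2.0 / 3.0)   # mass^(2/3) scaling

print(f"weight: {w_full:.0f} kg -> {w_mini:.2f} kg")
print(f"basal metabolic rate: {bmr_full:.0f} -> {bmr_mini:.0f} kcal/day")  # ~93
```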

Downsizing humans for environmental reasons was a major element of the 2017 film of the same name. But miniaturization comes with its own set of peculiarities to get used to. In Greg Egan’s 2002 novel Schild’s Ladder, Cass, one of the story’s protagonists, is embodied in an avatar about 2 mm tall after being transmitted to a deep-space research station with limited space. Cass experiences a number of differences in her everyday experience at her reduced size. She finds that she is essentially immune to damaging herself in collisions thanks to her decreased stature, and her vision is greatly altered by the small apertures of her downsized eyes. The other scientists on the research station exist purely in software, taking up no room at all. But as long as people can live by computation on sophisticated computer hardware, why don’t we . . .

3. Upload Everyone

Photo by Florian Wehde on Unsplash

Greg Egan’s 1997 novel Diaspora has some of the most beautiful descriptions of existing in the universe as a thinking being ever committed to paper. That’s despite, or perhaps because of, the fact that most of the characters in the story exist as software incarnations running on communal hardware known as polises. Although simulated people (known as “citizens” in their polises) are the distant progeny of humans as we know them today, no particular weight is given to simulating their ancestral experience with any fidelity, making for a fun and diverse virtual world. Other valid lifestyle variations include physical embodiment as humanoid robots (called gleisners), and a wide variety of different modifications of biological humans. Without giving too much away, a group of biological humans are at some point given the offer of being uploaded in their entirety as software people. Whether bug or feature, the upload process is destructively facilitated by nanomachines collectively called Introdus. This seems like a great way to reduce existential risk while also reducing human environmental footprints. It’s a win-win!

Of course uploading predates 1997’s Diaspora by a long shot, and it’s practically a core staple of science-fiction besides. Uploading plays a prominent role in myriad works of science fiction including Greg Egan’s Permutation City from 1994, the Portal series of video games, the recent television/streaming series Upload, and many others. Perhaps the first story to prominently feature mind uploading is John Campbell’s The Infinite Brain published in Science Wonder Stories in 1930. The apparatus used to simulate a copy of the mind of the protagonist’s friend was a little different from our modern expectations of computers:

All of these were covered with a maze of little wheels and levers, slides and pulleys, all mounted on a series of long racks. At each end of the four tables a large electric motor, connected to a long shaft. A vast number of little belts rose up from this, and were connected with numberless cog wheels, which in their turn engaged others. There seemed to be some arrangement of little keys, resting on metal plates, and a sort of system of tiny slugs, like the matrices on a linotype; but everything was so mixed up with wires and coils and wheels that it was impossible to get any of the details.

I don’t know if any of the stories of mind uploading from fiction have environmental conservation as the main goal. There’s a lot of cool stuff you could do if you are computationally embodied in a simulated environment, and interstellar travel becomes a lot more tenable if you can basically transmit yourself (assuming receivers are available where you want to go) or push a few kilograms of supercomputer around the galaxy with lasers and solar sails. Even if you choose the lifestyle mostly for fun, there should be substantial savings on your environmental footprint, eventually. Once we manage to match or exceed the roughly 20 watt power requirement of a meat-based human brain with a simulated mind, it should be reasonably easy to get that power from sustainable sources. Of course, current state-of-the-art models used in machine learning require substantially more energy to do substantially less than the human brain, so we’ll need to figure out a combination of shortcuts and fundamental breakthroughs in computing to make it work.

Timescales and Tenability

The various systems supporting life on Earth are complex enough to be essentially unpredictable at time scales relevant to human survival. We can make reasonable predictions about very long time scales: several billion years from now the sun will enter the next phases of its life cycle, making for a roasty situation a cold beverage is unlikely to rectify (it will boil away). We can do the same at short time scales: the sun is likely to come up tomorrow, mostly unchanged from what we see today. But any detailed estimate of the situation in a decade or two is likely to be wrong. Bridging those time scales with reasonable predictions takes deliberate, sustained effort, and we’re likely to need more of that to avoid existential threats.

Hopefully this list has given you ample food for thought to mull over as humans continue to multiply like so many bacteria. I’ll end with a fitting quote from Kurt Vonnegut’s Breakfast of Champions based on a story by fictional science fiction author Kilgore Trout:

“Kilgore Trout once wrote a short story which was a dialogue between two pieces of yeast. They were discussing the possible purposes of life as they ate sugar and suffocated in their own excrement.”

Rendering Deepmind’s Predicted Protein Structures for Novel Coronavirus 2019 (Stereo Anaglyphs Version)

I visualized predicted protein structures from the novel 2019 coronavirus. The structures are the latest from Deepmind’s AlphaFold, the champion entry in the CASP13 protein structure prediction competition that took place in 2018. They’ve reportedly continued to make improvements since then (perhaps hoping to make a similar showing at the next CASP spinning up this year), and there are open source implementations here and here (official), though I haven’t looked into the code yet. I’ve put together some notes on the putative functions of each protein described on the SWISS-MODEL site, which accompany the animated structure visualizations below. If you’d rather not look at blurry structures that look like a heavy case of chromatic aberration, you can go to the other version of this post. The animations on this page are best viewed with red/blue 3D glasses (or a sheet of red or blue acetate over each eye, if for some reason you have that instead).

I used PyMOL to render the predicted structures and build the animations in this post. PyMOL is open source software, and it’s pretty great. If you are a protein structure enthusiast, want to use PyMOL, and can afford to buy a license there is an incentive version that supports the maintenance of the open source project and ensures you always have the latest, greatest version of the program to work with.
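In case you want to make similar turntable animations, a minimal sketch using PyMOL’s Python API is below; the file name and frame count are placeholders rather than the exact settings I used.

```python
# Minimal PyMOL turntable render; file name and frame count are placeholders.
from pymol import cmd

cmd.load("alphafold_prediction.pdb", "model")  # hypothetical input file
cmd.hide("everything", "model")
cmd.show("cartoon", "model")
cmd.spectrum("count", "rainbow", "model")      # color along the chain
cmd.bg_color("white")

for frame in range(36):                        # 36 frames at 10 degrees each
    cmd.rotate("y", 10, "model")
    cmd.ray(800, 600)
    cmd.png(f"frame_{frame:03d}.png")
# Stitch the frames into a gif with your favorite tool (e.g. ImageMagick).
```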

Membrane protein (M protein).

This membrane protein is a part of the viral envelope. Via interactions with other proteins it plays an important role in forming the viral particle, but pre-existing template models of this protein are of low quality.

Non-structural protein 6 (Nsp6)

Nsp6 seems to play a role in inducing the host cell to make autophagosomes in order to deliver viral particles to lysosomes. Subverting the autophagosomal and lysosomal machinery of the host cell is a part of the maturation cycle for several different types of viruses. Low quality models of Nsp6 fragments generated by homology modeling are available on the SWISS-MODEL website.

Non-structural protein 2 (Nsp2). One half of this homodimer uses a surface rendering, and the other is rendered in secondary structure cartoon motifs.

The function of Nsp2 isn’t fully determined yet, but it may have something to do with depressing host immune response and cell survival by interacting with Prohibitin 1 and 2 (PHB and PHB2). Prohibitin proteins have been implicated as receptors for chikungunya and dengue fever virus particles.

Protein 3a

A little more is known about Protein 3a. The protein forms a tetrameric sodium channel that may be important for mediating the release of viral particles. Like the other proteins targeted by Deepmind’s AlphaFold team, this protein doesn’t have good sequence homologues and so had been limited to only a partial, low quality structure prediction.

Papain-like protease (PL-PRO), c-terminal section.

PL-PRO is a protease, which as the name suggests means it makes cuts in other proteins. Papain is a protease family named for the protease found in papaya. This one is responsible for processing the viral RNA replicase by making a pair of cuts near the N-terminus of one of the peptides that make up the viral replicase. It is also associated with making membrane vesicles important for viral replication, along with Nsp4.

Non-structural protein 4 (Nsp4). (Surface mesh overlaid on top of the secondary structure cartoon representation).

Nsp4 plays a part, along with PL-PRO, in the production of vesicles required for viral replication. A pre-existing homology template based model of the C-terminus of Nsp4 bears a close resemblance to the AlphaFold prediction, at least superficially. A comparison of template-based model YP_009725300.1 model 1 and the AlphaFold prediction is shown below.

Comparison of AlphaFold prediction and template model prediction (in blue call-out box). The template model is considered to be reasonably good quality.

The predicted structures released by Deepmind come with a grain of salt which I’ll reiterate here. The structures are predicted (not experimental) so they may differ quite a bit from their native forms. Deepmind has made the structural estimates available under a CC BY 4.0 license (the citation is at the end of the post), and I’ll maintain the visualizations under the same license: feel free to use them with attribution.

There’s obviously a lot going on with the current coronavirus pandemic, so I won’t repeat the information about hand washing, social distancing, or hiding out in the woods that you’ve probably already read about. If you’re interested in learning more about protein structure prediction you can start with the Wikipedia entry and/or the introductory course on the SWISS-MODEL website. Levinthal’s paradox is also a fun thought experiment for framing the problem and its inherent difficulty. Mohammed AlQuraishi wrote an insightful recap of AlphaFold at CASP13.

There is a tremendous amount of research effort currently dedicated to studying the 2019 novel coronavirus, including several structural modelling projects. If you don’t want to dive into the rabbit hole vortex of computational protein structure prediction but still want to do something combining protein structure and the COVID-19 virus, Folding@Home and Foldit both have projects related to the new coronavirus. You can help by donating some of your idle computer resources to simulate structural dynamics with Folding@Home, or you can work at solving structural puzzles with Foldit.

[1] John Jumper, Kathryn Tunyasuvunakool, Pushmeet Kohli, Demis Hassabis, and the AlphaFold Team, “Computational predictions of protein structures associated with COVID-19”, DeepMind website, 5 March 2020, https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19

[2] SWISS-MODEL Coronavirus template structure predictions page https://swissmodel.expasy.org/repository/species/2697049

[3] PyMOL. Supported, incentive version. https://pymol.org/2/ Open source project: https://github.com/schrodinger/pymol-open-source

Rendering Deepmind’s Predicted Protein Structures for Novel Coronavirus 2019

I visualized predicted protein structures from the novel 2019 coronavirus. The structures are the latest from Deepmind’s AlphaFold, the champion entry in the CASP13 protein structure prediction competition that took place in 2018. They’ve reportedly continued to make improvements since then (perhaps hoping to make a similar showing at the next CASP spinning up this year), and there are open source implementations here and here (official), though I haven’t looked into the code yet. I’ve put together some notes on the putative functions of each protein described on the SWISS-MODEL site, which accompany the animated structure visualizations below. The gif files are each a few tens of Mb and so may take some time to load. If you’d prefer to look at the structures rendered as stereo anaglyphs (i.e. best-viewed with red/blue 3D glasses), click here.

I used PyMOL to render the predicted structures and build the animations in this post. PyMOL is open source software, and it’s pretty great. If you are a protein structure enthusiast, want to use PyMOL, and can afford to buy a license there is an incentive version that supports the maintenance of the open source project and ensures you always have the latest, greatest version of the program to work with.

Membrane protein (M protein).

This membrane protein is a part of the viral envelope. Via interactions with other proteins it plays an important role in forming the viral particle, but pre-existing template models of this protein are of low quality.

Non-structural protein 6 (Nsp6)

Nsp6 seems to play a role in inducing the host cell to make autophagosomes in order to deliver viral particles to lysosomes. Subverting the autophagosomal and lysosomal machinery of the host cell is a part of the maturation cycle for several different types of viruses. Low quality models of Nsp6 fragments generated by homology modeling are available on the SWISS-MODEL website.

Non-structural protein 2 (Nsp2)

The function of Nsp2 isn’t fully determined yet, but it may have something to do with depressing host immune response and cell survival by interacting with Prohibitin 1 and 2 (PHB and PHB2). Prohibitin proteins have been implicated as receptors for chikungunya and dengue fever virus particles.

Protein 3a

A little more is known about Protein 3a. The protein forms a tetrameric sodium channel that may be important for mediating the release of viral particles. Like the other proteins targeted by Deepmind’s AlphaFold team, this protein doesn’t have good sequence homologues and so had been limited to only a partial, low quality structure prediction.

Papain-like protease (PL-PRO), c-terminal section.

PL-PRO is a protease, which as the name suggests means it makes cuts in other proteins. Papain is a protease family named for the protease found in papaya. This one is responsible for processing the viral RNA replicase by making a pair of cuts near the N-terminus of one of the peptides that make up the viral replicase. It is also associated with making membrane vesicles important for viral replication, along with Nsp4.

Non-structural protein 4 (Nsp4)

Nsp4 plays a part, along with PL-PRO, in the production of vesicles required for viral replication. A pre-existing homology template based model of the C-terminus of Nsp4 bears a close resemblance to the AlphaFold prediction, at least superficially. A comparison of template-based model YP_009725300.1 model 1 and the AlphaFold prediction is shown below.

Comparison of AlphaFold prediction and template model prediction (in blue call-out box). The template model is considered to be reasonably good quality.

The predicted structures released by Deepmind come with a grain of salt which I’ll reiterate here. The structures are predicted (not experimental) so they may differ quite a bit from their native forms. Deepmind has made the structural estimates available under a CC BY 4.0 license (the citation is at the end of the post), and I’ll maintain the visualizations under the same license: feel free to use them with attribution.

There’s obviously a lot going on with the current coronavirus pandemic, so I won’t repeat the information about hand washing, social distancing, or hiding out in the woods that you’ve probably already read about. If you’re interested in learning more about protein structure prediction you can start with the Wikipedia entry and/or the introductory course on the SWISS-MODEL website. Levinthal’s paradox is also a fun thought experiment for framing the problem and its inherent difficulty. Mohammed AlQuraishi wrote an insightful recap of AlphaFold at CASP13.

There is a tremendous amount of research effort currently dedicated to studying the 2019 novel coronavirus, including several structural modelling projects. If you don’t want to dive into the rabbit hole vortex of computational protein structure prediction but still want to do something combining protein structure and the COVID-19 virus, Folding@Home and Foldit both have projects related to the new coronavirus. You can help by donating some of your idle computer resources to simulate structural dynamics with Folding@Home, or you can work at solving structural puzzles with Foldit.

[1] John Jumper, Kathryn Tunyasuvunakool, Pushmeet Kohli, Demis Hassabis, and the AlphaFold Team, “Computational predictions of protein structures associated with COVID-19”, DeepMind website, 5 March 2020, https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19

[2] SWISS-MODEL Coronavirus template structure predictions page https://swissmodel.expasy.org/repository/species/2697049

[3] PyMOL. Supported, incentive version. https://pymol.org/2/ Open source project: https://github.com/schrodinger/pymol-open-source

Treating TensorFlow APIs Like a Genetics Experiment to Investigate MLP Performance Variations

I built two six-layer MLPs at different levels of abstraction: a lower-level MLP using explicit matrix multiplication and activation, and a higher-level MLP using tf.layers and tf.contrib.learn. Although my intention was simply to practice implementing simple MLPs at different levels of abstraction, and despite using the same optimizer and the same architecture, the higher-level abstracted model performed much better (often achieving 100% accuracy on the validation datasets) than the model built around tf.matmul operations. That sort of mystery deserves an investigation, so I set out to find what was behind the performance difference, building two more models that mix tf.layers, tf.contrib.learn, and tf.matmul. I used the iris, wine, and digits datasets from scikit-learn, as these are small enough to iterate over a lot of variations without taking too much time.
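To make the comparison concrete, here is roughly what one layer looks like at each level of abstraction, written against the TF 1.x APIs mentioned above; the hidden-layer sizes are placeholders, not the exact architecture from these experiments.

```python
import tensorflow as tf  # written against the TensorFlow 1.x API used in this post

def dense_low_level(x, n_out, activation=tf.nn.relu):
    """One layer of the 'low-level' MLP: explicit weights and tf.matmul."""
    n_in = int(x.shape[-1])
    w = tf.Variable(tf.random_normal([n_in, n_out], stddev=0.1))
    b = tf.Variable(tf.zeros([n_out]))
    return activation(tf.matmul(x, w) + b)

def dense_high_level(x, n_out, activation=tf.nn.relu):
    """The same layer via the higher-level tf.layers API."""
    return tf.layers.dense(x, n_out, activation=activation)

# Illustrative six-layer stack; swap dense_low_level for dense_high_level
# to move between abstraction levels.
x = tf.placeholder(tf.float32, [None, 4])   # e.g. the four iris features
h = x
for units in [64, 64, 32, 32, 16]:
    h = dense_low_level(h, units)
logits = dense_low_level(h, 3, activation=tf.identity)
```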

In genetics research it’s common practice to determine relationships between genes and traits by breaking things until the trait disappears, then trying to restore the trait by externally adding specific genes back to compensate for the broken one. These perturbations are called “knockout” and “rescue,” respectively, and I took a similar approach here. My main findings were:

  • Replacing tf.matmul operations with tf.layers didn’t have much effect. Changing dropout and other hyperparameters did not seem to affect the low-level and high-level models differently.
  • “Knocking out” the use of learn.Estimator.fit from tf.contrib.learn and running the training optimizer directly led to significantly degraded performance of the tf.layers model.
  • The model built around tf.matmul could be “rescued” by training with learn.Estimator.fit instead of train_op.run.
  • The higher-level model using layers did generally perform a little better than the lower-level model, especially on the digits dataset.

So we can conclude that training with the tf.estimator API was likely responsible for the higher performance from the more abstracted model. Cross-validation curves demonstrating the training efficacy of the different models are shown below:

Cross-validation accuracy curves for different random seeds using the tf.layers model.

Cross-validation accuracy curves for different random seeds using the tf.matmul model.

These MLPs perform pretty well (and converge in just a few minutes) on the small sklearn datasets. The four models are built to be readily modifiable and iterable, and can be accessed from the Git repository.

Decomposing Autoencoder Conv-net Outputs with Wavelets

Replacing a bespoke image segmentation workflow using classical computer vision tasks with a simple, fully convolutional neural network isn’t too hard with modern compute and software libraries, at least not for the first part of the learning curve. The conv-net alleviates your fine-tuning overhead, decreases the total curation requirement (time spent correcting human-obvious mistakes), and it even expands the flexibility of your segmentations so that you can simultaneously identify the pixel locations of multiple different classes. Even if the model occasionally makes mistakes, it seems to do so in a way that makes it obvious what the net was “thinking,” and the mistakes are still pretty close. If this is so easy, why do we still even have humans?

In some ways conv-nets work almost too well for many computer vision tasks. Getting a reasonably good result and declaring it “good enough” is very tempting. It’s easy to get lackadaisical about a task that you wouldn’t even have approached for automation a decade ago, leaving it to undergraduates[1] to manually assess images for “research experience” like focused zipheads[2]. But we can do better, and it’s important that we do so if we are to live in a desirable future. Biased algorithms are nothing new, and the ramifications of a misbehaving model remain the responsibility of its creators[3].

Take a 4-layer CNN trained to segment mitochondria from electron micrographs of brain tissue (trained on an electron microscopy dataset from EPFL, available here). On a scale from Loch Stenness to Loch Ness, the depth of this network is the Bonneville Salt Flats. Nonetheless, this puddle of neurons manages to get a reasonably good result after only a few hundred epochs.

I don’t think it would take too much in the way of post-processing to clean up those segmentation results: a closing operator to get rid of the erroneous spots and smooth a few artifacts. But isn’t that defeating the point? The ease of getting good early results can be a bit misleading. Getting to 90% or even 95% effectiveness on a task can seem pretty easy thanks to the impressive learning capacity of conv-nets, but closing the gap of the last few percent, building a model that generalizes to new datasets, or better yet, transfers what it has learned to largely different tasks is much more difficult. With all the accelerated hardware and improved software libraries we have available today you may be only 30 minutes away from a perfect cat classifier, but you’re at least a few months of diligent work away from a conv-net that can match the image analysis efficacy of an undergrad for a new project.

Pooling operations are often touted as a principal contributor to conv-net classifier invariance, but this is controversial, and in any case most people who can afford the hardware for memory-intensive models are leaving them behind. It seems that pooling is probably more important for regularization than for feature invariance, but we’ll leave that discussion for another time. One side effect of pooling operations is that images are blurred as the x/y dimensions are reduced in deeper layers.

U-Net architectures and atrous convolutions are two strategies that have lately been shown to be effective elements of image segmentation models. The assumed effect for both strategies is better retention of high frequency details (as compared to fully convolutional networks). These counteract some of the blurring effect that comes from using pooling layers.

In this post, we’ll compare the frequency content retained in the output from different models. The training data is EM data from brain slices like the example above. I’m using the dataset from the 2012 ISBI 2D EM segmentation challenge (published by Cardona et al.) for training and validation, and we’ll compare the results using the EPFL dataset mentioned above as a test set.

To examine how these elements contribute to a vision model, we’ll train them on EM data as autoencoders. I’ve built one model for each strategy, constrained to have the same number of weights. The training process looks something like this (in the case of the fully convolutional model):

Dilated convolutions are an old concept revitalized to address the details lost to pooling operations by making pooling optional. This is accomplished by using dilated convolutional kernels (spacing the weights out with zeros, or holes) to achieve long-distance context without pooling. In the image below, the dark squares are the active weights while the light gray ones are the “holes” (hence à trous, French for “with holes”). Where these kernels are convolved with a layer, they act like a larger kernel without having to learn or store additional weights.
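In Keras-flavored TensorFlow, dilation is just an argument on the convolution layer. The block below only illustrates the mechanism with placeholder filter counts; it isn’t the atrous autoencoder trained for these experiments.

```python
import tensorflow as tf

# A 3x3 kernel with dilation_rate=2 covers a 5x5 receptive field (and
# dilation_rate=4 a 9x9 field) while storing only 9 weights per channel.
inputs = tf.keras.Input(shape=(None, None, 1))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           dilation_rate=1)(inputs)
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           dilation_rate=2)(x)
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           dilation_rate=4)(x)
decoded = tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
atrous_autoencoder = tf.keras.Model(inputs, decoded)
```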

U-Net architectures, on the other hand, utilize skip connections to bring information from the early, less-pooled layers to later layers. The main risk I see in using U-Net architectures is that for a particularly deep model the network may develop an over-reliance on the skip connections. This would mean the very early layers will train faster and have a bigger influence on the model, losing out on the capacity for more abstract feature representations in the layers at the bottom of the “U”.
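A single down/up level with a skip connection looks something like the sketch below; again, this is just an illustration with placeholder sizes, not the exact U-Net used here.

```python
import tensorflow as tf
from tensorflow.keras import layers

# One encoder/decoder level: features saved before pooling are concatenated
# back in after upsampling, so fine detail can bypass the bottleneck.
inputs = tf.keras.Input(shape=(None, None, 1))
enc = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
down = layers.MaxPooling2D(2)(enc)
mid = layers.Conv2D(64, 3, padding="same", activation="relu")(down)
up = layers.UpSampling2D(2)(mid)
merged = layers.Concatenate()([up, enc])   # the skip connection
decoded = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(merged)
unet_level = tf.keras.Model(inputs, decoded)
```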

Using atrous convolutions makes for noticeably better autoencoding fidelity compared to a simple fully convolutional network:

Training with the U-Net architecture, meanwhile, produces images that are hardly distinguishable from the originals. Note that the images here are from the validation set; they aren’t seen by the model during training.

If you compare the results qualitatively, the U-Net architecture is a clear winner in terms of the sharpness of the decoded output. By the looks of it the U-Net is probably more susceptible to fitting noise as well, at least in this configuration. Using dilated convolutions also offers improved detail reconstruction compared to the fully convolutional network, but it does eat up more memory and trains more slowly due to the wide interior layers.

This seemed like a good opportunity to bring out wavelet analysis to quantify the differences in autoencoder output. We’ll use wavelet image decomposition to investigate which frequency levels are most prevalent in the decoded output from each model. Image decomposition with wavelets looks something like this:

The top-left image has been downsized 2x from the original by removing the details with a wavelet transform (using Daubechies 1). The details left over in the other quadrants correspond to the high frequency content oriented to the vertical, horizontal, and diagonal directions. By computing wavelet decompositions of the conv-net outputs and comparing the normalized sums at each level, we should be able to get a good idea of where the information of the image resides. You can get an impression of the first level of wavelet decomposition for output images from the various models in the examples below:

And finally, if we calculate the normalized power for each level of wavelet decomposition we can see where the majority of the information of the corresponding image resides. The metrics below are the average of 100 autoencoded images from the test dataset.
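The per-level bookkeeping is straightforward with PyWavelets; the sketch below shows roughly how the normalized power at each level can be tallied (the exact normalization used for the plot may differ slightly).

```python
import numpy as np
import pywt

def normalized_wavelet_power(image, wavelet="db1", level=8):
    """Fraction of total wavelet power at each decomposition level."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # coeffs[0] is the coarse approximation (the "average gray level" end);
    # each following entry is a (horizontal, vertical, diagonal) detail tuple.
    power = [np.sum(np.square(coeffs[0]))]
    for details in coeffs[1:]:
        power.append(sum(np.sum(np.square(band)) for band in details))
    power = np.asarray(power, dtype=float)
    return power / power.sum()

# Compare an original image to a decoded one (both 2D numpy arrays):
# diff = normalized_wavelet_power(original) - normalized_wavelet_power(decoded)
```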

In the plot, spatial frequencies increase with decreasing levels from left to right. Level 8 refers to the 8th level of the wavelet decomposition, aka the average gray level in this case. The model using a U-Net architecture is the closest to recapitulating all the spatial frequencies of the original image, with the notable exception of a roughly 60% decrease in image intensity at the very highest spatial frequencies.

I’d say the difference between the U-Net output and the original image is mostly down to a denoising effect. The atrous conv-net is not too far behind the U-Net in terms of spatial frequency fidelity, and the choice of model variant probably would depend on the end use. For example, there are some very small sub-organellar dot features that are resolved in the U-Net reconstruction but not the atrous model. If we wanted to segment those features, we’d definitely choose the U-Net. On the other hand, the atrous net would probably suffer less from over-fitting if we wanted to train for segmenting the larger mitochondria and only have a small dataset to train on. Finally, if all we want is to coarsely identify the cellular boundaries, that’s basically what we see in the autoencoder output from the fully convolutional network.

Hopefully this has been a helpful exercise in examining conv-net capabilities in a simple example. Open questions for this set of models remain. Which model performs the best on an actual semantic segmentation task? Does the U-Net rely too much on the skip connections?

I’m working with these models in a repository where I plan to keep notes and code for experimenting with ideas from the machine learning literature and you’re welcome to use the models therein for your own experiments.

Datasets from:

A. Lucchi, K.Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, IEEE Transactions on Medical Imaging, Vol. 30, Nr. 11, October 2011.

Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas, Pavel Tomancak, Volker Hartenstein. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biology, 2010.

Zebra: https://commons.wikimedia.org/wiki/Zebra#/media/File:Three_Zebras_Drinking.jpg

Relevant articles:

Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. Arxiv. https://arxiv.org/abs/1505.04597

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. Arxiv. https://arxiv.org/abs/1706.05587

[1] My first job in a research laboratory was to dig through soil samples with fine tweezers to remove roots. We don’t have robots to do this (yet), but I can’t imagine a bored undergraduate producing replicable results in this scenario, and the same goes for manual image segmentation or assessment. On the other hand, the undergrad will probably give the best results, albeit with a high standard deviation, as they are likely to have the most ambiguous understanding of the professor’s hypothesis and desired results of anyone in the lab.

[2] I am indeed reading A Deepness in the Sky.

[3] (o_o) / (^_^) / (*~*)

All work and no play makes JAIck a dull boy

Fun with reservoir computing

By now you’ve probably read Andrej Karpathy’s blog post The Unreasonable Effectiveness of Recurrent Neural Networks, and if you haven’t, you definitely should. Andrej’s RNN examples are the inspiration for many of the RNNs doing* silly things that get picked up by the lay press. The impressive part is how these artificial texts land squarely in the uncanny valley solely by predicting the next character one by one. The results almost, but not quite, read like perfectly reasonable nonsense written by a human.

In fact we can approach similar tasks with a healthy injection of chaos. Reservoir computing has many variants, but the basic premise is always the same. Data is fed into a complex chaotic system called, among other things, a reservoir, giving rise to long-lived dynamic states. By feeding the states of the system into a simple linear regression model, it’s possible to accomplish reasonably complicated tasks without training the dynamic reservoir at all. It’s like analyzing wingbeat patterns of Australian butterflies by observing hurricanes in Florida.

A computing reservoir can be made out of a wide variety of chaotic systems (e.g. a jointed pendulum) amenable to taking some sort of input, but the ones most akin to neural networks consist of a big network of sparsely connected nodes with random weights and a non-linear activation function. In this example I’ll use a Schmitt trigger relaxation oscillator. If you have trained a few neural networks yourself, you can think of this as simply a tanh activation function with hysteresis. If you build or built BEAM robots, you can think of the activation function as one neuron in a 74HC14-based microcore used to control walking robots.

By connecting a large vector of nodes to itself and to an input, it’s possible to get a complex response from very simple input. The response to a step function in this example is long lived, but it does die out eventually.
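For the curious, a bare-bones numpy reservoir looks something like the sketch below. I’ve used a plain tanh activation here rather than the Schmitt-trigger (hysteresis) variant described above, and the node count and sparsity are arbitrary placeholders.

```python
import numpy as np

rng = np.random.RandomState(0)
n_nodes, n_inputs = 300, 1

# Fixed, sparse, random weights: the reservoir itself is never trained.
w_res = rng.randn(n_nodes, n_nodes) * (rng.rand(n_nodes, n_nodes) < 0.1)
w_res *= 0.95 / np.max(np.abs(np.linalg.eigvals(w_res)))  # keep dynamics near the edge of stability
w_in = rng.randn(n_nodes, n_inputs) * 0.5

def run_reservoir(inputs, leak=0.3):
    """Collect reservoir states for a 1D input sequence."""
    state = np.zeros(n_nodes)
    states = []
    for u in inputs:
        pre = w_res @ state + w_in @ np.atleast_1d(u)
        state = (1 - leak) * state + leak * np.tanh(pre)
        states.append(state.copy())
    return np.array(states)

# A single unit impulse produces long-lived activity, as in the gif above.
impulse = np.zeros(500)
impulse[10] = 1.0
states = run_reservoir(impulse)
# All of the learning happens in a linear readout (e.g. ridge regression)
# from these states to the next-character targets.
```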

The activity in the reservoir looks a little like wave action in a liquid, as there tends to be a similar amount of energy in the system at any given time. With very little damping this can go on for a long time. The gif below demonstrates long-lived oscillations in a reservoir after a single unit impulse input. After a few thousand iterations of training, what comes out the other end starts to look a little less like gibberish.

But what about the funny text? Can reservoir computing make us laugh? Let’s start by testing whether reservoir computing can match the writing proficiency of famous fictional writer Jack Torrance. The animations below demonstrate the learning process: the dynamic reservoir carries a chaotic memory of past input characters, and a linear classifier predicts each next character as a probability. At first the combined system outputs nonsense, and we can see that the character predictions are very dynamic and random. Then the system gets very confident, but very stupid.

Later the system begins to learn words, and the character probabilities adapt to previous characters in a sensible manner, all without back-propagating into the reservoir at all.

After a while the system learns to reliably produce the phrase “All work and no play makes Jack a dull boy.”

If you want to try it yourself, this gist depends only on numpy. I made a normal GH repository for my code for generating figures and text here. A TensorFlow version, for training faster and/or on more complicated texts, may be in the works (running it with numpy is no way to write a thesis).

*The torch-rnn repository used by Elle O’Brien for romance novel titles was written by Justin Johnson.

International Journal of Ronotics Vol. 1, Issue 1

[original image]

Three things that aren’t robots, rated relative to a household thermostat.

Many humans remain worried that robots will soon come to take their jobs, seize control of institutions, or simply decide that the time of humans has gone on far too long already. On the other hand, techno-optimists like myself look forward to engaging with wholly different architectures of minds unconstrained by biology, and as capable automation continues to erode the jobs that can be best done by humans, we can all look forward to careers as artisan baristas selling fancy coffees back and forth to each other. For the most part, robot experts and comic artists alike are of the mind that the robot apocalypse is still a ways off. But that doesn’t stop an equally insidious menace lurking at research labs across the globe: non-robot machines masquerading as robots.

I think it stands to reason that not everything in the universe can be a robot, so we should make some effort to accurately organize the things we encounter into categories ‘robots’ and ‘not-robots.’ Here I will use the term ‘ro-not’ for things that are called robots by their creators but are in fact some other thing in the not-robot category.

This may seem pedantic, but it is actually emblematic of a general problem of misplaced incentives in scientific research. By choosing terms not on the basis of clarity and accuracy, but rather for how impressive they sound in a press release, we mislead the public and erode confidence and literacy in science. This is a bad thing that we should try to discourage.

So what is a robot? Although many machines are robot-like, it should be easy to assess how close to being a robot a thing is. Put simply, a robot must be a machine that is able to sense its environment and change its behavior accordingly. That’s it; the bar is not set unreasonably high. An industrial robot arm blindly following a pre-computed path doesn’t fit this definition, but add a sensor used to halt operation when a human enters the danger zone and it does.

Below I’ll rate 3 of these so-called robots on a scale of 1-10, with 1 being a “slinky”, 5 being a thermostat, and 10 being a fully sapient machine. These are all called ‘robots’ by their creators, and often published in robotics-specific journals, despite none of the machines below rising above the sentience of a thermostat. That’s not to say that the machines or their inventors are, uh, not good, or even that they shouldn’t publish their devices in robotics journals, but rather we should all learn to call a spade a spade and an actuator an actuator.

Vine robots.

Original article: “A soft robot that navigates its environment through growth.” Science Robotics  Vol. 2, Issue 8. 19 Jul 2017.

At first glance this machine looks like an inside-out balloon, but on closer inspection you’ll notice that it is in fact an inside-out balloon. The video demonstrates using the appendage to turn off a valve in a simulated “I left the poisonous gas valve on” situation (happens to us all), and with a camera attached the thing appears to be phototropic. Turns out this is misleading, however, and in fact the pattern of the plastic expandables is pre-programmed by adding tape or otherwise constraining the walls of the plastic before inflating.

Rating: 2.5/10. Amended publication title: “An inside-out plastic balloon that can follow preprogrammed inflation routes.”

Soft robot octopus.

Original article: “An integrated design and fabrication strategy for entirely soft, autonomous robots” Nature 536, 451–455 (25 August 2016)

This machine is an interesting pneumatic device molded out of PDMS to look like a cute octopus. It is fully capable of wiggling its arms and blowing bubbles, and the blue and red coloring of the hydrogen peroxide solution that powers it makes it look pretty cool. The “octobot” alternates raising half of its appendages at a time, and the coolest thing about it is that it does so with a microfluidic circuit. In addition to powering the arms, each of the two channels feeding the arms also powers a valve that restricts fuel flow to the other channel. The designers deem this device the “first fully autonomous soft robot”; however, its alternating arm-raising seems to be a futile movement (it does not seem to get the device anywhere), and it also doesn’t appear to be able to respond to its environment in any significant way. The “microfluidic logic” oscillator is pretty cool: the authors claim that being untethered makes the machine fully autonomous, but a slinky isn’t tethered either and I don’t call that a robot.

Rating: (4.0/10). Amended title: “An integrated design and fabrication strategy for entirely cute, colorful oscillators.”

Amphibious bee robot.

Original article: “A biologically inspired, flapping-wing, hybrid aerial-aquatic microrobot.” Science Robotics. Vol. 2, Issue 11. Oct 2017. [paywalled]

There are a number of misleading aspects of how the press office and journalists portrayed this machine. The first thing you’ll notice in the video demonstrations are the tethers above and below the thing: clearly it is not operating under its own power. While tethering may be a bit uninspiring, there’s nothing in our robot rule that says you have to carry your own batteries and compute everywhere to be considered a robot, so the tether itself doesn’t rule out potential robotness. On the other hand, in an interview with Science Friday the lead author describes her role in the operation of the device, which is that she makes all the decisions and activates each mode of operation manually (the wings move at a different frequency for swimming and flight, for example). The device also can’t fly when it’s wet, which is a bit of a letdown given that moving between water and air seemed to be the whole point of an amphibious bee rather than an air-only or water-only winged device. One particularly cool thing about this device is that it uses a small explosion to break the grip of surface tension at the water’s surface, powered by hydrogen and oxygen gas generated in an internal chamber by electrolysis.

Rating (3/10). Amended title: “A university press office-inspired, flapping-wing, hybrid aerial-aquatic device that explodes a bit.”

I wouldn’t argue that any of the above are, uh, not good. In fact they may be quite cool as what they are and could potentially be put to good use as part of a robot. Disagree with my ratings or definition of a robot? Let me know in the comments or @theScinder. If you are involved in any of these projects and have expanded on the original device to meet the robot criteria above, let me know and I’ll add an update.

Introducing Ceph-O-Vision

I’ve been interested in cephalopod vision ever since I learned that, despite their superb appreciation for chroma (as evidenced by their ability to match the color of their surroundings as well as texture and pattern), cuttlefish eyes contain only one light-sensitive pigment. Unlike ourselves and other multichromatic animals that perceive color as a mix of activations of different-colored light receptors, cuttlefish must have another way. So while the images coming into the brain of a cuttlefish might look something like this . . .


. . . they manage to interpret the images to precisely match their surroundings and communicate colorful displays to other cuttlefish. Some time ago Stubbs and Stubbs put forth the possibility that they might use chromatic aberrations to interpret color (I discussed and simulated what that might look like in this post). What looks like random flickering in the gif above is actually simulated focusing across chromatic aberrations. [original video]. Contrary to what one might think, defocus and aberration in images aren’t “wrong.” If you know how to interpret them, they provide a wealth of information that might allow a cuttlefish to see the world in all its chromatic glory.

Top: learned color image based on the chromatic aberration stack. Middle: neural network color reconstitution. Bottom: ground truth color image.

We shouldn’t expect the cuttlefish to experience their world in fuzzy grayscale any more than we should expect humans to perceive their world in an animal version of a Bayer array, each photoreceptor individually distinguished (not to mention distracting saccades, blind spot at the optic nerve, vasculature shadowing, etc.). Instead, just like us humans, they would learn to perceive the visual data produced by their optical system in whatever way makes the most sense and is most useful.

I piped simulated cuttlefish vision images into a convolutional neural network with corresponding color images as reference. The cuttle-vision images flow through the 7 layer network and are compared to the RGB targets on the other side. I started by building a dataset of simulated images consisting of randomly placed pixel-sized colored dots. This was supposed to be the easy “toy example” I started with before moving on to real images.
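The network itself is nothing exotic. Something along the lines of the sketch below captures the idea: a stack of differently focused monochrome frames goes in, an RGB guess comes out. The layer widths and the number of focus planes here are placeholders, not the exact 7-layer network from this project.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_focus_planes = 8   # assumed number of frames in the chromatic aberration stack
inputs = tf.keras.Input(shape=(64, 64, n_focus_planes))

# A small stack of convolutions maps the focus stack to a 3-channel RGB guess.
x = inputs
for filters in [32, 32, 64, 64, 32, 32]:
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
rgb = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)

ceph_o_vision = tf.keras.Model(inputs, rgb)
ceph_o_vision.compile(optimizer="adam", loss="mse")
# ceph_o_vision.fit(aberration_stacks, rgb_targets, epochs=1000)
```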


Left: training input, middle: network’s attempt at reconstitution, right: target. For pixel sized color features, the convolutional kernels of the network learn to blur the target pixels into ring shapes.

Bizarrely, the network learned to interpret these images as colored donuts, usually centered around the correct location but incapable of reconstituting the original layout. Contrary to what you might expect, the simple dataset performed poorly even with many training examples and color image reconstitution improved dramatically when I switched to real images. Training on a selection of landscape images looks something like this:


Center: Ceph-O-Vision color perception. Bottom: Ground truth RGB. Top: Chromatic aberration training images (stacked as a color image for viewing)

As we saw in the first example, reconstituting sparse single pixels from chromatic aberration images trains very poorly. However, the network was able to learn random patterns of larger features (offering better local context) much more effectively:

Interestingly enough, the network learns to be most sensitive to edges. You can see in the training gif above that after 1024 epochs of training, the network mainly reconstitutes pattern edges. It never learns to exactly replicate the RGB pattern, but gets pretty close. It would be interesting to use a network like this to predict what sort of optical illusions a cuttlefish might be susceptible to. This could provide a way to test the chromatic aberration hypothesis in cephalopod vision. Wikipedia image by Hans Hillewaert used as a mask for randomly generated color patterns.

Finally, I trained the network on some footage of a hunting cuttlefish CC BY SA John Turnbull. Training on the full video, here’s what a single frame looks like as the network updates over about a thousand training epochs:

This project is far from a finished piece, but it’s already vastly improved my intuition for how convolutional neural networks interpret images. It also provides an interesting starting point for thinking about how cuttlefish visually communicate and perceive. If you want more of the technical and unpolished details, you can follow this project’s Github repository. I have a lot of ideas on what to try next: naturally some control training with a round pupil (and thus less chromatic aberration), but also to compare the simple network I’ve built so far to the neuroanatomy of cephalopods and to implement a “smart camera” version for learning in real-time. If you found this project interesting, or have your own cool ideas mixing CNNs and animal vision, be sure to let me know @theScinder or in the comments.

What if they had put off the LIGO upgrades?

If a neutron star falls into a black hole but no one has upgraded the gravitational observatory to the required sensitivity, does it fail completely to change our view of the universe?

The Advanced Laser Interferometer Gravitational-Wave Observatory (aLIGO) consists of a pair of Fabry–Pérot interferometers spaced about 3000 km apart, each sporting two cavities about 4 km long and sensitive to length changes far smaller than the width of a proton. The tubes containing the optics operate at a vacuum with about 10 times lower pressure than that experienced by the International Space Station in low Earth orbit. The circulating power in the arm cavities builds up to more than 100 kW, with each photon reflecting off of the test masses and back several hundred times. Each 40 kg test mass hangs precariously on threads of glass thinner than things that are really rather thin already. In other words, it’s a huge friggin’ laser powerful enough to burn a burrito, with components precariously balanced in an inside-out space ship.

On the 14th of September 2015, these instruments recorded measurements that would support the idea that spacetime changes size when masses accelerate. We usually refer to the instruments and all aspects of the research program supporting it by the same acronym: LIGO. Perhaps you’ve heard of it?

Although the colloquial story is that LIGO recorded the historic GW150914 gravitational wave event during an engineering run even before beginning formal scientific data collection, this isn’t strictly true. In fact LIGO had been performing science runs at the Hanford and Livingston sites since 2002. In 2005, LIGO reached its original design sensitivity: strain detection on the order of one part in 10²¹. Another way to think about the sensitivity of the instruments, and the common way to report it, is the distance at which a typical binary neutron star inspiral could nominally be detected. One part in 10²¹ strain sensitivity corresponds to a search distance of about 8 million parsecs (about 26 million light years). This was the sort of sensitivity LIGO was capable of up until the latter part of 2010. As impressive as that is, there were no gravitational wave detections during operation of LIGO from 2002 to 2010.

The now famous GW150914 and the subsequent detections GW151226 and GW170104 came after a comprehensive suite of upgrades that boosted sensitivity to a search distance of 80 million parsecs (about 260 million light years). Four years of shutdown beginning in 2010 marked the transition from “initial LIGO” to “advanced LIGO” (aLIGO). Four years sounds like quite a while in human time, and an especially conservative experimenter might be wont to keep collecting data until proof-of-concept is established. As long as the machine is working in some rudimentary fashion, pushing to eke out just one detection before shutting down for risky upgrades might sound like it makes sense. What if LIGO had put off the upgrades to instead continue with scientific runs? Not much would have come of it, as it turns out.

Our best guess for the frequency of observable events is based on what aLIGO picked up in the first science run. The first advanced run had about 1100 hours of uptime, time when both instruments were locked in and active. During this run aLIGO picked up 2 confirmed events (and one almost-event, as yet unconfirmed), giving us a rate of 2 events per 1100 hours in a volume of about 2.1 million cubic megaparsecs (the search volume for an 80 megaparsec detection distance). This leads us to expect 1 detection for every 22.92 days of run time, or about 16 detections per year, not considering instrument downtime.

Prioritizing data collection at the cost of forgoing upgrades, we would probably still be waiting on the big announcement. Operating at a pre-2014 sensitivity of 8 Mparsecs, we could expect a detection on average once every 62 years. Assuming a Poisson distribution (events are random), the chances of one or more detections in 4 years of data collection at pre-aLIGO sensitivity would be just a tick over 6%. For 50/50 odds of making a detection, we’d have to wait 44 years. Chances are, funding bodies could very well lose interest in that time, and we certainly would not have seen the international enthusiasm in gravitational wave research resulting from the GW150914 announcement.
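The arithmetic behind those numbers fits in a few lines; the sketch below just reproduces the figures quoted above, give or take rounding.

```python
import numpy as np

# Two confirmed events in ~1100 hours of coincident uptime at 80 Mpc range.
days_per_event_80mpc = (1100 / 2) / 24                  # ~22.9 days per event

# The detection rate scales with the searchable volume, i.e. with range cubed.
days_per_event_8mpc = days_per_event_80mpc * (80 / 8) ** 3
years_per_event_8mpc = days_per_event_8mpc / 365.25     # ~62-63 years per event

# Poisson statistics: P(at least one event in t years) = 1 - exp(-t / mean_wait).
p_four_years = 1 - np.exp(-4 / years_per_event_8mpc)    # ~6%
years_for_even_odds = np.log(2) * years_per_event_8mpc  # ~43-44 years

print(days_per_event_80mpc, years_per_event_8mpc, p_four_years, years_for_even_odds)
```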

The moral of the story? The difference between being “productive” and creating something great lies in the old “work smarter, not harder” paradigm. Blind diligence and the perseverance to keep plugging away have little chance of pushing the boundaries of what is known to be possible.


Curious about any of the calculations discussed above? Tinker with my notes in this Jupyter notebook

I too built a rather decent deep learning rig for 900 quid

Skip to the components list
Skip to the benchmarks

Robert Heinlein’s 1957 novel The Door into Summer returns throughout to the theme of knowing when it’s “time to railroad.” Loosely speaking, this is the idea that one’s success comes as much from historical context as it does from innate ability, hard work, and luck (though much of the latter can be attributed to historical context anyway).

Many of the concepts driving our modern AI renaissance are decades old at least, but the field lost steam because the computers of the day were too slow and the Amazookles of the world had yet to use them to power their recommendation engines and so on. In the meantime computers have become much faster and much better at beating humans at very fancy games. Modern hardware is now fast enough to make deep learning practical, and it works well for many problems while also offering some insight into how our own minds might work.

I too have seen the writing on the wall in recent years, and I can say with some confidence that now is the time to railroad, where by “railroad” I mean revolutionise the world with artificial intelligence. A lot of things changed in big ways during the original “time to railroad,” the industrial revolution. For some this meant fortune and progress; for others, ruin. I would like to think that we are all a bit brighter than our old-timey counterparts and that we have the benefit of their history to learn from, so I’m rooting for an egalitarian utopia rather than an AI apocalypse. In any case, collective stewardship of the sea changes underway matters, and that means the more people who learn about AI, the less likely it is that the future will be decided solely by today’s technocratic elites.

I’ve completed a few MOOCs on machine learning in general and neural networks in particular, coded up some of the basic functions from scratch, and begun using some of the major libraries to investigate more interesting ideas. As I moved on from toy examples like MNIST and housing price prediction, one thing became increasingly clear: my hardware was not up to the job.

It took me a week of work to realize I was totally on the wrong track training a vision model meant to mimic cuttlefish perception on my laptop. That sort of wasted time really adds up, so I decided to go deeper and build my own GPU-enhanced deep learning rig.

Luckily there are lots of great guides out there, as everyone and their grandmother is building a deep learning rig at the moment. Most of the build guides have something along the lines of “... for xxxx monies” in the title, which makes it easy to match budgets. Build budgets run the gamut from the surprisingly capable $800 machine by Nick Condo to the serious $1700 machine by Slav Ivanov, all the way up to the low low price of “under $5000” by Kunal Jain. I did not even read the last one, because I am not made of money.

I am currently living in the UK, which means I have to buy everything in pounds. The prices for components in pounds sterling are... pretty much the same numbers as in greenbacks. The exchange rate to the British pound can be a bit misleading, even now that Brexit has crushed the pound sterling along with our hopes and dreams: in my experience you can buy about as much with a pound at the shops as with a dollar in the US or a euro on the continent. It sometimes seems like the only thing the exchange rate is used for is calculating salaries.

I’d recommend first visiting Tim Dettmers’ guide to choosing the right GPU for you. I’m at a stage of life where buying the “second cheapest” appropriate option is usually best. With a little additional background reading and following Tim’s guide, I selected the Nvidia GTX 1060 GPU with 6GB of memory. This was from Tim’s “I have little money” category, one up from the “I have almost no money” category, and in keeping with my second-cheapest life philosophy. Going up a tier is often close to twice as costly but not close to twice as good, and that holds for GPUs as well: a single 1070 is about twice the cost of a 1060 and only around 50% faster. Two 1060s, however, get you pretty close to twice the performance of one, and that’s what I went with. As we’ll see in the benchmarks, TensorFlow makes it pretty easy to take advantage of both GPUs, but doubling the capacity of this rig later by doubling the number of GPUs won’t be an option.
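For a flavour of what “taking advantage of both GPUs” looked like in the TensorFlow of that era (1.x), here is a minimal device-placement sketch; the shapes and workload are illustrative, not from my actual training code:

# Pin one chunk of work to each of the two GTX 1060s (TensorFlow 1.x API)
import tensorflow as tf

towers = []
for gpu_id in range(2):
    with tf.device('/gpu:%d' % gpu_id):
        x = tf.random_normal([4096, 4096])
        towers.append(tf.reduce_sum(tf.matmul(x, x)))

total = tf.add_n(towers)  # combine the per-GPU results

config = tf.ConfigProto(allow_soft_placement=True,  # fall back to CPU if needed
                        log_device_placement=True)  # print where each op runs
with tf.Session(config=config) as sess:
    print(sess.run(total))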

My upgradeability is somewhat limited by the number of threads (4) and PCIe lanes (16) of the modest i3 CPU I chose; if a near-term upgrade had been a higher priority, I should have left out the second 1060 and diverted that part of the budget to a better CPU (e.g. the Intel Xeon E5-1620 V4 recommended by Slav Ivanov). But if you’re shelling out for a higher-end system you’ll probably want a bigger GPU to start with, and it’s easy to see how a budget can climb from $800 to $1700 rather quickly.

The rest of the computer’s job is to quickly dump data into the GPU memory without messing things up. I ended up using almost all the same components as those in Nick’s guide because, again, my physical makeup is meat rather than monetary in nature.

Here’s the full list of components. I sourced what I could from Amazon Warehouse Deals to try and keep the cost down.


GPU (x2): Gigabyte Nvidia GTX 1060 6GB (£205.78 each)
Motherboard: MSI Intel Z170 KRAIT-GAMING (£99.95)
CPU: Intel Core i3 6100 Skylake Dual-Core 3.7 GHz Processor (£94.58)
Memory: Corsair CMK16GX4M2A2400C14 Vengeance 2x8GB (£105.78)
PSU: Corsair CP-9020078-UK Builder Series 750W CS750M ATX/EPS Semi-Modular 80 Plus Gold Power Supply Unit (£77.25)
Storage: SanDisk Ultra II SSD 240 GB SATA III (£72.18)
Case: Thermaltake Versa H23 (£27.10)

Total: £888.40
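If you want to check my arithmetic, the line items do add up to the quoted total:

# Component prices in GBP, in the order listed above
prices = [205.78, 205.78, 99.95, 94.58, 105.78, 77.25, 72.18, 27.10]
print(round(sum(prices), 2))  # 888.4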

I had never built a PC before and didn’t really know what I was doing. Luckily, YouTube did, and I didn’t break anything when I slotted all the pieces together. I had an Ubuntu 16.04 install thumb drive lying around ready to go, so I was up and running relatively quickly.

The next step was installing the GPU drivers and the CUDA toolkit. I’ve been working mainly with TensorFlow lately, so I followed their guide to get everything ready to take advantage of the new setup. I am using Anaconda to manage Python environments for now, so I made one environment with the tensorflow package and another with tensorflow_gpu.
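A quick way to sanity-check that the drivers, CUDA, and tensorflow_gpu can all see both cards (my own habit, not a step from the install guide):

# List the devices TensorFlow 1.x can see; expect a CPU entry plus /gpu:0 and /gpu:1
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)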

I decided to train on the CIFAR10 image classification dataset using this tutorial to test out the GPUs. I also wanted to see how fast training progresses on a project of mine, a two-category classifier for quantitative phase microscope images.

The CIFAR10 image classification tutorial from tensorflow.org was perfect because you can flag for the training to take place on one or two GPUs, or train on the CPU alone. It takes ~1.25 hours to train the first 10000 steps on the CPU, but only 4 minutes for the same training on one 1060. That’s a weeks-to-days/days-to-hours/hours-to-minutes level of speedup.

# CPU 10000 steps
2017-06-18 12:56:38.151978: step 0, loss = 4.68 (274.9 examples/sec; 0.466 sec/batch)
2017-06-18 12:56:42.815268: step 10, loss = 4.60 (274.5 examples/sec; 0.466 sec/batch)

2017-06-18 14:12:50.121319: step 9980, loss = 0.80 (283.0 examples/sec; 0.452 sec/batch)
2017-06-18 14:12:54.652866: step 9990, loss = 1.03 (282.5 examples/sec; 0.453 sec/batch)

# One GPU
2017-06-18 15:50:16.810051: step 0, loss = 4.67 (2.3 examples/sec; 56.496 sec/batch)
2017-06-18 15:50:17.678610: step 10, loss = 4.62 (6139.0 examples/sec; 0.021 sec/batch)
2017-06-18 15:50:17.886419: step 20, loss = 4.54 (6197.2 examples/sec; 0.021 sec/batch)

2017-06-18 15:54:00.386815: step 10000, loss = 0.96 (5823.0 examples/sec; 0.022 sec/batch)

# Two GPUs
2017-06-25 14:48:43.918359: step 0, loss = 4.68 (4.7 examples/sec; 27.362 sec/batch)
2017-06-25 14:48:45.058762: step 10, loss = 4.61 (10065.4 examples/sec; 0.013 sec/batch)

2017-06-25 14:52:28.510590: step 6000, loss = 0.91 (8172.5 examples/sec; 0.016 sec/batch)

2017-06-25 14:54:56.087587: step 9990, loss = 0.90 (6167.8 examples/sec; 0.021 sec/batch)

That’s about a 21-32x speedup on the GPUs. It’s not quite double the speed with two GPUs, because the model isn’t big enough to fully utilize both cards, as we can see in the output from nvidia-smi:

[nvidia-smi screenshots: GPU utilization while training on one GPU vs. on two GPUs]
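For what it’s worth, the headline speedup figures come straight from the examples/sec numbers in the logs above; the exact ratio depends on which steps you compare:

# Throughput ratios from the CIFAR10 logs above (representative steady-state steps)
cpu_rate = 283.0              # examples/sec on the CPU alone
one_gpu_rate = 6139.0         # single GTX 1060
two_gpu_rate = 8172.5         # both GTX 1060s, mid-run
print(one_gpu_rate / cpu_rate)   # ~22x
print(two_gpu_rate / cpu_rate)   # ~29x; early steps spike past 10,000 examples/sec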

My own model had a similar speedup, going from training about one 79-image minibatch per second to training more than 30 per second. Trying to train this model on my laptop, a Microsoft Surface Book, I was getting about 0.75 steps a second. [Aside: the laptop does have a discrete GPU, a variant of the GeForce 940M, but no Linux driver that I’m aware of :/].

# Training on CPU only
INFO:tensorflow:global_step/sec: 0.981465
INFO:tensorflow:loss = 0.673449, step = 173 (101.889 sec)
INFO:tensorflow:global_step/sec: 0.994314
INFO:tensorflow:loss = 0.64968, step = 273 (100.572 sec)

# Dual GPUs
INFO:tensorflow:global_step/sec: 30.3432
INFO:tensorflow:loss = 0.317435, step = 90801 (3.296 sec)
INFO:tensorflow:global_step/sec: 30.6238
INFO:tensorflow:loss = 0.272398, step = 90901 (3.265 sec)
INFO:tensorflow:global_step/sec: 30.5632
INFO:tensorflow:loss = 0.327474, step = 91001 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.5643
INFO:tensorflow:loss = 0.43074, step = 91101 (3.272 sec)
INFO:tensorflow:global_step/sec: 30.6085

Overall I’m pretty satisfied with the setup, and I’ve got a lot of cool projects to try out on it. Getting the basics of machine learning is pretty easy with all the great MOOCs and tutorials out there, but the learning curve slopes sharply upward after that. Working directly on real projects, with a machine that can train big models before the heat death of the universe, is essential for gaining intuition and tackling cool problems.