Rendering Deepmind’s Predicted Protein Structures for Novel Coronavirus 2019 (Stereo Anaglyphs Version)

I visualized predicted protein structures from the novel 2019 coronavirus. The structures are the latest from Deepmind’s AlphaFold, the champion entry in the CASP13 protein structure prediction competition that took place in 2018. They’ve reportedly continued to make improvements since then (perhaps hoping to make a similar showing at the next CASP spinning up this year), and there are open source implementations here and here (official), though I haven’t looked into the code yet. I’ve put together some notes on the putative functions of each protein described on the SWISS-MODEL site, which accompany the animated structure visualizations below. If you’d rather not look at blurry structures that look like a heavy case of chromatic aberration, you can go to the other version of this post. The animations on this page are best viewed with red/blue 3D glasses (or a sheet of red or blue acetate over each eye, if for some reason you have that instead).

I used PyMOL to render the predicted structures and build the animations in this post. PyMOL is open source software, and it’s pretty great. If you are a protein structure enthusiast, want to use PyMOL, and can afford to buy a license there is an incentive version that supports the maintenance of the open source project and ensures you always have the latest, greatest version of the program to work with.
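For anyone who wants to reproduce this kind of rendering, here's a minimal sketch of the sort of PyMOL scripting involved (run headless with pymol -cq script.py). The file name, rotation step, and frame count are placeholders rather than the exact settings used for the animations here, and the anaglyph stereo mode may or may not be available depending on your PyMOL build.

```python
# Minimal PyMOL turntable sketch; file name, angles, and frame counts are
# placeholders, not the exact settings used for the animations in this post.
from pymol import cmd

cmd.load("alphafold_prediction.pdb", "pred")   # hypothetical file name
cmd.hide("everything", "pred")
cmd.show("cartoon", "pred")
cmd.spectrum("count", "rainbow", "pred")       # color from N- to C-terminus
cmd.bg_color("white")
cmd.orient("pred")

# Red/blue stereo like the anaglyphs on this page (if your build supports it)
cmd.stereo("anaglyph")

# Render one frame per 3 degrees of camera rotation
for frame in range(120):
    cmd.turn("y", 3)
    cmd.png("frame_%03d.png" % frame, width=800, height=600, ray=1)

# Frames can then be assembled into a gif, e.g. with ImageMagick:
#   convert -delay 5 -loop 0 frame_*.png protein.gif
```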

Membrane protein (M protein).

This membrane protein is a part of the viral envelope. Via interactions with other proteins it plays an important role in forming the viral particle, but pre-existing template models of this protein are of low quality.

Non-structural protein 6 (Nsp6)

Nsp6 seems to play a role in inducing the host cell to make autophagosomes in order to deliver viral particles to lysosomes. Subverting the autophagosomal and lysosomal machinery of the host cell is a part of the maturation cycle for several different types of viruses. Low quality models of Nsp6 fragments, generated by homology modeling, are available on the SWISS-MODEL website.

Non-structural protein 2 (Nsp2). One half of this homodimer uses a surface rendering, and the other is rendered in secondary structure cartoon motifs.

The function of Nsp2 isn’t fully determined yet, but it may have something to do with depressing host immune response and cell survival by interacting with Prohibitin 1 and 2 (PHB and PHB2). Prohibitin proteins have been implicated as receptors for chikungunya and dengue fever virus particles.

Protein 3a

A little more is known about Protein 3a. The protein forms a tetrameric sodium channel that may be important for mediating the release of viral particles. Like the other proteins targeted by Deepmind’s AlphaFold team, this protein doesn’t have good sequence homologues and so had previously been limited to only a partial, low-quality structure prediction.

Papain-like protease (PL-PRO), C-terminal section.

PL-PRO is a protease, which as the name suggests means it makes cuts in other proteins. Papain is a protease family named for the protease found in papaya. This one is responsible for processing viral RNA replicase by making a pair of cuts near the N-terminus of one of the peptides that make up the viral replicase. Along with Nsp4, it is also associated with making membrane vesicles important for viral replication.

Non-structural protein 4 (Nsp4). (Surface mesh overlaid on top of the secondary structure cartoon representation).

Nsp4 plays a part, along with PL-PRO, in the production of vesicles required for viral replication. A pre-existing homology template-based model of the C-terminus of Nsp4 bears a close resemblance to the AlphaFold prediction, at least superficially. A comparison of template-based model YP_009725300.1 model 1 and the AlphaFold prediction is shown below.

Comparison of AlphaFold prediction and template model prediction (in blue call-out box). The template model is considered to be reasonably good quality.

The predicted structures released by Deepmind come with a caveat, which I’ll reiterate here: the structures are predicted (not experimental), so they may differ quite a bit from their native forms. Deepmind has made the structural estimates available under a CC BY 4.0 license (the citation is at the end of the post), and I’ll maintain the visualizations under the same license: feel free to use them with attribution.

There’s obviously a lot going on with the current coronavirus pandemic, so I won’t repeat the information about hand washing, social distancing, or hiding out in the woods that you’ve probably already read about. If you’re interested in learning more about protein structure prediction you can start with the Wikipedia entry and/or the introduction course on the SWISS-MODEL website. Levinthal’s paradox is also a fun thought experiment for framing the problem and its inherent difficulty. Mohammed AlQuraishi wrote an insightful recap of AlphaFold at CASP13.

There is a tremendous amount of research effort currently dedicated to studying the 2019 novel coronavirus, including several structural modelling projects. If you don’t want to dive into the rabbit hole vortex of computational protein structure prediction but still want to do something combining protein structure and the COVID-19 virus, Folding@Home and Foldit both have projects related to the new coronavirus. You can help by donating some of your idle computer resources to simulate structural dynamics with Folding@Home, or you can work at solving structural puzzles with Foldit.

[1] John Jumper, Kathryn Tunyasuvunakool, Pushmeet Kohli, Demis Hassabis, and the AlphaFold Team, “Computational predictions of protein structures associated with COVID-19”, DeepMind website, 5 March 2020, https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19

[2] SWISS-MODEL Coronavirus template structure predictions page https://swissmodel.expasy.org/repository/species/2697049

[3] PyMOL. Supported, incentive version. https://pymol.org/2/ Open source project: https://github.com/schrodinger/pymol-open-source

Rendering Deepmind’s Predicted Protein Structures for Novel Coronavirus 2019

I visualized predicted protein structures from the novel 2019 coronavirus. The structures are the latest from Deepmind’s AlphaFold, the champion entry in the CASP13 protein structure prediction competition that took place in 2018. They’ve reportedly continued to make improvements since then (perhaps hoping to make a similar showing at the next CASP spinning up this year), and there are open source implementations here and here (official), though I haven’t looked into the code yet. I’ve put together some notes on the putative functions of each protein described on the SWISS-MODEL site, which accompany the animated structure visualizations below. The gif files are each a few tens of Mb and so may take some time to load. If you’d prefer to look at the structures rendered as stereo anaglyphs (i.e. best-viewed with red/blue 3D glasses), click here.

I used PyMOL to render the predicted structures and build the animations in this post. PyMOL is open source software, and it’s pretty great. If you are a protein structure enthusiast, want to use PyMOL, and can afford to buy a license there is an incentive version that supports the maintenance of the open source project and ensures you always have the latest, greatest version of the program to work with.

Membrane protein (M protein).

This membrane protein is a part of the viral envelope. Via interactions with other proteins it plays an important role in forming the viral particle, but pre-existing template models of this protein are of low quality.

Non-structural protein 6 (Nsp6)

Nsp6 seems to play a role in inducing the host cell to make autophagosomes in order to deliver viral particles to lysosomes. Subverting the autophagosomal and lysosomal machinery of the host cell is a part of the maturation cycle for several different types of viruses. Low quality models of Nsp6 fragments, generated by homology modeling, are available on the SWISS-MODEL website.

Non-structural protein 2 (Nsp2)

The function of Nsp2 isn’t fully determined yet, but it may have something to do with depressing host immune response and cell survival by interacting with Prohibitin 1 and 2 (PHB and PHB2). Prohibitin proteins have been implicated as receptors for chikungunya and dengue fever virus particles.

Protein 3a

A little more is known about Protein 3a. The protein forms a tetrameric sodium channel that may be important for mediating the release of viral particles. Like the other proteins targeted by Deepmind’s AlphaFold team, this protein doesn’t have good sequence homologues and so had previously been limited to only a partial, low-quality structure prediction.

Papain-like protease (PL-PRO), C-terminal section.

PL-PRO is a protease, which as the name suggests means it makes cuts in other proteins. Papain is a protease family named for the protease found in papaya. This one is responsible for processing viral RNA replicase by making a pair of cuts near the N-terminus of one of the peptides that make up the viral replicase. Along with Nsp4, it is also associated with making membrane vesicles important for viral replication.

Non-structural protein 4 (Nsp4)

Nsp4 plays a part, along with PL-PRO, in the production of vesicles required for viral replication. A pre-existing homology template-based model of the C-terminus of Nsp4 bears a close resemblance to the AlphaFold prediction, at least superficially. A comparison of template-based model YP_009725300.1 model 1 and the AlphaFold prediction is shown below.

Comparison of AlphaFold prediction and template model prediction (in blue call-out box). The template model is considered to be reasonably good quality.

The predicted structures released by Deepmind come with a caveat, which I’ll reiterate here: the structures are predicted (not experimental), so they may differ quite a bit from their native forms. Deepmind has made the structural estimates available under a CC BY 4.0 license (the citation is at the end of the post), and I’ll maintain the visualizations under the same license: feel free to use them with attribution.

There’s obviously a lot going on with the current coronavirus pandemic, so I won’t repeat the information about hand washing, social distancing, or hiding out in the woods that you’ve probably already read about. If you’re interested in learning more about protein structure prediction you can start with the Wikipedia entry and/or the introduction course on the SWISS-MODEL website. Levinthal’s paradox is also a fun thought experiment for framing the problem and its inherent difficulty. Mohammed AlQuraishi wrote an insightful recap of AlphaFold at CASP13.

There is a tremendous amount of research effort currently dedicated to studying the 2019 novel coronavirus, including several structural modelling projects. If you don’t want to dive into the rabbit hole vortex of computational protein structure prediction but still want to do something combining protein structure and the COVID-19 virus, Folding@Home and Foldit both have projects related to the new coronavirus. You can help by donating some of your idle computer resources to simulate structural dynamics with Folding@Home, or you can work at solving structural puzzles with Foldit.

[1] John Jumper, Kathryn Tunyasuvunakool, Pushmeet Kohli, Demis Hassabis, and the AlphaFold Team, “Computational predictions of protein structures associated with COVID-19”, DeepMind website, 5 March 2020, https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19

[2] SWISS-MODEL Coronavirus template structure predictions page https://swissmodel.expasy.org/repository/species/2697049

[3] PyMOL. Supported, incentive version. https://pymol.org/2/ Open source project: https://github.com/schrodinger/pymol-open-source

Treating TensorFlow APIs Like a Genetics Experiment to Investigate MLP Performance Variations

I built two six-layer MLPs at different levels of abstraction: a lower-level MLP using explicit matrix multiplication and activation, and a higher-level MLP using tf.layers and tf.contrib.learn. Although my intention was simply to practice implementing simple MLPs at different levels of abstraction, and despite using the same optimizer and architecture for training, the higher-level abstracted model performed much better (often achieving 100% accuracy on the validation datasets) than the model built around tf.matmul operations. That sort of mystery deserves an investigation, so I set out to find the source of the performance difference by building two more models mixing tf.layers, tf.contrib.learn, and tf.matmul. I used the iris, wine, and digits datasets from scikit-learn, as these are small enough to iterate over a lot of variations without taking too much time.
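As a rough illustration of the two abstraction levels (and of what “running the training optimizer directly” means), here is a minimal TF 1.x-style sketch along the lines of what the post describes; the layer widths, learning rate, and step count are placeholders rather than the actual models.

```python
# Minimal sketch of the two abstraction levels (TF 1.x era APIs as named in
# the post). Layer widths, learning rate, and step counts are placeholders.
import numpy as np
import tensorflow as tf
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x_train, x_val, y_train, y_val = train_test_split(
    iris.data.astype(np.float32), iris.target, test_size=0.25, random_state=0)

x_ph = tf.placeholder(tf.float32, [None, 4])
y_ph = tf.placeholder(tf.int64, [None])

def dense_matmul(inputs, units, activation=tf.nn.relu):
    """Lower-level layer: explicit weight and bias variables plus tf.matmul."""
    in_dim = inputs.get_shape().as_list()[-1]
    w = tf.Variable(tf.random_normal([in_dim, units], stddev=0.1))
    b = tf.Variable(tf.zeros([units]))
    pre = tf.matmul(inputs, w) + b
    return activation(pre) if activation is not None else pre

def mlp_low(x):
    h = x
    for _ in range(5):
        h = dense_matmul(h, 64)
    return dense_matmul(h, 3, activation=None)         # logits

def mlp_high(x):
    h = x
    for _ in range(5):
        h = tf.layers.dense(h, 64, activation=tf.nn.relu)
    return tf.layers.dense(h, 3)                        # logits

logits = mlp_low(x_ph)                                  # swap in mlp_high(x_ph) to compare
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_ph, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
accuracy = tf.reduce_mean(
    tf.cast(tf.equal(tf.argmax(logits, 1), y_ph), tf.float32))

# Lower-level training path: run the optimizer op directly in a session.
# The higher-level variants in the post hand a model_fn to learn.Estimator.fit instead.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2000):
        sess.run(train_op, feed_dict={x_ph: x_train, y_ph: y_train})
    print("validation accuracy:",
          sess.run(accuracy, feed_dict={x_ph: x_val, y_ph: y_val}))
```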

In genetics research it’s common practice to determine relationships between genes and traits by breaking things until the trait disappears, then trying to restore the trait by externally adding specific genes back to compensate for the broken one. These perturbations are called “knockout” and “rescue,” respectively, and I took a similar approach here. My main findings were:

  • Replacing tf.matmul operations with tf.layers didn’t have much effect. Changing dropout and other hyperparameters did not seem to affect the low-level and high-level models differently.
  • “Knocking out” the use of learn.Estimator.fit from tf.contrib.learn and running the training optimizer directly led to significantly degraded performance of the tf.layers model.
  • The model built around tf.matmul could be “rescued” by training with learn.Estimator.fit instead of train_op.run.
  • The higher-level model using layers did generally perform a little better than the lower-level model, especially on the digits dataset.

So we can conclude that training with the tf.contrib.learn Estimator API was likely responsible for the higher performance of the more abstracted model. Cross-validation curves demonstrating the training efficacy of the different models are shown below:

Cross-validation accuracy curves for different random seeds using the tf.layers model.

Cross-validation accuracy curves for different random seeds using the tf.matmul model.

These MLPs perform pretty well (and converge in just a few minutes) on the small sklearn datasets. The four models are built to be readily modifiable and iterable, and can be accessed from the Git repository.

Decomposing Autoencoder Conv-net Outputs with Wavelets

Replacing a bespoke image segmentation workflow built on classical computer vision techniques with a simple, fully convolutional neural network isn’t too hard with modern compute and software libraries, at least not for the first part of the learning curve. The conv-net alleviates your fine-tuning overhead, decreases the total curation requirement (time spent correcting human-obvious mistakes), and even expands the flexibility of your segmentations so that you can simultaneously identify the pixel locations of multiple different classes. Even if the model occasionally makes mistakes, it seems to do so in a way that makes it obvious what the net was “thinking,” and the mistakes are still pretty close. If this is so easy, why do we still even have humans?

In some ways conv-nets work almost too well for many computer vision tasks. Getting a reasonably good result and declaring it “good enough” is very tempting. It’s easy to get lackadaisical about a task that you wouldn’t even have approached for automation a decade ago, leaving it to undergraduates[1] to manually assess images for “research experience” like focused zipheads[2]. But we can do better, and it’s important that we do so if we are to live in a desirable future. Biased algorithms are nothing new, and the ramifications of a misbehaving model remain the responsibility of its creators[3].

Take a 4-layer CNN trained to segment mitochondria from electron micrographs of brain tissue (trained on an electron microscopy dataset from EPFL, linked here). On a scale from Loch Stenness to Loch Ness, the depth of this network is the Bonneville Salt Flats. Nonetheless this puddle of neurons manages to get a reasonably good result after only a few hundred epochs.
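For reference, a shallow fully convolutional segmentation net of roughly that flavor might be sketched like this (TF 1.x layers API; the filter counts and patch size are illustrative guesses, not the actual model):

```python
# Rough sketch of a shallow ("Bonneville Salt Flats" deep) fully convolutional
# segmentation net, TF 1.x style. Filter counts and patch size are illustrative.
import tensorflow as tf

def shallow_segmenter(images):
    """images: [batch, height, width, 1] grayscale EM patches -> per-pixel logits."""
    h = tf.layers.conv2d(images, 16, 3, padding="same", activation=tf.nn.relu)
    h = tf.layers.conv2d(h, 32, 3, padding="same", activation=tf.nn.relu)
    h = tf.layers.conv2d(h, 16, 3, padding="same", activation=tf.nn.relu)
    logits = tf.layers.conv2d(h, 1, 1, padding="same", activation=None)
    return logits  # apply tf.nn.sigmoid for a mitochondria probability mask

images = tf.placeholder(tf.float32, [None, 256, 256, 1])
masks = tf.placeholder(tf.float32, [None, 256, 256, 1])
logits = shallow_segmenter(images)
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=masks, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```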

I don’t think it would take too much in the way of post-processing to clean up those segmentation results: a closing operator to get rid of the erroneous spots and smooth a few artifacts. But isn’t that defeating the point? The ease of getting good results early on can be a bit misleading. Getting to 90% or even 95% effectiveness on a task can seem pretty easy thanks to the impressive learning capacity of conv-nets, but closing the gap of the last few percent, building a model that generalizes to new datasets, or, better yet, transfers what it has learned to largely different tasks is much more difficult. With all the accelerated hardware and improved software libraries we have available today you may be only 30 minutes away from a perfect cat classifier, but you’re at least a few months of diligent work away from a conv-net that can match the image analysis efficacy of an undergrad on a new project.

Pooling operations are often touted as a principal contributor to conv-net classifier invariance, but this is controversial, and in any case most people who can afford the hardware for memory-intensive models are leaving pooling behind. It seems that pooling is probably more important for regularization than for feature invariance, but we’ll leave that discussion for another time. One side effect of pooling operations is that images are blurred as the x/y dimensions are reduced in deeper layers.

U-Net architectures and atrous convolutions are two strategies that have lately been shown to be effective elements of image segmentation models. The assumed effect for both strategies is better retention of high frequency details (as compared to fully convolutional networks). These counteract some of the blurring effect that comes from using pooling layers.

In this post, we’ll compare the frequency content retained in the output from different models. The training data is EM data from brain slices like the example above. I’m using the dataset from the 2012 ISBI 2D EM segmentation challenge (published by Cardona et al.) for training and validation, and we’ll compare the results using the EPFL dataset mentioned above as a test set.

To examine how these elements contribute to a vision model, we’ll train them on EM data as autoencoders. I’ve built one model for each strategy, constrained to have the same number of weights. The training process looks something like this (in the case of the fully convolutional model):

Dilated convolutions are an old concept revitalized to address the loss of detail associated with pooling operations by making pooling optional. This is accomplished by using dilated convolutional kernels (spacing the weights with zeros, or holes) to achieve long-distance context without pooling. In the image below, the dark squares are the active weights while the light gray ones are the “holes” (trous in French, hence “atrous” convolutions). Where these kernels are convolved with a layer, they act like a larger kernel without having to learn or store additional weights.
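As a concrete sketch, here are two ways a dilated convolution might be written with TF 1.x-era APIs (shapes and filter counts are illustrative):

```python
# Two equivalent ways to get a dilated ("atrous") convolution in TF 1.x;
# a 3x3 kernel with dilation rate 2 covers the receptive field of a 5x5
# kernel while storing only 9 weights per channel pair. Shapes are illustrative.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 256, 256, 16])

# Higher-level: tf.layers.conv2d with a dilation_rate argument
y1 = tf.layers.conv2d(x, 32, 3, padding="same",
                      dilation_rate=2, activation=tf.nn.relu)

# Lower-level: tf.nn.atrous_conv2d with an explicit filter tensor
filters = tf.Variable(tf.random_normal([3, 3, 16, 32], stddev=0.1))
y2 = tf.nn.atrous_conv2d(x, filters, rate=2, padding="SAME")
```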

U-Net architectures, on the other hand, utilize skip connections to bring information from the early, less-pooled layers to later layers. The main risk I see in using U-Net architectures is that for a particularly deep model the network may develop an over-reliance on the skip connections. This would mean the very early layers will train faster and have a bigger influence on the model, losing out on the capacity for more abstract feature representations in the layers at the bottom of the “U”.
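A minimal sketch of what such a skip connection looks like in code (again TF 1.x layers, with illustrative sizes):

```python
# Minimal sketch of a U-Net style skip connection: the encoder feature map is
# concatenated onto the upsampled decoder features so that high-frequency
# detail can bypass the pooled bottleneck. Sizes are illustrative only.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 256, 256, 1])

enc = tf.layers.conv2d(x, 16, 3, padding="same", activation=tf.nn.relu)   # 256x256
down = tf.layers.max_pooling2d(enc, 2, 2)                                 # 128x128
bottleneck = tf.layers.conv2d(down, 32, 3, padding="same", activation=tf.nn.relu)
up = tf.layers.conv2d_transpose(bottleneck, 16, 2, strides=2, padding="same")  # 256x256

# The skip connection: concatenate encoder features with the upsampled features
merged = tf.concat([enc, up], axis=-1)
decoded = tf.layers.conv2d(merged, 1, 3, padding="same", activation=None)
```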

Using atrous convolutions makes for noticeably better autoencoding fidelity compared to a simple fully convolutional network:

Training with the U-Net architecture, meanwhile, produces images that are hardly distinguishable from the originals. Note that the images here are from the validation set; they aren’t seen by the model during training steps.

If you compare the results qualitatively, the U-Net architecture is a clear winner in terms of the sharpness of the decoded output. By the looks of it the U-Net is probably more susceptible to fitting noise as well, at least in this configuration. Using dilated convolutions also offers improved detail reconstruction compared to the fully convolutional network, but it does eat up more memory and trains more slowly due to the wide interior layers.

This seemed like a good opportunity to bring out wavelet analysis to quantify the differences in autoencoder output. We’ll use wavelet image decomposition to investigate which frequency levels are most prevalent in the decoded output from each model. Image decomposition with wavelets looks something like this:

The top-left image has been downsized 2x from the original by removing the details with a wavelet transform (using Daubechies 1). The details left over in the other quadrants correspond to the high frequency content oriented to the vertical, horizontal, and diagonal directions. By computing wavelet decompositions of the conv-net outputs and comparing the normalized sums at each level, we should be able to get a good idea of where the information of the image resides. You can get an impression of the first level of wavelet decomposition for output images from the various models in the examples below:
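Here's a hedged sketch of that wavelet bookkeeping using PyWavelets; the exact normalization of the per-level power is my own choice and may differ from the one used for the plot below:

```python
# Sketch of the wavelet bookkeeping, using PyWavelets and a Daubechies 1
# (Haar) wavelet. The normalization of the per-level power here is my own
# choice, not necessarily the same as in the plot below.
import numpy as np
import pywt

def normalized_level_power(image, wavelet="db1", levels=8):
    """Fraction of total coefficient energy at each decomposition level."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    # coeffs[0] is the coarse approximation; coeffs[1:] are (cH, cV, cD) tuples
    # ordered from the coarsest details (level `levels`) to the finest (level 1).
    powers = [np.sum(np.asarray(coeffs[0], dtype=float) ** 2)]
    for (ch, cv, cd) in coeffs[1:]:
        powers.append(np.sum(ch ** 2) + np.sum(cv ** 2) + np.sum(cd ** 2))
    powers = np.array(powers)
    return powers / powers.sum()

# Single-level decomposition, as in the quartered image above:
# approx, (horizontal, vertical, diagonal) = pywt.dwt2(image, "db1")
```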

And finally, if we calculate the normalized power for each level of wavelet decomposition we can see where the majority of the information of the corresponding image resides. The metrics below are the average of 100 autoencoded images from the test dataset.

In the plot, spatial frequencies increase (and decomposition levels decrease) from left to right. Level 8 refers to the 8th level of the wavelet decomposition, i.e. the average gray level in this case. The model using a U-Net architecture comes closest to recapitulating all the spatial frequencies of the original image, with the noticeable exception of a roughly 60% decrease in intensity at the very highest spatial frequencies.

I’d say the difference between the U-Net output and the original image is mostly down to a denoising effect. The atrous conv-net is not too far behind the U-Net in terms of spatial frequency fidelity, and the choice of model variant probably would depend on the end use. For example, there are some very small sub-organellar dot features that are resolved in the U-Net reconstruction but not the atrous model. If we wanted to segment those features, we’d definitely choose the U-Net. On the other hand, the atrous net would probably suffer less from over-fitting if we wanted to train for segmenting the larger mitochondria and only have a small dataset to train on. Finally, if all we want is to coarsely identify the cellular boundaries, that’s basically what we see in the autoencoder output from the fully convolutional network.

Hopefully this has been a helpful exercise in examining conv-net capabilities in a simple example. Open questions remain for this set of models: which model performs the best on an actual semantic segmentation task? Does the U-Net rely too much on the skip connections?

I’m working with these models in a repository where I plan to keep notes and code for experimenting with ideas from the machine learning literature, and you’re welcome to use the models therein for your own experiments.

Datasets from:

A. Lucchi, K.Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, IEEE Transactions on Medical Imaging, Vol. 30, Nr. 11, October 2011.

Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas, Pavel Tomancak, Volker Hartenstein. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biology, 2010.

Zebra: https://commons.wikimedia.org/wiki/Zebra#/media/File:Three_Zebras_Drinking.jpg

Relevant articles:

Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. https://arxiv.org/abs/1505.04597

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv. https://arxiv.org/abs/1706.05587

[1] My first job in a research laboratory was to dig through soil samples with fine tweezers to remove roots. We don’t have robots to do this (yet), but I can’t imagine a bored undergraduate producing replicable results in this scenario, and the same goes for manual image segmentation or assessment. On the other hand, the undergrad will probably give the best results, albeit with a high standard deviation, as they are likely to have the most ambiguous understanding of the professor’s hypothesis and desired results of anyone in the lab.

[2] I am indeed reading A Deepness in the Sky.

[3] (o_o) / (^_^) / (*~*)

All work and no play makes JAIck a dull boy

Fun with reservoir computing

By now you’ve probably read Andrej Karpathy’s blog post The Unreasonable Effectiveness of Recurrent Neural Networks, and if you haven’t, you definitely should. Andrej’s RNN examples are the inspiration for many of the RNNs doing* silly things that get picked up by the lay press. The impressive part is how these artificial texts land squarely in the uncanny valley solely by predicting the next character one by one. The results almost, but not quite, read like perfectly reasonable nonsense written by a human.

In fact we can approach similar tasks with a healthy injection of chaos. Reservoir computing has many variants, but the basic premise is always the same. Data is fed into a complex chaotic system, called (among other things) a reservoir, giving rise to long-lived dynamic states. By feeding the states of the system into a simple linear regression model, it’s possible to accomplish reasonably complicated tasks without training the dynamic reservoir at all. It’s like analyzing wingbeat patterns of Australian butterflies by observing hurricanes in Florida.

A computing reservoir can be made out of a wide variety of chaotic systems (e.g. a jointed pendulum) amenable to taking some sort of input, but the ones most akin to neural networks consist of a big network of sparsely connected nodes with random weights and a non-linear activation function. In this example I’ll use a Schmitt trigger relaxation oscillator. If you have trained a few neural networks yourself, you can think of this as simply a tanh activation function with hysteresis. If you build/built BEAM robots, you can think of the activation function as one neuron in a 74HC14-based microcore used to control walking robots.

By sparsely connecting a large vector of nodes to itself and to an input, it’s possible to get a complex response from very simple input. The response to a step function in this example is long-lived, but it does die out eventually.
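To make this concrete, here's a minimal numpy sketch of such a reservoir driven by a step input; the tanh-with-hysteresis activation is a simplified stand-in for the Schmitt trigger oscillator described above, and the sizes, sparsity, and scaling are arbitrary choices rather than those used for the figures:

```python
# Minimal numpy reservoir driven by a step input. The hysteresis activation is
# a simplified stand-in for the Schmitt-trigger oscillator described above, and
# the sizes and scalings are arbitrary choices, not those used for the figures.
import numpy as np

rng = np.random.RandomState(0)
n_nodes, sparsity, leak = 512, 0.05, 0.3

# Sparse random recurrent and input weights (fixed; the reservoir is never trained)
w_res = rng.randn(n_nodes, n_nodes) * (rng.rand(n_nodes, n_nodes) < sparsity)
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))  # tame the spectral radius
w_in = rng.randn(n_nodes, 1) * 0.5

def hysteretic_tanh(pre, prev_out, width=0.25):
    """tanh shifted by the sign of the previous output: a soft Schmitt trigger."""
    return np.tanh(pre - width * np.sign(prev_out))

# Drive the reservoir with a step input and record the state trajectory
steps = 400
u = np.zeros(steps)
u[50:150] = 1.0                                   # step turns on, then off again
x = np.zeros((n_nodes, 1))
states = np.zeros((steps, n_nodes))
for t in range(steps):
    pre = w_res @ x + w_in * u[t]
    x = (1.0 - leak) * x + leak * hysteretic_tanh(pre, x)
    states[t] = x.ravel()
# `states` holds the long-lived response that a linear readout can be fit to.
```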

The activity in the reservoir looks a little like wave action in a liquid, as there tends to be a similar amount of energy in the system at any given time. With very little damping this can go on for a long time. The gif below demonstrates long-lived oscillations in a reservoir after a single unit impulse input.

But what about the funny text? Can reservoir computing make us laugh? Let’s start by testing whether reservoir computing can match the writing proficiency of famous fictional writer Jack Torrance. The animations below demonstrate the learning process: the dynamic reservoir carries a chaotic memory of past input characters, and a linear classifier predicts each next character as a probability. At first the combined system outputs nonsense, and we can see that the character predictions are very dynamic and random. Then the system gets very confident, but very stupid.

Later the system begins to learn words, and the character probabilities adapt to previous characters in a sensible manner, all without back-propagating into the reservoir.

After a while the system learns to reliably produce the phrase “All work and no play makes Jack a dull boy.”
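For a concrete (if simplified) picture of how the readout is trained, here's a numpy sketch of ridge regression from reservoir states to next-character targets; it parallels the idea in the gist linked below but is not the same code, and all sizes and the regularization strength are arbitrary:

```python
# Sketch of a next-character readout on top of a fixed reservoir: one-hot
# characters drive the reservoir, and a ridge-regression readout maps reservoir
# states to next-character scores. Sizes and regularization are arbitrary.
import numpy as np

text = "All work and no play makes Jack a dull boy. " * 200
chars = sorted(set(text))
char_to_i = {c: i for i, c in enumerate(chars)}
onehots = np.eye(len(chars))[[char_to_i[c] for c in text]]   # (T, n_chars)

rng = np.random.RandomState(0)
n_nodes, leak = 512, 0.3
w_res = rng.randn(n_nodes, n_nodes) * (rng.rand(n_nodes, n_nodes) < 0.05)
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))      # tame spectral radius
w_in = rng.randn(n_nodes, len(chars)) * 0.5

# Collect reservoir states while feeding the text one character at a time
x = np.zeros(n_nodes)
states = np.zeros((len(text) - 1, n_nodes))
for t in range(len(text) - 1):
    x = (1 - leak) * x + leak * np.tanh(w_res @ x + w_in @ onehots[t])
    states[t] = x

# Ridge regression readout: reservoir state at t -> one-hot character at t+1
targets = onehots[1:]
ridge = 1e-2
w_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_nodes),
                        states.T @ targets)

# Sample from the trained readout by feeding predictions back in as input
x = np.zeros(n_nodes)
c = char_to_i["A"]
generated = ["A"]
for _ in range(200):
    x = (1 - leak) * x + leak * np.tanh(w_res @ x + w_in @ np.eye(len(chars))[c])
    c = int(np.argmax(x @ w_out))
    generated.append(chars[c])
print("".join(generated))
```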

If you want to try it yourself, this gist depends only on numpy. I made a normal GH repository for my code for generating figures and text here. There may be a TensorFlow version for training faster/on more complicated texts in the works (running it with numpy is no way to write a thesis).

*The torch-rnn repository used by Elle O’Brien for romance novel titles was written by Justin Johnson.

International Journal of Ronotics Vol. 1, Issue 1

[original image]

Three things that aren’t robots, rated relative to a household thermostat.

Many humans remain worried that robots will soon come to take their jobs, seize control of institutions, or simply decide that the time of humans has gone on far too long already. On the other hand, techno-optimists like myself look forward to engaging with wholly different architectures of minds unconstrained by biology, and as capable automation continues to erode the jobs that can be best done by humans, we can all look forward to careers as artisan baristas selling fancy coffees back and forth to each other. For the most part, robot experts and comic artists alike are of the mind that the robot apocalypse is still a ways off. But that doesn’t stop an equally insidious menace lurking at research labs across the globe: non-robot machines masquerading as robots.

I think it stands to reason that not everything in the universe can be a robot, so we should make some effort to accurately organize the things we encounter into categories ‘robots’ and ‘not-robots.’ Here I will use the term ‘ro-not’ for things that are called robots by their creators but are in fact some other thing in the not-robot category.

This may seem pedantic, but it is actually emblematic of a general problem of misplaced incentives in scientific research. By choosing terms not on the basis of clarity and accuracy, but rather for how impressive they sound in a press release, we mislead the public and erode confidence and literacy in science. This is a bad thing that we should try to discourage.

So what is a robot? Although many machines are robot-like, it should be easy to assess how close to being a robot a thing is. Put simply, a robot must be a machine that is able to sense its environment and change its behavior accordingly. That’s it; the bar is not set unreasonably high. An industrial robot arm blindly following a pre-computed path doesn’t fit this definition, but add a sensor used to halt operation when a human enters the danger zone and it does.

Below I’ll rate 3 of these so-called robots on a scale of 1-10, with 1 being a “slinky”, 5 being a thermostat, and 10 being a fully sapient machine. These are all called ‘robots’ by their creators, and often published in robotics-specific journals, despite none of the machines below rising above the sentience of a thermostat. That’s not to say that the machines or their inventors are, uh, not good, or even that they shouldn’t publish their devices in robotics journals, but rather that we should all learn to call a spade a spade and an actuator an actuator.

Vine robots.

Original article: “A soft robot that navigates its environment through growth.” Science Robotics, Vol. 2, Issue 8. 19 Jul 2017.

At first glance this machine looks like an inside-out balloon, but on closer inspection you’ll notice that it is in fact an inside-out balloon. The video demonstrates using the appendage to turn off a valve in a simulated “I left the poisonous gas valve on” situation (happens to us all), and with a camera attached the thing appears to be phototropic. Turns out this is misleading, however, and in fact the inflation pattern of the plastic is pre-programmed by adding tape or otherwise constraining the walls of the plastic before inflating.

Rating: 2.5/10. Amended publication title: “An inside-out plastic balloon that can follow preprogrammed inflation routes.”

Soft robot octopus.

Original article: “An integrated design and fabrication strategy for entirely soft, autonomous robots” Nature 536, 451–455 (25 August 2016)

This machine is an interesting pneumatic device molded out of PDMS to look like a cute octopus. It is fully capable of wiggling its arms and blowing bubbles, and the blue and red coloring of the hydrogen peroxide solution that powers it makes it look pretty cool. The “octobot” alternates raising half of its appendages at a time, and the coolest thing about it is that it does so with a microfluidic circuit. In addition to powering the arms, each of the two channels feeding the arms also powers a valve that restricts fuel flow to the other channel. The designers deem this device the “first fully autonomous soft robot”; however, its alternating arm-raising seems to be a futile movement (it does not seem to move the device anywhere) and it also doesn’t appear to be able to respond to its environment in any significant way. The “microfluidic logic” oscillator is pretty cool. The authors claim the machine is fully autonomous because it is untethered, but neither is a slinky and I don’t call that a robot either.

Rating: 4.0/10. Amended title: “An integrated design and fabrication strategy for entirely cute, colorful oscillators.”

Amphibious bee robot.

Original article: “A biologically inspired, flapping-wing, hybrid aerial-aquatic microrobot.” Science Robotics. Vol. 2, Issue 11. Oct 2017. [paywalled]

There are a number of misleading aspects to how the press office and journalists portrayed this machine. The first thing you’ll notice in the video demonstrations are the tethers above and below the thing: clearly it is not operating under its own power. While tethering may be a bit uninspiring, there’s nothing in our robot-rule that says you have to carry your own batteries and compute everywhere to be considered a robot, so the tether itself doesn’t rule out potential robotness. On the other hand, in an interview with Science Friday the lead author describes her role in the operation of the device, which is that she makes all the decisions and activates each mode of operation manually (the wings move at a different frequency for swimming and flight, for example). The device also can’t fly when it’s wet, which is a bit of a letdown given that moving between water and air seemed to be the whole point of being an amphibious bee instead of an air-only or water-only winged device. One particularly cool thing about this device is that it uses a small explosion to break the grip of surface tension at the water’s surface, powered by hydrogen and oxygen gas generated in an internal chamber by electrolysis.

Rating: 3/10. Amended title: “A university press office-inspired, flapping-wing, hybrid aerial-aquatic device that explodes a bit.”

I wouldn’t argue that any of the above are, uh, not good. In fact they may be quite cool as what they are and could potentially be put to good use as part of a robot. Disagree with my ratings or definition of a robot? Let me know in the comments or @theScinder. If you are involved in any of these projects and have expanded on the original device to meet the robot criteria above, let me know and I’ll add an update.

Introducing Ceph-O-Vision

I’ve been interested in cephalopod vision ever since I learned that, despite their superb appreciation for chroma (as evidenced by their ability to match the color of their surroundings as well as texture and pattern), cuttlefish eyes contain only one light-sensitive pigment. Unlike ourselves and other multichromatic animals that perceive color as a mix of activations of different-colored light receptors, cuttlefish must have another way. So while the images coming into the brain of a cuttlefish might look something like this . . .


. . . they manage to interpret the images to precisely match their surroundings and communicate colorful displays to other cuttlefish. Some time ago Stubbs and Stubbs put forth the possibility that they might use chromatic aberrations to interpret color (I discussed and simulated what that might look like in this post). What looks like random flickering in the gif above is actually simulated focusing across chromatic aberrations. [original video]. Contrary to what one might think, defocus and aberration in images aren’t “wrong.” If you know how to interpret them, they provide a wealth of information that might allow a cuttlefish to see the world in all its chromatic glory.
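For the curious, here's a toy sketch of what “simulated focusing across chromatic aberrations” might look like in code; the per-channel focal offsets and blur amounts are arbitrary stand-ins rather than the actual simulation from the earlier post:

```python
# Toy sketch of simulating focus sweeps across chromatic aberration: each focal
# setting brings one wavelength band into focus and blurs the others. The blur
# amounts and offsets are arbitrary; this only illustrates the idea, it does
# not reproduce the simulation from the earlier post.
import numpy as np
from scipy.ndimage import gaussian_filter

def aberration_stack(rgb_image, focus_offsets=(-2.0, 0.0, 2.0), base_blur=1.0):
    """Return one monochrome frame per focus setting from an RGB image.

    rgb_image: float array of shape (H, W, 3) with values in [0, 1].
    Each channel's blur grows with its distance from the current focus offset;
    the monochrome frame is the mean of the three defocused channels, mimicking
    a single-pigment retina viewing the scene through a dispersive lens.
    """
    channel_offsets = np.array([-1.0, 0.0, 1.0])   # R, G, B focal shifts
    frames = []
    for focus in focus_offsets:
        blurred = [
            gaussian_filter(rgb_image[..., c],
                            sigma=base_blur + abs(channel_offsets[c] - focus))
            for c in range(3)
        ]
        frames.append(np.mean(blurred, axis=0))
    return np.stack(frames, axis=-1)               # (H, W, number of focus settings)
```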

Top: learned color image based on chromatic aberration stack. Middle: neural network color reconstitution. Bottom: ground truth color image.

We shouldn’t expect the cuttlefish to experience their world in fuzzy grayscale any more than we should expect humans to perceive their world in an animal version of a Bayer array, each photoreceptor individually distinguished (not to mention distracting saccades, blind spot at the optic nerve, vasculature shadowing, etc.). Instead, just like us humans, they would learn to perceive the visual data produced by their optical system in whatever way makes the most sense and is most useful.

I piped simulated cuttlefish vision images into a convolutional neural network with corresponding color images as reference. The cuttle-vision images flow through the 7-layer network and are compared to the RGB targets on the other side. I started by building a dataset of simulated images consisting of randomly placed pixel-sized colored dots. This was supposed to be the easy “toy example” before moving on to real images.
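A minimal sketch of that kind of network, in TF 1.x style, might look like the following; the patch size and filter counts are placeholders, and while I've stacked seven conv layers to mirror the description, the real network's details likely differ:

```python
# Minimal sketch of the colorization net: a stack of simulated focus frames in,
# an RGB image out, trained with a pixel-wise L2 loss. Filter counts, patch
# size, and depth are placeholders rather than the network used for the figures.
import tensorflow as tf

n_focus = 3                                        # frames per aberration stack
stack = tf.placeholder(tf.float32, [None, 128, 128, n_focus])
rgb_target = tf.placeholder(tf.float32, [None, 128, 128, 3])

h = stack
for filters in (16, 32, 64, 64, 32, 16):
    h = tf.layers.conv2d(h, filters, 3, padding="same", activation=tf.nn.relu)
rgb_pred = tf.layers.conv2d(h, 3, 3, padding="same", activation=tf.nn.sigmoid)

loss = tf.reduce_mean(tf.square(rgb_pred - rgb_target))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```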


Left: training input, middle: network’s attempt at reconstitution, right: target. For pixel sized color features, the convolutional kernels of the network learn to blur the target pixels into ring shapes.

Bizarrely, the network learned to interpret these images as colored donuts, usually centered around the correct location but incapable of reconstituting the original layout. Contrary to what you might expect, the simple dataset performed poorly even with many training examples, and color image reconstitution improved dramatically when I switched to real images. Training on a selection of landscape images looks something like this:


Top: chromatic aberration training images (stacked as a color image for viewing). Center: Ceph-O-Vision color perception. Bottom: ground truth RGB.

As we saw in the first example, reconstituting sparse single pixels from chromatic aberration images trains very poorly. However, the network was able to learn random patterns of larger features (offering better local context) much more effectively:

Interestingly enough, the network learns to be most sensitive to edges. You can see in the training gif above that after 1024 epochs of training, the network mainly reconstitutes pattern edges. It never learns to exactly replicate the RGB pattern, but gets pretty close. It would be interesting to use a network like this to predict what sort of optical illusions a cuttlefish might be susceptible to. This could provide a way to test the chromatic aberration hypothesis in cephalopod vision. Wikipedia image by Hans Hillewaert used as a mask for randomly generated color patterns.

Finally, I trained the network on some footage of a hunting cuttlefish CC BY SA John Turnbull. Training on the full video, here’s what a single frame looks like as the network updates over about a thousand training epochs:

This project is far from a finished piece, but it’s already vastly improved my intuition for how convolutional neural networks interpret images. It also provides an interesting starting point for thinking about how cuttlefish visually communicate and perceive. If you want more of the technical and unpolished details, you can follow this project’s Github repository. I have a lot of ideas on what to try next: naturally some control training with a round pupil (and thus less chromatic aberration), but also to compare the simple network I’ve built so far to the neuroanatomy of cephalopods and to implement a “smart camera” version for learning in real-time. If you found this project interesting, or have your own cool ideas mixing CNNs and animal vision, be sure to let me know @theScinder or in the comments.