A Study in Red Herrings

I was recently given a coding assignment as part of a job application. While I’ll respect the confidentiality of the actual assignment (it was weird), I can talk about the study tips they gave us in the homework invitation email, as these had essentially nothing to do with the assignment itself.

Applicants were encouraged to bone up on multi-layer dense neural networks, aka multi-layer perceptrons, using TensorFlow and TensorBoard. To get ready for the assignment, I built two six-layer MLPs at different levels of abstraction: a lower-level MLP using explicit matrix multiplication and activation, and a higher-level MLP using tf.layers and tf.contrib.learn. I used the iris, wine, and digits datasets from scikit-learn as these are small enough to iterate over a lot of variations without taking too much time. Although the exercise didn’t end up being specifically useful to the coding assignment, I did get more familiar with using TensorBoard and tf.summary commands.
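To make the "explicit matrix multiplication and activation" idea concrete, here is a minimal sketch of a six-layer forward pass. It is written in plain Python so that it runs without TensorFlow; it is not the code from the repository. The hidden width of 16, the ReLU activation, the weight scale, and the helper names are all assumptions made for illustration, and only the forward pass is shown (no training).

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Each layer is matmul + bias + activation; the last layer uses softmax."""
    h = x
    for i, (w, b) in enumerate(layers):
        z = [[v + bias for v, bias in zip(row, b)] for row in matmul(h, w)]
        h = relu(z) if i < len(layers) - 1 else softmax(z)
    return h

rng = random.Random(0)
# Six weight layers: 4 input features and 3 classes, as in the iris dataset.
sizes = [4, 16, 16, 16, 16, 16, 3]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

batch = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
probs = forward(batch, layers)  # one row of class probabilities per sample
```

A higher-level layers API wraps the same matmul, bias, and activation steps behind a single call per layer, which is why the two models were expected to behave identically.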

Although my intention was to design identical models using different tools, and despite using the same Adam optimizer for training, the higher-level abstracted model performed much better (often achieving 100% accuracy on the validation datasets) than the model built around tf.matmul operations. Being a curious sort I set out to find out what was leading to the performance difference and built two more models mixing tf.layers, tf.contrib.learn, and tf.matmul.

In genetics research it’s common practice to determine relationships between genes and traits by breaking things until the trait disappears, then trying to restore the trait by externally adding specific genes back to compensate for the broken one. These steps fall under the terms “knockout” and “rescue,” respectively, and I took a similar approach here. My main findings were:

  • Replacing tf.matmul operations with tf.layers didn’t have much effect. Changing dropout and other hyperparameters did not seem to affect the low-level and high-level models differently.
  • “Knocking out” the use of learn.Estimator.fit from tf.contrib.learn and running the training optimizer directly led to significantly degraded performance of the tf.layers model.
  • The model built around tf.matmul could be “rescued” by training with learn.Estimator.fit instead of train_op.run.
  • The higher-level model using layers did generally perform a little better than the lower-level model, especially on the digits dataset.

Cross-validation curves demonstrating the training efficacy of the different models are shown below:

Cross-validation accuracy curves for different random seeds using the tf.layers model.

Cross-validation accuracy curves for different random seeds using the tf.matmul model.

These MLPs perform pretty well (and converge in just a few minutes) on the small sklearn datasets. The four models are built to be readily modifiable and iterable, and can be accessed from the Git repository.


Decomposing Autoencoder Conv-net Outputs with Wavelets

Replacing a bespoke image segmentation workflow using classical computer vision tasks with a simple, fully convolutional neural network isn’t too hard with modern compute and software libraries, at least not for the first part of the learning curve. The conv-net alleviates your fine-tuning overhead, decreases the total curation requirement (time spent correcting human-obvious mistakes), and it even expands the flexibility of your segmentations so that you can simultaneously identify the pixel locations of multiple different classes. Even if the model occasionally makes mistakes, it seems to do so in a way that makes it obvious what the net was “thinking,” and the mistakes are still pretty close. If this is so easy, why do we still even have humans?

In some ways conv-nets work almost too well for many computer vision tasks. Getting a reasonably good result and declaring it “good enough” is very tempting. It’s easy to get lackadaisical about a task that you wouldn’t even have considered automating a decade ago, leaving it instead to undergraduates[1] to manually assess images for “research experience” like focused zipheads[2]. But we can do better, and it’s important that we do so if we are to live in a desirable future. Biased algorithms are nothing new, and the ramifications of a misbehaving model remain the responsibility of its creators[3].

Take a 4-layer CNN trained to segment mitochondria from electron micrographs of brain tissue (training on an electron microscopy dataset from EPFL here). On a scale from Loch Stenness to Loch Ness, the depth of this network is the Bonneville Salt Flats. Nonetheless this puddle of neurons manages to get a reasonably good result after only a few hundred epochs.

I don’t think it would take too much in the way of post-processing to clean those segmentation results: a closing operator to get rid of the erroneous spots and smooth a few artifacts. But isn’t that defeating the point? The ease of getting good results early on can be a bit misleading. Getting to 90% or even 95% effectiveness on a task can seem pretty easy thanks to the impressive learning capacity of conv-nets, but closing the gap of the last few percent, building a model that generalizes to new datasets, or better yet, transfers what it has learned to largely different tasks is much more difficult. With all the accelerated hardware and improved software libraries we have available today you may be only 30 minutes away from a perfect cat classifier, but you’re at least a few months of diligent work away from a conv-net that can match the image analysis efficacy of an undergrad for a new project.

Pooling operations are often touted as a principal contributor to conv-net classifier invariance, but this is controversial, and in any case most people who can afford the hardware for memory-intensive models are leaving them behind. It seems that pooling is probably more important for regularization than for feature invariance, but we’ll leave that discussion for another time. One side effect of pooling operations is that images are blurred as the x/y dimensions are reduced in deeper layers.
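The blurring side effect is easy to see in a toy case. The sketch below is plain Python with made-up helper names, not code from the models in this post. It applies a 2x2 max pool to one-pixel-wide stripes and then upsamples back to the original size; the stripes are finer than the pooling window, so they do not survive the round trip.

```python
def max_pool_2x2(img):
    """2x2 max pooling with stride 2 on a 2D list with even side lengths."""
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

def upsample_2x(img):
    """Nearest-neighbour upsampling: repeat every pixel twice in x and y."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# One-pixel vertical stripes: the highest spatial frequency an 8x8 image can hold.
stripes = [[c % 2 for c in range(8)] for _ in range(8)]

pooled = max_pool_2x2(stripes)   # 4x4, every window contains a 1
restored = upsample_2x(pooled)   # 8x8 again, but the stripes are gone

print(stripes[0])   # [0, 1, 0, 1, 0, 1, 0, 1]
print(restored[0])  # [1, 1, 1, 1, 1, 1, 1, 1]
```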

U-Net architectures and atrous convolutions are two strategies that have lately been shown to be effective elements of image segmentation models. The assumed effect for both strategies is better retention of high frequency details (as compared to fully convolutional networks). These counteract some of the blurring effect that comes from using pooling layers.

In this post, we’ll compare the frequency content retained in the output from different models. The training data is EM data from brain slices like the example above. I’m using the dataset from the 2012 ISBI 2D EM segmentation challenge for training and validation (published by Cardona et al.), and we’ll compare the results using the EPFL dataset mentioned above as a test set.

To examine how these elements contribute to a vision model, we’ll train them on EM data as autoencoders. I’ve built one model for each strategy, constrained to have the same number of weights. The training process looks something like this (in the case of the fully convolutional model):

Dilated convolutions are an old concept revitalized to address the loss of detail caused by pooling operations by making pooling optional. This is accomplished by using dilated convolutional kernels (spacing the weights with zeros, or holes) to achieve long-distance context without pooling. In the image below, the dark squares are the active weights while the light gray ones are the “holes” (hence the French name à trous, “with holes”). When these kernels are convolved with a layer, they act like a larger kernel without having to learn or store additional weights.
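As a rough illustration of the "holes," the sketch below spreads the weights of a small kernel apart with zeros. The function name and example kernel are made up for this post; real libraries implement dilation without materializing the zeros. A k x k kernel with dilation rate d covers k + (k - 1)(d - 1) pixels per side while still storing only k x k weights.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring weights of a 2D kernel."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    size_r = k_rows + (k_rows - 1) * (rate - 1)
    size_c = k_cols + (k_cols - 1) * (rate - 1)
    out = [[0] * size_c for _ in range(size_r)]
    for r in range(k_rows):
        for c in range(k_cols):
            out[r * rate][c * rate] = kernel[r][c]  # active weights land on a sparse grid
    return out

kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

for row in dilate_kernel(kernel, rate=2):
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

With rate=2 the 3x3 kernel sees a 5x5 neighbourhood, and with rate=4 it sees 9x9, in both cases with the same nine learned weights.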

U-Net architectures, on the other hand, utilize skip connections to bring information from the early, less-pooled layers to later layers. The main risk I see in using U-Net architectures is that for a particularly deep model the network may develop an over-reliance on the skip connections. This would mean the very early layers will train faster and have a bigger influence on the model, losing out on the capacity for more abstract feature representations in the layers at the bottom of the “U”.
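A one-dimensional toy version of a skip connection might look like the sketch below. It is plain Python with hypothetical helper names and no learned weights, so it only shows the data flow: the decoder sees the blurred signal coming up from the bottom of the "U" concatenated with the sharp signal carried over from the early layer.

```python
def avg_pool_1d(signal):
    """Halve the length by averaging neighbouring pairs (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def upsample_1d(signal):
    """Nearest-neighbour upsampling back to twice the length."""
    return [v for v in signal for _ in range(2)]

def decode_with_skip(signal):
    skip = signal                   # early, un-pooled features
    bottom = avg_pool_1d(signal)    # pooled features at the bottom of the "U"
    up = upsample_1d(bottom)        # back to full resolution, details lost
    # Channel-wise concatenation: each position holds [upsampled, skipped].
    return [[u, s] for u, s in zip(up, skip)]

signal = [0.0, 4.0, 0.0, 4.0]
print(decode_with_skip(signal))
# [[2.0, 0.0], [2.0, 4.0], [2.0, 0.0], [2.0, 4.0]]
```

The first channel is flat because pooling averaged the detail away, while the second channel still holds the original alternation. A decoder that leans entirely on the second channel is the over-reliance described above.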

Using atrous convolutions makes for noticeably better autoencoding fidelity compared to a simple fully convolutional network:

Training with the U-Net architecture, meanwhile, produces images that are hardly distinguishable from the originals. Note that the images here are from the validation set, so they aren’t seen by the model during training steps.

If you compare the results qualitatively, the U-Net architecture is a clear winner in terms of the sharpness of the decoded output. By the looks of it the U-Net is probably more susceptible to fitting noise as well, at least in this configuration. Using dilated convolutions also offers improved detail reconstruction compared to the fully convolutional network, but it does eat up more memory and trains more slowly due to the wide interior layers.

This seemed like a good opportunity to bring out wavelet analysis to quantify the differences in autoencoder output. We’ll use wavelet image decomposition to investigate which frequency levels are most prevalent in the decoded output from each model. Image decomposition with wavelets looks something like this:

The top-left image has been downsized 2x from the original by removing the details with a wavelet transform (using Daubechies 1). The details left over in the other quadrants correspond to the high frequency content oriented to the vertical, horizontal, and diagonal directions. By computing wavelet decompositions of the conv-net outputs and comparing the normalized sums at each level, we should be able to get a good idea of where the information of the image resides. You can get an impression of the first level of wavelet decomposition for output images from the various models in the examples below:

And finally, if we calculate the normalized power for each level of wavelet decomposition we can see where the majority of the information of the corresponding image resides. The metrics below are the average of 100 autoencoded images from the test dataset.

In the plot, spatial frequencies increase with decreasing levels from left to right. Level 8 refers to the 8th level of the wavelet decomposition, aka the average gray level in this case. The model using a U-Net architecture is the closest to recapitulating all the spatial frequencies of the original image, with the noticeable exception of a roughly 60% decrease in image intensity at the very highest spatial frequencies.
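For readers who want to reproduce this kind of measurement, here is a minimal sketch of a Haar (Daubechies 1) decomposition and a per-level normalized power, in plain Python. It makes several assumptions that may differ from the analysis behind the plot: images are 2D lists with power-of-two side lengths and are not all zeros, "power" means the sum of squared coefficients, and each level is normalized by the total over all levels. The function names are made up, and the sign conventions of the detail bands may differ from those of a wavelet library.

```python
def haar_level(img):
    """One orthonormal Haar level: half-size approximation plus three detail bands."""
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, r_row, c_row, g_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2)  # local average (the downsized image)
            r_row.append((p + q - s - t) / 2)  # top minus bottom: horizontal edges
            c_row.append((p - q + s - t) / 2)  # left minus right: vertical edges
            g_row.append((p - q - s + t) / 2)  # diagonal detail
        approx.append(a_row)
        d_rows.append(r_row)
        d_cols.append(c_row)
        d_diag.append(g_row)
    return approx, (d_rows, d_cols, d_diag)

def energy(band):
    return sum(v * v for row in band for v in row)

def normalized_level_power(img):
    """Fraction of total energy per level, finest details first.

    The last entry is the energy left in the final approximation,
    which for a fully decomposed image reflects the average gray level.
    """
    powers, current = [], img
    while len(current) > 1 and len(current[0]) > 1:
        current, details = haar_level(current)
        powers.append(sum(energy(band) for band in details))
    powers.append(energy(current))
    total = sum(powers)
    return [p / total for p in powers]

checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
flat = [[1.0] * 4 for _ in range(4)]
print(normalized_level_power(checker))  # [0.5, 0.0, 0.5]
print(normalized_level_power(flat))     # [0.0, 0.0, 1.0]
```

The checkerboard splits its energy between the finest detail level and the average gray level, while the flat image has everything in the last entry. A blurrier autoencoder output would shift energy away from the first entries in the same way.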

I’d say the difference between the U-Net output and the original image is mostly down to a denoising effect. The atrous conv-net is not too far behind the U-Net in terms of spatial frequency fidelity, and the choice of model variant probably would depend on the end use. For example, there are some very small sub-organellar dot features that are resolved in the U-Net reconstruction but not the atrous model. If we wanted to segment those features, we’d definitely choose the U-Net. On the other hand, the atrous net would probably suffer less from over-fitting if we wanted to train for segmenting the larger mitochondria and only have a small dataset to train on. Finally, if all we want is to coarsely identify the cellular boundaries, that’s basically what we see in the autoencoder output from the fully convolutional network.

Hopefully this has been a helpful exercise in examining conv-net capabilities in a simple example. Open questions for this set of models remain. Which model performs the best on an actual semantic segmentation task? Does the U-Net rely too much on the skip connections?

I’m working with these models in a repository where I plan to keep notes and code for experimenting with ideas from the machine learning literature and you’re welcome to use the models therein for your own experiments.

Datasets from:

A. Lucchi, K.Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, IEEE Transactions on Medical Imaging, Vol. 30, Nr. 11, October 2011.

Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas, Pavel Tomancak, Volker Hartenstein. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLOS 2010

Zebra: https://commons.wikimedia.org/wiki/Zebra#/media/File:Three_Zebras_Drinking.jpg

Relevant articles:

Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. Arxiv. https://arxiv.org/abs/1505.04597

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. Arxiv. https://arxiv.org/abs/1706.05587

[1] My first job in a research laboratory was to dig through soil samples with fine tweezers to remove roots. We don’t have robots to do this (yet) but I can’t imagine a bored undergraduate producing replicable results in this scenario, and the same goes for manual image segmentation or assessment. On the other hand the undergrad will probably give the best results, albeit with a high standard deviation, as they are likely to have the most ambiguous understanding of the professor’s hypothesis and desired results of anyone in the lab.

[2] I am indeed reading A Deepness in the Sky.

[3] (o_o) / (^_^) / (*~*)

LAST MINUTE DIY Pinhole Viewer for Eclipse (4 steps)

Nobody invited you to their Great American Eclipse 2017 party? This is the first you’ve heard of it? Maybe you just forgot to prepare, with all that procrastination you’ve had to do since the last total eclipse in 1979. Don’t worry! You still have time. You can impress your friends and delight your coworkers with this

Step 1: Get a hand

Maybe you have one of these lying around the house, or you can borrow one from a friend. Pretty much any model will do, so don’t waste time being too picky.

Step 2: Make hand into pinhole shape

OK here’s the tricky part. Hopefully you’ve managed to keep track of that hand since step 1. Make a pinhole with it. Anticipate. The Great Eclipse is going to be portenting real epic-like any minute now.

Step 3: Arrange the pinhole in between the sun and a flat, light surface.

Alright, here’s the second tricky part. You’ll want to really get this dialed in before the eclipse starts. Put the hand in between the sun and a viewing surface (I used a sheet of paper). Make sure the hand is still in a pinhole shape. Change the angle and position of the hand until an image of the sun forms on the paper; it might take a few minutes to get everything lined up. The farther the pinhole is from the surface, the larger the projected image, but the more difficult it is to maintain alignment, and eventually the image is too dim to compete with ambient scattered light. Eclipses happen pretty often if you’re willing to travel, but the next one crossing the lower 48 won’t happen until 2024.

Step 4: Realize you are not in the continental United States right now.


Journalistic Phylogeny of the Silicon Valley Apocalypse

For some reason, doomsday mania is totally in this season.

In 2014 I talked about the tendency of internet writers to regurgitate the press release for trendy science news. The direct lineage from press release to press coverage makes it easy for writers to phone it in: university press offices essentially hand out pre-written sensationalist versions of recent publications. It’s not surprising that with so much of the resulting material in circulation taking text verbatim from the same origin, it is possible to visualize the similarities as genetic sequences in a phylogenetic tree.

Recently the same sort of journalistic laziness reared its head in stories about the luxury doomsday prepper market. Evan Osnos at The New Yorker wrote an article describing the trend in Silicon Valley to buy up bunkers, bullets, and body armor, because they think we’ll all soon rise up against them following the advent of A.I. Without a press release to serve as a ready-made template, other outlets turned to reporting on the New Yorker story itself as if it were a primary source. This is a bit different than copying down the press release as your own, and the inheritance is not as direct. If anything, this practice is even more hackneyed. At least a press office puts out their releases with the intention that the text serves as material for coverage so that the topic gets as much circulation as possible. Covering another story as a primary source, rather than writing an original commentary or rebuttal, is just a way to skim traffic off a trend.

In any case, I decided to subject this batch of articles to my previous workflow: converting the text to a DNA sequence with DNA writer by Lensyl Urbano, aligning the sequences with MAFFT and/or T-Coffee Expresso, and using the distances from the alignment to make a tree in Phyl.io. Here’s the result:


Heredity isn’t as clear-cut as it was when I looked at science articles: there’s more remixing in this case and we see that in increased branch distances from the New Yorker article to most of the others. Interestingly, there are a few articles that are quite close to each other, much more so than they are to the New Yorker article. Perhaps this rabbit hole of quasi-plagiarism is even deeper than it first appears, with one article covering another article about an article about an article. . .

In any case, now that I’ve gone through this workflow twice, the next time I’ll be obligated to automate the whole thing in Python.
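As a possible starting point for that automation, the first step (turning article text into a DNA-like sequence) could be sketched as follows. The encoding here is an assumption for illustration: two bits per base and four bases per UTF-8 byte. It is not the scheme used by DNA writer, and the record names are placeholders. The output is FASTA text, which is the format alignment tools such as MAFFT accept as input.

```python
BASES = "ACGT"

def text_to_dna(text):
    """Encode each UTF-8 byte as four bases, two bits per base (assumed scheme)."""
    seq = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            seq.append(BASES[(byte >> shift) & 0b11])
    return "".join(seq)

def to_fasta(records, width=60):
    """Format {name: article text} as FASTA, wrapping sequences at `width` bases."""
    lines = []
    for name, text in records.items():
        lines.append(">" + name)
        seq = text_to_dna(text)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Placeholder snippets standing in for full article text.
articles = {
    "source_article": "doomsday prep for the super-rich",
    "derived_article": "doomsday prepping for the super rich",
}

print(text_to_dna("hi"))  # CGGACGGC
print(to_fasta(articles))
```

Identical stretches of text map to identical stretches of bases, so verbatim copying between articles shows up as conserved regions once the sequences are aligned.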

You can tinker with the MAFFT alignment, at least for a while, here:

My tree:






New Year’s Eve 2015


A particular pair of protein structures, Protein Data Bank designations 1afz and 3coS. Depending on your genotype you may be strongly cursing or thanking these enzymes later.

X-Ray diffraction solution data from:

Kavanagh, K.L., Shafqat, N., Yue, W., von Delft, F., Bishop, S., Roos, A., Murray, J., Edwards, A.M., Arrowsmith, C.H., Bountra, C., Oppermann, U. Crystal structure of human class II alcohol dehydrogenase (ADH4) in complex with NAD and Zn. To Be Published

Steinmetz, C.G., Xie, P., Weiner, H., Hurley, T.D. Structure of mitochondrial aldehyde dehydrogenase: the genetic component of ethanol aversion (1997) Structure 5: 701-711. PubMed: 9195888

Return to the Play Area

When life gives you lemons, go exploring

xkcd is celebrating Randall Munroe’s new book with a browser game. It doesn’t take long to guess that there’s a lot more than the small coin-gathering area the player starts out in.

After escaping the small coin maze and vaulting a barrier wall on the left. . .

Eventually the first foray of our brave protagonist came to an end when I got stuck in a statue of George Washington.