A Study in Red Herrings

I was recently assigned a programming assignment as part of the application process for a job. While I’ll respect the confidentiality of the actual coding assignment (it was weird), I can talk about the study tips they gave us in the homework invitation email, as these essentially had nothing to do with the actual assignment.

Applicants were encouraged to bone up on multi-layer dense neural networks, aka multi-layer perceptrons, using TensorFlow and TensorBoard. To get ready for the assignment, I built two six-layer MLPs at different levels of abstraction: a lower-level MLP using explicit matrix multiplication and activation, and a higher-level MLP using tf.layers and tf.contrib.learn. I used the iris, wine, and digits datasets from scikit-learn as these are small enough to iterate over a lot of variations without taking too much time. Although the exercise didn’t end up being specifically useful to the coding assignment, I did get more familiar with using TensorBoard and tf.summary commands.
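The gap between the two abstraction levels boils down to who manages the weights. Here’s a minimal sketch of one hidden layer written both ways, assuming the TF 1.x APIs in use at the time (tf.layers and tf.contrib no longer exist in TF 2.x); the sizes are placeholders rather than the exact six-layer models:

```python
import tensorflow as tf  # TF 1.x APIs (tf.layers/tf.contrib were removed in TF 2.x)

x = tf.placeholder(tf.float32, [None, 64], name="features")

# Lower-level version: explicit weight variables, matmul, and activation
w1 = tf.Variable(tf.truncated_normal([64, 32], stddev=0.1))
b1 = tf.Variable(tf.zeros([32]))
hidden_low = tf.nn.relu(tf.matmul(x, w1) + b1)

# Higher-level version: tf.layers creates and tracks the variables for you
hidden_high = tf.layers.dense(x, units=32, activation=tf.nn.relu)
```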

Although my intention was to design identical models using different tools, and despite using the same Adam optimizer for training, the higher-level abstracted model performed much better (often achieving 100% accuracy on the validation datasets) than the model built around tf.matmul operations. Being a curious sort, I set out to find what was causing the performance difference and built two more models mixing tf.layers, tf.contrib.learn, and tf.matmul.

In genetics research it’s common practice to determine relationships between genes and traits by breaking things until the trait disappears, then trying to restore the trait by externally adding specific genes back to compensate for the broken one. These approaches fall under the terms “knockout” and “rescue,” respectively, and I took a similar approach here. My main findings were:

  • Replacing tf.matmul operations with tf.layers didn’t have much effect. Changing dropout and other hyperparameters did not seem to affect the low-level and high-level models differently.
  • “Knocking out” the use of learn.Estimator.fit from tf.contrib.learn and running the training optimizer directly led to significantly degraded performance of the tf.layers model.
  • The model built around tf.matmul could be “rescued” by training with learn.Estimator.fit instead of train_op.run (see the sketch after this list).
  • The higher-level model using layers did generally perform a little better than the lower-level model, especially on the digits dataset.
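In code, the knockout amounted to swapping the high-level fit call for a bare session loop. Here’s a hedged sketch of the two training paths with a toy stand-in model (the real six-layer MLPs and input pipelines live in the repo):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

# Toy stand-in model, roughly iris-shaped; not the actual six-layer MLP
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.int64, [None])
logits = tf.layers.dense(tf.layers.dense(x, 16, activation=tf.nn.relu), 3)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)

# "Knockout": run the Adam train op directly instead of calling learn.Estimator.fit
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
X_train = np.random.rand(32, 4).astype(np.float32)   # placeholder data
y_train = np.random.randint(0, 3, size=32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        sess.run(train_op, feed_dict={x: X_train, y: y_train})

# "Rescue": wrap an equivalent model_fn in tf.contrib.learn's Estimator and call
# estimator.fit(...) instead of the bare loop above; that swap alone was what
# recovered the high-level model's accuracy in my experiments.
```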

Cross-validation curves demonstrating the training efficacy of the different models are shown below:

Cross-validation accuracy curves for different random seeds using the tf.layers model.

Cross-validation accuracy curves for different random seeds using the tf.matmul model.

These MLPs perform pretty well (and converge in just a few minutes) on the small sklearn datasets. The four models are built to be readily modifiable and iterable, and can be accessed from the Git repository.


Decomposing Autoencoder Conv-net Outputs with Wavelets

Replacing a bespoke image segmentation workflow built on classical computer vision techniques with a simple, fully convolutional neural network isn’t too hard with modern compute and software libraries, at least not for the first part of the learning curve. The conv-net alleviates your fine-tuning overhead, decreases the total curation requirement (time spent correcting human-obvious mistakes), and it even expands the flexibility of your segmentations so that you can simultaneously identify the pixel locations of multiple different classes. Even if the model occasionally makes mistakes, it seems to do so in a way that makes it obvious what the net was “thinking,” and the mistakes are still pretty close. If this is so easy, why do we still even have humans?

In some ways conv-nets work almost too well for many computer vision tasks. Getting a reasonably good result and declaring it “good enough” is very tempting. It’s easy to get lackadaisical about a task that you wouldn’t even have approached for automation a decade ago, leaving it to undergraduates[1] to manually assess images for “research experience” like focused zipheads[2]. But we can do better, and it’s important that we do so if we are to live in a desirable future. Biased algorithms are nothing new, and the ramifications of a misbehaving model remain the responsibility of its creators.[3]

Take a 4-layer CNN trained to segment mitochondria from electron micrographs of brain tissue (trained on an electron microscopy dataset from EPFL, available here). On a scale from Loch Stenness to Loch Ness, the depth of this network is the Bonneville Salt Flats. Nonetheless, this puddle of neurons manages to get a reasonably good result after only a few hundred epochs.

I don’t think it would take too much in the way of post-processing to clean up those segmentation results: a closing operator to get rid of the erroneous spots and smooth a few artifacts. But isn’t that defeating the point? The ease of getting good results early can be a bit misleading. Getting to 90% or even 95% effectiveness on a task can seem pretty easy thanks to the impressive learning capacity of conv-nets, but closing the gap of the last few percent, building a model that generalizes to new datasets, or, better yet, transfers what it has learned to largely different tasks is much more difficult. With all the accelerated hardware and improved software libraries we have available today you may be only 30 minutes away from a perfect cat classifier, but you’re at least a few months of diligent work away from a conv-net that can match the image analysis efficacy of an undergrad on a new project.

Pooling operations are often touted as a principal contributor to conv-net classifier invariance, but this is controversial, and in any case most people who can afford the hardware for memory-intensive models are leaving them behind. It seems that pooling is probably more important for regularization than for feature invariance, but we’ll leave that discussion for another time. One side effect of pooling operations is that images are blurred as the x/y dimensions are reduced in deeper layers.

U-Net architectures and atrous convolutions are two strategies that have lately been shown to be effective elements of image segmentation models. The assumed effect for both strategies is better retention of high frequency details (as compared to fully convolutional networks). These counteract some of the blurring effect that comes from using pooling layers.

In this post, we’ll compare the frequency content retained in the output from different models. The training data is EM data from brain slices like the example above. I’m using the dataset from the 2012 ISBI 2D EM segmentation challenge for training and validation (published by Cardona et al.), and we’ll compare the results using the EPFL dataset mentioned above as a test set.

To examine how these elements contribute to a vision model, we’ll train them on EM data as autoencoders. I’ve built one model for each strategy, constrained to have the same number of weights. The training process looks something like this (in the case of the fully convolutional model):
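For concreteness, a minimal fully convolutional autoencoder along these lines might look like the sketch below (TF 1.x layers; the filter counts and patch size are illustrative, not the exact models from my repo):

```python
import tensorflow as tf  # TF 1.x-style layers

def fc_autoencoder(images):
    """Minimal fully convolutional autoencoder; layer/filter counts are illustrative."""
    h = tf.layers.conv2d(images, 32, 3, padding="same", activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, 2, 2)
    h = tf.layers.conv2d(h, 64, 3, padding="same", activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, 2, 2)
    # Decoder: learn to undo the downsampling
    h = tf.layers.conv2d_transpose(h, 32, 3, strides=2, padding="same",
                                   activation=tf.nn.relu)
    return tf.layers.conv2d_transpose(h, 1, 3, strides=2, padding="same")

images = tf.placeholder(tf.float32, [None, 512, 512, 1])   # grayscale EM patches
loss = tf.losses.mean_squared_error(images, fc_autoencoder(images))
```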

Dilated convolutions are an old concept revitalized to address the loss of detail associated with pooling operations by making pooling optional. This is accomplished by using dilated convolutional kernels (spacing the weights with zeros, or holes) to achieve long-distance context without pooling. In the image below, the dark squares are the active weights while the light gray ones are the “holes” (the trous of the French à trous). Where these kernels are convolved with a layer, they act like a larger kernel without having to learn or store additional weights.
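In TF 1.x this is just a keyword argument; a sketch, reusing the `images` placeholder from the autoencoder above and made-up filter counts:

```python
# A 3x3 kernel with dilation_rate=2 covers a 5x5 neighborhood while storing only
# nine weights, so the receptive field grows without any pooling.
h = tf.layers.conv2d(images, 32, 3, dilation_rate=2, padding="same",
                     activation=tf.nn.relu)
# Lower-level equivalent: tf.nn.atrous_conv2d, which takes the dilation rate directly.
```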

U-Net architectures, on the other hand, utilize skip connections to bring information from the early, less-pooled layers to later layers. The main risk I see in using U-Net architectures is that a particularly deep model may develop an over-reliance on the skip connections. That would mean the very early layers train faster and have a bigger influence on the model, losing out on the capacity for more abstract feature representations in the layers at the bottom of the “U”.
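The skip connection itself is just a concatenation. A hedged sketch, where `bottleneck` and `encoder_features` are stand-ins for tensors from the bottom and the early part of the “U”:

```python
# U-Net-style skip connection: concatenate an early, high-resolution feature map
# onto the upsampled decoder features so fine detail can bypass the pooled bottleneck.
up = tf.layers.conv2d_transpose(bottleneck, 64, 3, strides=2, padding="same")
merged = tf.concat([up, encoder_features], axis=-1)   # join along the channel axis
out = tf.layers.conv2d(merged, 64, 3, padding="same", activation=tf.nn.relu)
```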

Using atrous convolutions makes for noticeably better autoencoding fidelity compared to a simple fully convolutional network:

Training with the U-Net architecture, meanwhile, produces images that are hardly discernible from the originals. Note that the images here are from the validation set; they aren’t seen by the model during training steps.

If you compare the results qualitatively, the U-Net architecture is a clear winner in terms of the sharpness of the decoded output. By the looks of it the U-Net is probably more susceptible to fitting noise as well, at least in this configuration. Using dilated convolutions also offers improved detail reconstruction compared to the fully convolutional network, but it does eat up more memory and trains more slowly due to the wide interior layers.

This seemed like a good opportunity to bring out wavelet analysis to quantify the differences in autoencoder output. We’ll use wavelet image decomposition to investigate which frequency levels are most prevalent in the decoded output from each model. Image decomposition with wavelets looks something like this:

The top-left image has been downsized 2x from the original by removing the details with a wavelet transform (using Daubechies 1). The details left over in the other quadrants correspond to the high frequency content oriented to the vertical, horizontal, and diagonal directions. By computing wavelet decompositions of the conv-net outputs and comparing the normalized sums at each level, we should be able to get a good idea of where the information of the image resides. You can get an impression of the first level of wavelet decomposition for output images from the various models in the examples below:

And finally, if we calculate the normalized power for each level of wavelet decomposition we can see where the majority of the information of the corresponding image resides. The metrics below are the average of 100 autoencoded images from the test dataset.
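For reference, here’s one way to compute that kind of normalized per-level power with PyWavelets; it’s a sketch of the idea rather than the exact code behind the plot:

```python
import numpy as np
import pywt  # PyWavelets

def level_energies(image, wavelet="db1", levels=8):
    """Fraction of total wavelet energy at each decomposition level
    (the image must be at least 2**levels pixels on each side)."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    # coeffs[0] is the coarse approximation; the rest are (cH, cV, cD) detail tuples,
    # ordered from the coarsest level down to the finest
    energies = [np.sum(np.square(coeffs[0]))]
    energies += [sum(np.sum(np.square(band)) for band in detail) for detail in coeffs[1:]]
    energies = np.array(energies, dtype=np.float64)
    return energies / energies.sum()

# e.g. compare level_energies(original_patch) with level_energies(autoencoded_patch)
```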

In the plot, spatial frequencies increase with decreasing levels from left to right. Level 8 refers to the 8th level of the wavelet decomposition, i.e. the average gray level in this case. The model using a U-Net architecture comes closest to recapitulating all the spatial frequencies of the original image, with the noticeable exception of a roughly 60% decrease in image intensity at the very highest spatial frequencies.

I’d say the difference between the U-Net output and the original image is mostly down to a denoising effect. The atrous conv-net is not too far behind the U-Net in terms of spatial frequency fidelity, and the choice of model variant probably would depend on the end use. For example, there are some very small sub-organellar dot features that are resolved in the U-Net reconstruction but not the atrous model. If we wanted to segment those features, we’d definitely choose the U-Net. On the other hand, the atrous net would probably suffer less from over-fitting if we wanted to train for segmenting the larger mitochondria and only have a small dataset to train on. Finally, if all we want is to coarsely identify the cellular boundaries, that’s basically what we see in the autoencoder output from the fully convolutional network.

Hopefully this has been a helpful exercise in examining conv-net capabilities in a simple example. Open questions for this set of models remain. Which model performs the best on an actual semantic segmentation task? Does the U-Net rely too much on the skip connections?

I’m working with these models in a repository where I plan to keep notes and code for experimenting with ideas from the machine learning literature and you’re welcome to use the models therein for your own experiments.

Datasets from:

A. Lucchi, K. Smith, R. Achanta, G. Knott, P. Fua. Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features. IEEE Transactions on Medical Imaging, Vol. 30, No. 11, October 2011.

Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas, Pavel Tomancak, Volker Hartenstein. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biology, 2010.

Zebra: https://commons.wikimedia.org/wiki/Zebra#/media/File:Three_Zebras_Drinking.jpg

Relevant articles:

Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. https://arxiv.org/abs/1505.04597

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv. https://arxiv.org/abs/1706.05587

[1] My first job in a research laboratory was to dig through soil samples with fine tweezers to remove roots. We don’t have robots to do this (yet), but I can’t imagine a bored undergraduate producing replicable results in this scenario, and the same goes for manual image segmentation or assessment. On the other hand, the undergrad will probably give the best results, albeit with a high standard deviation, as they are likely to have the most ambiguous understanding of the professor’s hypothesis and desired results of anyone in the lab.

[2] I am indeed reading A Deepness in the Sky.

[3] (o_o) / (^_^) / (*~*)

All work and no play makes JAIck a dull boy

Fun with reservoir computing

By now you’ve probably read Andrej Karpathy’s blog post The Unreasonable Effectiveness of Recurrent Neural Networks, and if you haven’t, you definitely should. Andrej’s RNN examples are the inspiration for many of the RNNs doing* silly things that get picked up by the lay press. The impressive part is how these artificial texts land squarely in the uncanny valley solely by predicting the next character one by one. The results almost, but not quite, read like perfectly reasonable nonsense written by a human.

In fact we can approach similar tasks with a healthy injection of chaos. Reservoir computing has many variants, but the basic premise is always the same: data is fed into a complex chaotic system (called, among other things, a reservoir), giving rise to long-lived dynamic states. By feeding the states of the system into a simple linear regression model, it’s possible to accomplish reasonably complicated tasks without training the dynamic reservoir at all. It’s like analyzing wingbeat patterns of Australian butterflies by observing hurricanes in Florida.

A computing reservoir can be made out of a wide variety of chaotic systems amenable to taking some sort of input (e.g. a jointed pendulum), but the ones most akin to neural networks consist of a big network of sparsely connected nodes with random weights and a non-linear activation function. In this example I’ll use a Schmitt trigger relaxation oscillator. If you have trained a few neural networks yourself, you can think of this as simply a tanh activation function with hysteresis. If you build/built BEAM robots, you can think of the activation function as one neuron in a 74HC14-based microcore used to control walking robots.

By connecting a large vector of nodes to itself and to an input, it’s possible to get a complex response from very simple input. The response to a step function in this example is long-lived, but it does die out eventually.
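A minimal numpy sketch of such a reservoir is below. Note that for simplicity it uses a plain tanh activation rather than the Schmitt-trigger (tanh-with-hysteresis) unit described above, and the sizes and scaling constants are just reasonable-looking guesses:

```python
import numpy as np

rng = np.random.RandomState(0)
n_nodes = 512

# Sparse, random, self-connected reservoir plus random input weights
W = rng.randn(n_nodes, n_nodes) * (rng.rand(n_nodes, n_nodes) < 0.05)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep the dynamics near the edge of chaos
W_in = rng.randn(n_nodes, 1) * 0.5

def run_reservoir(inputs, leak=0.2):
    """Drive the untrained reservoir with a 1-D input sequence and return its states."""
    state = np.zeros(n_nodes)
    states = []
    for u in inputs:
        pre = W.dot(state) + W_in.dot([u])
        state = (1 - leak) * state + leak * np.tanh(pre)   # leaky tanh units, no hysteresis
        states.append(state.copy())
    return np.array(states)

# Long-lived response to a step input that switches on at t = 20
states = run_reservoir(np.concatenate([np.zeros(20), np.ones(400)]))
```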

The activity in the reservoir looks a little like wave action in a liquid, as there tends to be a similar amount of energy in the system at any given time. With very little damping this can go on for a long time. The gif below demonstrates long-lived oscillations in a reservoir after a single unit impulse input. After a few thousand iterations of training, what we get starts to look a little less like gibberish, as we’ll see below.

But what about the funny text? Can reservoir computing make us laugh? Let’s start by testing whether reservoir computing can match the writing proficiency of famous fictional writer Jack Torrance. The animations below demonstrate the learning process: the dynamic reservoir carries a chaotic memory of past input characters, and a linear classifier predicts each next character as a probability. At first the combined system outputs nonsense, and we can see that the character predictions are very dynamic and random. Then the system gets very confident, but very stupid.

Later the system begins to learn words, and the character probabilities adapt to the previous characters in a sensible manner, without back-propagating into the reservoir at all.
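Continuing the numpy sketch above (and reusing W, rng, and n_nodes from it), the trainable part is just a ridge-regression readout from reservoir states to one-hot next-character targets; the text, input scaling, and regularization constant here are stand-ins rather than the gist’s exact values:

```python
# Feed one-hot characters through the frozen reservoir, then fit only the readout.
text = "All work and no play makes Jack a dull boy. " * 200
chars = sorted(set(text))
X_chars = np.eye(len(chars))[[chars.index(c) for c in text]]

W_in_c = rng.randn(n_nodes, len(chars)) * 0.5      # input weights for one-hot characters
state, states = np.zeros(n_nodes), []
for x_t in X_chars[:-1]:
    state = 0.8 * state + 0.2 * np.tanh(W.dot(state) + W_in_c.dot(x_t))
    states.append(state.copy())
S, Y = np.array(states), X_chars[1:]               # targets are the *next* characters

ridge = 1e-2                                        # closed-form ridge regression readout
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_nodes), S.T @ Y)
scores = S @ W_out                                  # unnormalized next-character scores
```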

After a while the system learns to reliably produce the phrase “All work and no play makes Jack a dull boy.”

If you want to try it yourself, this gist depends only on numpy. I made a normal GitHub repository for my code for generating figures and text here. There may be a TensorFlow version for training faster and on more complicated texts in the works (running it with numpy is no way to write a thesis).

*The torch-rnn repository used by Elle O’Brien for romance novel titles was written by Justin Johnson.

International Journal of Ronotics Vol. 1, Issue 1


Three things that aren’t robots, rated relative to a household thermostat.

Many humans remain worried that robots will soon come to take their jobs, seize control of institutions, or simply decide that the time of humans has gone on far too long already. On the other hand, techno-optimists like myself look forward to engaging with wholly different architectures of minds unconstrained by biology, and as capable automation continues to erode the jobs that can be best done by humans, we can all look forward to careers as artisan baristas selling fancy coffees back and forth to each other. For the most part, robot experts and comic artists alike are of the mind that the robot apocalypse is still a ways off. But that doesn’t stop an equally insidious menace lurking at research labs across the globe: non-robot machines masquerading as robots.

I think it stands to reason that not everything in the universe can be a robot, so we should make some effort to accurately organize the things we encounter into categories ‘robots’ and ‘not-robots.’ Here I will use the term ‘ro-not’ for things that are called robots by their creators but are in fact some other thing in the not-robot category.

This may seem pedantic, but it is actually emblematic of a general problem of misplaced incentives in scientific research. By choosing terms not on the basis of clarity and accuracy, but rather for how impressive they sound in a press release, we mislead the public and erode confidence and literacy in science. This is a bad habit that we should try to discourage.

So what is a robot? Although many machines are robot-like, it should be easy to assess how close to being a robot a thing is. Put simply, a robot must be a machine that is able to sense its environment and change its behavior accordingly. That’s it; the bar is not set unreasonably high. An industrial robot arm blindly following a pre-computed path doesn’t fit this definition, but add a sensor used to halt operation when a human enters the danger zone and it does.

Below I’ll rate 3 of these so-called robots on a scale of 1-10, with 1 being a “slinky”, 5 being a thermostat, and 10 being a fully sapient machine. These are all called ‘robots’ by their creators, and often published in robotics-specific journals, despite none of the machines below rising above the sentience of a thermostat. That’s not to say that the machines or their inventors are, uh, not good, or even that they shouldn’t publish their devices in robotics journals, but rather we should all learn to call a spade a spade and an actuator an actuator.

Vine robots.

Original article: “A soft robot that navigates its environment through growth.” Science Robotics  Vol. 2, Issue 8. 19 Jul 2017.

At first glance this machine looks like an inside-out balloon, but on closer inspection you’ll notice that it is in fact an inside-out balloon. The video demonstrates using the appendage to turn off a valve in a simulated “I left the poisonous gas valve on” situation (happens to us all), and with a camera attached the thing appears to be phototropic. This turns out to be misleading, however: the path the expanding plastic takes is pre-programmed by adding tape or otherwise constraining the walls of the plastic before inflating.

Rating: 2.5/10. Amended publication title: “An inside-out plastic balloon that can follow preprogrammed inflation routes.”

Soft robot octopus.

Original article: “An integrated design and fabrication strategy for entirely soft, autonomous robots” Nature 536, 451–455 (25 August 2016)

This machine is an interesting pneumatic device molded out of PDMS to look like a cute octopus. It is fully capable of wiggling its arms and blowing bubbles, and the blue and red coloring of the hydrogen peroxide solution that powers it makes it look pretty cool. The “octobot” alternates raising half of its appendages at a time, and the coolest thing about it is that it does so with a microfluidic circuit. In addition to powering the arms, each of the two channels feeding the arms also powers a valve that restricts fuel flow to the other channel. The designers deem this device the “first fully autonomous soft robot”; however, its alternating arm-raising seems to be a futile movement (it does not seem to move the machine anywhere), and it also doesn’t appear to be able to respond to its environment in any significant way. The microfluidic-logic oscillator is pretty cool, and the authors claim that it makes the machine fully autonomous because it is untethered, but neither is a slinky and I don’t call that a robot either.

Rating: (4.0/10). Amended title: “An integrated design and fabrication strategy for entirely cute, colorful oscillators.”

Amphibious bee robot.

Original article: “A biologically inspired, flapping-wing, hybrid aerial-aquatic microrobot.” Science Robotics. Vol. 2, Issue 11. Oct 2017. [paywalled]

There are a number of misleading aspects to how the press office and journalists portrayed this machine. The first thing you’ll notice in the video demonstrations are the tethers above and below the thing: clearly it is not operating under its own power. While tethering may be a bit uninspiring, there’s nothing in our robot rule that says you have to carry your own batteries and compute everywhere to be considered a robot, so the tether itself doesn’t rule out potential robotness. On the other hand, in an interview with Science Friday the lead author describes her role in the operation of the device, which is that she makes all the decisions and activates each mode of operation manually (the wings move at a different frequency for swimming and flight, for example). The device also can’t fly when it’s wet, which is a bit disappointing and seems to defeat the whole point of being an amphibious bee instead of an air-only or water-only winged device. One particularly cool thing about this device is that it uses a small explosion to break the grip of surface tension at the water’s surface, powered by hydrogen and oxygen gas generated in an internal chamber by electrolysis.

Rating (3/10). Amended title: “A university press office-inspired, flapping-wing, hybrid aerial-aquatic device that explodes a bit.”

I wouldn’t argue that any of the above are, uh, not good. In fact they may be quite cool as what they are and could potentially be put to good use as part of a robot. Disagree with my ratings or definition of a robot? Let me know in the comments or @theScinder. If you are involved in any of these projects and have expanded on the original device to meet the robot criteria above, let me know and I’ll add an update.

Introducing Ceph-O-Vision

I’ve been interested in cephalopod vision ever since I learned that, despite their superb appreciation for chroma (as evidenced by their ability to match the color of their surroundings as well as texture and pattern), cuttlefish eyes contain only one light-sensitive pigment. Unlike ourselves and other multichromatic animals that perceive color as a mix of activations of different-colored light receptors, cuttlefish must have another way. So while the images coming into the brain of a cuttlefish might look something like this . . .


. . . they manage to interpret the images to precisely match their surroundings and communicate colorful displays to other cuttlefish. Some time ago Stubbs and Stubbs put forth the possibility that they might use chromatic aberrations to interpret color (I discussed and simulated what that might look like in this post). What looks like random flickering in the gif above is actually simulated focusing across chromatic aberrations [original video]. Contrary to what one might think, defocus and aberration in images aren’t “wrong”: if you know how to interpret them, they provide a wealth of information that might allow a cuttlefish to see the world in all its chromatic glory.

Top: learned color image based on chromatic aberration stack. Middle: neural network color reconstitution. Bottom: ground truth color image.

We shouldn’t expect the cuttlefish to experience their world in fuzzy grayscale any more than we should expect humans to perceive their world in an animal version of a Bayer array, each photoreceptor individually distinguished (not to mention distracting saccades, blind spot at the optic nerve, vasculature shadowing, etc.). Instead, just like us humans, they would learn to perceive the visual data produced by their optical system in whatever way makes the most sense and is most useful.

I piped simulated cuttlefish vision images into a convolutional neural network with corresponding color images as reference. The cuttle-vision images flow through the 7 layer network and are compared to the RGB targets on the other side. I started by building a dataset of simulated images consisting of randomly placed pixel-sized colored dots. This was supposed to be the easy “toy example” I started with before moving on to real images.
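As a rough sketch of the idea (not the exact 7-layer architecture, which lives in the project repo), the network maps a stack of differently focused planes to a three-channel image and is trained against the RGB target with a pixelwise loss; the filter counts and image sizes below are made up:

```python
import tensorflow as tf  # TF 1.x-style layers

def ceph_to_rgb(aberration_stack):
    """Map a stack of defocused/aberrated planes to an RGB image (illustrative sizes)."""
    h = aberration_stack                       # shape [batch, H, W, n_focal_planes]
    for filters in (16, 32, 32, 16):
        h = tf.layers.conv2d(h, filters, 3, padding="same", activation=tf.nn.relu)
    return tf.layers.conv2d(h, 3, 3, padding="same", activation=tf.nn.sigmoid)

stack = tf.placeholder(tf.float32, [None, 128, 128, 5])      # e.g. five focal planes
rgb_target = tf.placeholder(tf.float32, [None, 128, 128, 3])
loss = tf.losses.mean_squared_error(rgb_target, ceph_to_rgb(stack))
```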


Left: training input, middle: network’s attempt at reconstitution, right: target. For pixel sized color features, the convolutional kernels of the network learn to blur the target pixels into ring shapes.

Bizarrely, the network learned to interpret these images as colored donuts, usually centered around the correct location but incapable of reconstituting the original layout. Contrary to what you might expect, the simple dataset performed poorly even with many training examples, and color image reconstitution improved dramatically when I switched to real images. Training on a selection of landscape images looks something like this:


Top: chromatic aberration training images (stacked as a color image for viewing). Center: Ceph-O-Vision color perception. Bottom: ground truth RGB.

As we saw in the first example, reconstituting sparse single pixels from chromatic aberration images trains very poorly. However, the network was able to learn random patterns of larger features (offering better local context) much more effectively:

Interestingly enough, the network learns to be most sensitive to edges. You can see in the training gif above that after 1024 epochs of training, the network mainly reconstitutes pattern edges. It never learns to exactly replicate the RGB pattern, but it gets pretty close. It would be interesting to use a network like this to predict what sort of optical illusions a cuttlefish might be susceptible to. This could provide a way to test the chromatic aberration hypothesis in cephalopod vision. Wikipedia image by Hans Hillewaert used as a mask for randomly generated color patterns.

Finally, I trained the network on some footage of a hunting cuttlefish CC BY SA John Turnbull. Training on the full video, here’s what a single frame looks like as the network updates over about a thousand training epochs:

This project is far from a finished piece, but it’s already vastly improved my intuition for how convolutional neural networks interpret images. It also provides an interesting starting point for thinking about how cuttlefish visually communicate and perceive. If you want more of the technical and unpolished details, you can follow this project’s GitHub repository. I have a lot of ideas on what to try next: naturally some control training with a round pupil (and thus less chromatic aberration), but also comparing the simple network I’ve built so far to the neuroanatomy of cephalopods and implementing a “smart camera” version for learning in real time. If you found this project interesting, or have your own cool ideas mixing CNNs and animal vision, be sure to let me know @theScinder or in the comments.

LAST MINUTE DIY Pinhole Viewer for Eclipse (4 steps)

Nobody invited you to their Great American Eclipse 2017 party? This is the first you’ve heard of it? Maybe you just forgot to prepare, with all that procrastination you’ve had to do since the last total eclipse in 1979. Don’t worry! You still have time. You can impress your friends and delight your coworkers with this last-minute pinhole viewer.

Step 1: Get a hand

Maybe you have one of these lying around the house, or you can borrow one from a friend. Pretty much any model will do, so don’t waste time being too picky.

Step 2: Make hand into pinhole shape

OK here’s the tricky part. Hopefully you’ve managed to keep track of that hand since step 1. Make a pinhole with it. Anticipate. The Great Eclipse is going to be portenting real epic-like any minute now.

Step 3: Arrange the pinhole in between the sun and a flat, light surface.

Alright, here’s the second tricky part. You’ll want to really get this dialed in before the eclipse starts. Put the hand in between the sun and a viewing surface; I used a sheet of paper. Make sure the hand is still in a pinhole shape. Change the angle and position of the hand until an image of the sun forms on the paper; it might take a few minutes to get everything lined up. The farther the pinhole is from the surface, the larger the projected image, but the more difficult it is to maintain alignment, and eventually the image is too dim to compete with ambient scattered light. Eclipses happen pretty often if you’re willing to travel, but the next one crossing the lower 48 won’t happen until 2024.

Step 4: Realize you are not in the continental United States right now.

Nevermind.

What if they had put off the LIGO upgrades?

If a neutron star falls into a black hole but no one has upgraded the gravitational observatory to the required sensitivity, does it fail completely to change our view of the universe?

The Advanced Laser Interferometer Gravitational-Wave Observatory (aLIGO) consists of a pair of Fabry–Pérot interferometers spaced about 3000 km apart, each sporting two cavities about 4 km long and sensitive to length changes smaller than the width of a proton. The tubes containing the optics operate at a vacuum with about 10 times lower pressure than that experienced by the International Space Station in low Earth orbit. The laser power circulating in the arm cavities builds up to well over 100 kW, as each photon reflects off of the test masses and back several hundred times. Each 40 kg test mass is balanced precariously on threads of glass thinner than things that are really rather thin already. In other words, it’s a huge friggin’ laser powerful enough to burn a burrito, with components precariously balanced in an inside-out space ship.

On the 14th of September 2015, these instruments recorded measurements that would support the idea that spacetime changes size when masses accelerate. We usually refer to the instruments and all aspects of the research program supporting it by the same acronym: LIGO. Perhaps you’ve heard of it?

Although the colloquial story is that LIGO recorded the historic GW150914 gravitational wave event during an engineering run, even before beginning formal scientific data collection, this isn’t strictly true. In fact LIGO had been performing science runs at the Hanford and Livingston sites since 2002. In 2005, LIGO reached its original design sensitivity, a strain detection on the order of one part in 10²¹. Another way to think about, and the common way to report, the sensitivity of the instruments is the distance at which a typical binary neutron star inspiral could nominally be detected. One part in 10²¹ strain sensitivity corresponds to a search distance of about 8 million parsecs (about 26 million light years). This was the sort of sensitivity LIGO was capable of up until the latter part of 2010. As impressive as that is, there were no gravitational wave detections during operation of LIGO from 2002 to 2010.

The now-famous GW150914 and the subsequent detections GW151226 and GW170104 came after a comprehensive suite of upgrades that boosted sensitivity to a search distance of 80 million parsecs (~260 million light years). Four years of shutdown beginning in 2010 marked the transition from “initial LIGO” to “advanced LIGO” (aLIGO). Four years sounds like quite a while in human time, and an especially conservative experimenter might be wont to keep collecting data until proof-of-concept is established. As long as the machine is working in some rudimentary fashion, pushing to eke out just one detection before shutting down for risky upgrades might sound like it makes sense. So what if LIGO had put off the upgrades to instead continue with science runs? Not much would have come of it, as it turns out.

Our best guess for the frequency of observable events is based on what aLIGO picked up in its first science run. The first advanced run had about 1100 hours of uptime, time when both instruments were locked in and active. During this run aLIGO picked up 2 confirmed events (and one candidate event, as yet unconfirmed), giving us a rate of 2 events per 1100 hours in a search volume of about 2.1 million cubic megaparsecs (the volume of a sphere with an 80 megaparsec radius). This leads us to expect 1 detection for every 22.92 days of run time, or about 16 detections per year, not considering instrument downtime.

Prioritizing data collection at the cost of forgoing upgrades, we would probably still be waiting on the big announcement. Operating at the pre-2014 sensitivity of 8 megaparsecs, we could expect a detection on average once every 62 years. Assuming a Poisson distribution (events arrive at random), the chance of one or more detections in 4 years of data collection at pre-aLIGO sensitivity would be just a tick over 6%. For 50/50 odds of making a detection, we’d have to wait about 44 years. Chances are, funding bodies could very well lose interest in that time, and we certainly would not have seen the international enthusiasm for gravitational wave research resulting from the GW150914 announcement.
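The arithmetic behind those numbers is short enough to fit here; this is a back-of-the-envelope sketch of the same calculation that lives in the notebook linked below:

```python
import numpy as np

events, hours = 2, 1100.0                      # confirmed detections in the first aLIGO run
rate_aligo = events / (hours / 24 / 365.25)    # detections per year at ~80 Mpc reach (~16/yr)
# The detection rate scales with the searchable volume, i.e. with the cube of the range
rate_initial = rate_aligo * (8.0 / 80.0) ** 3  # initial-LIGO reach of ~8 Mpc

print(1 / rate_initial)                        # ~62 years per expected detection
print(1 - np.exp(-rate_initial * 4))           # ~6% chance of at least one event in 4 years
print(np.log(2) / rate_initial)                # ~44 years for 50/50 odds of a detection
```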

The moral of the story? The difference between being “productive” and creating something great lies in the old “work smarter, not harder” paradigm. Blind diligence and the perseverance to keep on plugging away has little chance to push the boundaries of what is known to be possible.


Curious about any of the calculations discussed above? Tinker with my notes in this Jupyter notebook.