Rendering Deepmind’s Predicted Protein Structures for Novel Coronavirus 2019

I visualized predicted protein structures from the novel 2019 coronavirus. The structures are the latest from Deepmind’s AlphaFold, the champion entry in the CASP13 protein structure prediction competition that took place in 2018. They’ve reportedly continued to make improvements since then (perhaps hoping to make a similar showing at the next CASP spinning up this year), and there are open source implementations here and here (official), though I haven’t looked into the code yet. I’ve put together some notes on the putative functions of each protein described on the SWISS-MODEL site, which accompany the animated structure visualizations below. The gif files are each a few tens of Mb and so may take some time to load. If you’d prefer to look at the structures rendered as stereo anaglyphs (i.e. best-viewed with red/blue 3D glasses), click here.

I used PyMOL to render the predicted structures and build the animations in this post. PyMOL is open source software, and it’s pretty great. If you are a protein structure enthusiast, want to use PyMOL, and can afford to buy a license there is an incentive version that supports the maintenance of the open source project and ensures you always have the latest, greatest version of the program to work with.
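
For anyone who wants to set up a similar rendering pipeline, here is a minimal sketch of a rotating-structure animation scripted through PyMOL's Python API. The file name, frame count, and output paths are placeholders rather than the exact settings used for the gifs in this post, and the final gif assembly is left to an external tool such as ImageMagick.

```python
# render_spin.py -- run headless with: pymol -cq render_spin.py
# Sketch: load a predicted structure, render a frame-by-frame rotation, save PNGs.
import os
from pymol import cmd

os.makedirs("frames", exist_ok=True)

cmd.load("M_protein_prediction.pdb")  # placeholder name for one of the predicted structures
cmd.hide("everything")
cmd.show("cartoon")
cmd.spectrum("count", "rainbow")      # color from N- to C-terminus
cmd.bg_color("white")
cmd.orient()

n_frames = 60
for i in range(n_frames):
    cmd.turn("y", 360.0 / n_frames)   # rotate the view about the y axis
    cmd.ray(800, 600)                 # ray-trace the current frame
    cmd.png(os.path.join("frames", "frame_%03d.png" % i), dpi=150)

# Assemble the frames outside PyMOL, e.g.:
#   convert -delay 5 -loop 0 frames/frame_*.png m_protein_spin.gif
```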

Membrane protein (M protein)

This membrane protein is a part of the viral envelope. Via interactions with other proteins it plays an important role in forming the viral particle, but pre-existing template models of this protein are of low quality.

Non-structural protein 6 (Nsp6)

Nsp6 seems to play a role in inducing the host cell to make autophagosomes in order to deliver viral particles to lysosomes. Subverting the autophagosomal and lysosomal machinery of the host cell is a part of the maturation cycle for several different types of viruses. Low-quality models of Nsp6 fragments, generated by homology modeling, are available on the SWISS-MODEL website.

Non-structural protein 2 (Nsp2)

The function of Nsp2 isn’t fully determined yet, but it may have something to do with depressing the host immune response and cell survival by interacting with prohibitin 1 and 2 (PHB and PHB2). Prohibitin proteins have been implicated as receptors for chikungunya and dengue virus particles.

Protein 3a

A little more is known about Protein 3a. The protein forms a tetrameric sodium channel that may be important for mediating the release of viral particles. Like the other proteins targeted by Deepmind’s AlphaFold team, this protein doesn’t have good sequence homologues, so it had previously been limited to only a partial, low-quality structure prediction.

Papain-like protease (PL-PRO), C-terminal section

PL-PRO is a protease, which, as the name suggests, means it makes cuts in other proteins. Papain-like proteases are named after papain, the protease found in papaya. This one is responsible for processing the viral RNA replicase, making a pair of cuts near the N-terminus of one of the peptides that make up the viral replicase. Along with Nsp4, it is also associated with making membrane vesicles important for viral replication.

Non-structural protein 4 (Nsp4)

Nsp4 plays a part, along with PL-PRO, in the production of vesicles required for viral replication. A pre-existing homology template based model of the C-terminus of Nsp4 bears a close resemblance to the AlphaFold prediction, at least superficially. A comparison of template-based model YP_009725300.1 model 1 and the AlphaFold prediction is shown below.

Comparison of AlphaFold prediction and template model prediction (in blue call-out box). The template model is considered to be reasonably good quality.

The predicted structures released by Deepmind come with a grain of salt which I’ll reiterate here. The structures are predicted (not experimental) so they may differ quite a bit from their native forms. Deepmind has made the structural estimates available under a CC BY 4.0 license (the citation is at the end of the post), and I’ll maintain the visualizations under the same license: feel free to use them with attribution.

There’s obviously a lot going on with the current coronavirus pandemic, so I won’t repeat the information about hand washing, social distancing, or hiding out in the woods that you’ve probably already read about. If you’re interested in learning more about protein structure prediction you can start with the Wikipedia entry and/or the introductory course on the SWISS-MODEL website. Levinthal’s paradox is also a fun thought experiment for framing the problem and its inherent difficulty. Mohammed AlQuraishi wrote an insightful recap of AlphaFold at CASP13.

There is a tremendous amount of research effort currently dedicated to studying the 2019 novel coronavirus, including several structural modelling projects. If you don’t want to dive into the rabbit hole vortex of computational protein structure prediction but still want to do something combining protein structure and the COVID-19 virus, Folding@Home and Foldit both have projects related to the new coronavirus. You can help by donating some of your idle computer resources to simulate structural dynamics with Folding@Home, or you can work on solving structural puzzles with Foldit.

[1] John Jumper, Kathryn Tunyasuvunakool, Pushmeet Kohli, Demis Hassabis, and the AlphaFold Team, “Computational predictions of protein structures associated with COVID-19”, DeepMind website, 5 March 2020, https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19

[2] SWISS-MODEL Coronavirus template structure predictions page https://swissmodel.expasy.org/repository/species/2697049

[3] PyMOL. Supported, incentive version. https://pymol.org/2/ Open source project: https://github.com/schrodinger/pymol-open-source

Treating TensorFlow APIs Like a Genetics Experiment to Investigate MLP Performance Variations

I built two six-layer MLPs at different levels of abstraction: a lower-level MLP using explicit matrix multiplication and activation, and a higher-level MLP using tf.layers and tf.contrib.learn. My intention was simply to practice implementing simple MLPs at different levels of abstraction, but despite using the same optimizer and the same architecture, the higher-level, more abstracted model performed much better (often achieving 100% accuracy on the validation datasets) than the model built around tf.matmul operations. That sort of mystery deserves an investigation, so I set out to find what was causing the performance difference and built two more models mixing tf.layers, tf.contrib.learn, and tf.matmul. I used the iris, wine, and digits datasets from scikit-learn, as these are small enough to iterate over a lot of variations without taking too much time.
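
For context, here is roughly what one hidden layer looks like at each level of abstraction; this is a sketch with made-up dimensions rather than the exact six-layer models from the repository, and both APIs shown are from TensorFlow 1.x (tf.layers and tf.contrib.learn are gone in 2.x).

```python
import tensorflow as tf  # TensorFlow 1.x

x = tf.placeholder(tf.float32, shape=[None, 64])  # e.g. flattened sklearn digits features

# Lower-level version: explicit weight/bias variables, matmul, and activation.
w1 = tf.Variable(tf.truncated_normal([64, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))
hidden_low = tf.nn.relu(tf.matmul(x, w1) + b1)

# Higher-level version: tf.layers creates and tracks the variables for you.
hidden_high = tf.layers.dense(x, units=128, activation=tf.nn.relu)
```

Stacking six of either flavor gives architecturally identical networks, which is what made the performance gap surprising in the first place.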

In genetics research it’s common practice to determine relationships between genes and traits by breaking things until the trait disappears, then trying to restore the trait by externally adding specific genes back to compensate for the broken one. These perturbations are called “knockout” and “rescue,” respectively, and I took a similar approach here. My main findings were:

  • Replacing tf.matmul operations with tf.layers didn’t have much effect. Changing dropout and other hyperparameters did not seem to affect the low-level and high-level models differently.
  • “Knocking out” the use of learn.Estimator.fit from tf.contrib.learn and running the training optimizer directly led to significantly degraded performance of the tf.layers model.
  • The model built around tf.matmul could be “rescued” by training with learn.Estimator.fit instead of train_op.run.
  • The higher-level model using layers did generally perform a little better than the lower-level model, especially on the digits dataset.

So we can conclude that training with the Estimator API from tf.contrib.learn was likely responsible for the higher performance of the more abstracted model. Cross-validation curves demonstrating the training efficacy of the different models are shown below:

Cross-validation accuracy curves for different random seeds using the tf.layers model.

Cross-validation accuracy curves for different random seeds using the tf.matmul model.

These MLPs perform pretty well (and converge in just a few minutes) on the small sklearn datasets. The four models are built to be readily modifiable and iterable, and can be accessed from the Git repository.

Decomposing Autoencoder Conv-net Outputs with Wavelets

Replacing a bespoke image segmentation workflow built on classical computer vision techniques with a simple, fully convolutional neural network isn’t too hard with modern compute and software libraries, at least not for the first part of the learning curve. The conv-net reduces your fine-tuning overhead, decreases the total curation requirement (time spent correcting human-obvious mistakes), and even expands the flexibility of your segmentations so that you can simultaneously identify the pixel locations of multiple different classes. Even if the model occasionally makes mistakes, it seems to do so in a way that makes it obvious what the net was “thinking,” and the mistakes are still pretty close. If this is so easy, why do we still even have humans?

In some ways conv-nets work almost too well for many computer vision tasks. Getting a reasonably good result and declaring it “good enough” is very tempting. It’s easy to get lackadaisical about a task that you wouldn’t even have considered automating a decade ago, leaving it to undergraduates[1] to manually assess images for “research experience” like focused zipheads[2]. But we can do better, and it’s important that we do so if we are to live in a desirable future. Biased algorithms are nothing new, and the ramifications of a misbehaving model remain the responsibility of its creators[3].

Take a 4-layer CNN trained to segment mitochondria from electron micrographs of brain tissue (trained on an electron microscopy dataset from EPFL, available here). On a scale from Loch Stenness to Loch Ness, the depth of this network is the Bonneville Salt Flats. Nonetheless this puddle of neurons manages to get a reasonably good result after only a few hundred epochs.

I don’t think it would take too much in the way of post-processing to clean up those segmentation results: a morphological closing operator to get rid of the erroneous spots and smooth out a few artifacts. But isn’t that defeating the point? The ease of getting good results early on can be a bit misleading. Getting to 90% or even 95% effectiveness on a task can seem pretty easy thanks to the impressive learning capacity of conv-nets, but closing the gap of the last few percent, building a model that generalizes to new datasets, or, better yet, transfers what it has learned to largely different tasks is much more difficult. With all the accelerated hardware and improved software libraries we have available today you may be only 30 minutes away from a perfect cat classifier, but you’re at least a few months of diligent work away from a conv-net that can match the image analysis efficacy of an undergrad for a new project.

Pooling operations are often touted as a principal contributor to conv-net classifier invariance, but this is controversial, and in any case most people who can afford the hardware for memory-intensive models are leaving pooling behind. It seems that pooling is probably more important for regularization than for feature invariance, but we’ll leave that discussion for another time. One side effect of pooling operations is that images are blurred as the x/y dimensions are reduced in deeper layers.

U-Net architectures and atrous convolutions are two strategies that have lately been shown to be effective elements of image segmentation models. The assumed effect for both strategies is better retention of high frequency details (as compared to fully convolutional networks). These counteract some of the blurring effect that comes from using pooling layers.

In this post, we’ll compare the frequency content retained in the output from different models. The training data is EM data from brain slices like the example above. I’m using the dataset from the 2012 ISBI 2D EM segmentation challenge (published by Cardona et al.) for training and validation, and we’ll compare the results using the EPFL dataset mentioned above as a test set.

To examine how these elements contribute to a vision model, we’ll train them on EM data as autoencoders. I’ve built one model for each strategy, constrained to have the same number of weights. The training process looks something like this (in the case of the fully convolutional model):
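
For a rough picture of the baseline, here is a minimal sketch of a small fully convolutional autoencoder in tf.keras; the layer widths and input size are illustrative placeholders, not the exact weight-matched models used for these experiments.

```python
from tensorflow.keras import layers, models

def fcn_autoencoder(input_shape=(256, 256, 1)):
    """Plain fully convolutional autoencoder: conv/pool on the way down, conv/upsample back up."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    return models.Model(inputs, outputs)

model = fcn_autoencoder()
model.compile(optimizer="adam", loss="mse")
# model.fit(em_patches, em_patches, ...)  # autoencoder training: the images are their own targets
```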

Dilated convolutions are an old concept, revitalized to address the loss of detail associated with pooling operations by making pooling optional. This is accomplished by using dilated convolutional kernels (spacing the weights out with zeros, or holes) to achieve long-distance context without pooling. In the image below, the dark squares are the active weights while the light gray ones are the “holes” (hence the French à trous, “with holes”). When these kernels are convolved with a layer, they act like a larger kernel without having to learn or store additional weights.
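
In code, a dilated convolution is just an ordinary convolution with a dilation rate; the sketch below shows the idea, with layer sizes that are placeholders rather than the weight-matched model used here.

```python
from tensorflow.keras import layers, models

def atrous_stack(input_shape=(256, 256, 1)):
    """Stacked dilated 3x3 convolutions: growing receptive field, no pooling, no extra weights."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, dilation_rate=1, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(32, 3, dilation_rate=2, padding="same", activation="relu")(x)  # 3x3 kernel spans 5x5
    x = layers.Conv2D(32, 3, dilation_rate=4, padding="same", activation="relu")(x)  # 3x3 kernel spans 9x9
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    return models.Model(inputs, outputs)
```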

U-Net architectures, on the other hand, utilize skip connections to bring information from the early, less-pooled layers forward to later layers. The main risk I see in using U-Net architectures is that a particularly deep model may develop an over-reliance on the skip connections. This would mean the very early layers train faster and have a bigger influence on the model, losing out on the capacity for more abstract feature representations in the layers at the bottom of the “U”.
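
The skip-connection idea reduces to concatenating encoder features back into the decoder. A one-level sketch follows (a real U-Net repeats this at several depths; dimensions are again placeholders):

```python
from tensorflow.keras import layers, models

def tiny_unet(input_shape=(256, 256, 1)):
    """One-level U-Net: un-pooled encoder features are concatenated back in after upsampling."""
    inputs = layers.Input(shape=input_shape)
    enc = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    down = layers.MaxPooling2D(2)(enc)
    bottom = layers.Conv2D(32, 3, padding="same", activation="relu")(down)
    up = layers.UpSampling2D(2)(bottom)
    merged = layers.Concatenate()([up, enc])  # the skip connection carries high-frequency detail forward
    dec = layers.Conv2D(16, 3, padding="same", activation="relu")(merged)
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(dec)
    return models.Model(inputs, outputs)
```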

Using atrous convolutions makes for noticeably better autoencoding fidelity compared to a simple fully convolutional network:

Training with the U-Net architecture, meanwhile, produces images that are hardly distinguishable from the originals. Note that the images here are from the validation set; they aren’t seen by the model during training.

If you compare the results qualitatively, the U-Net architecture is a clear winner in terms of the sharpness of the decoded output. By the looks of it the U-Net is probably more susceptible to fitting noise as well, at least in this configuration. Using dilated convolutions also offers improved detail reconstruction compared to the fully convolutional network, but it does eat up more memory and trains more slowly due to the wide interior layers.

This seemed like a good opportunity to bring out wavelet analysis to quantify the differences in autoencoder output. We’ll use wavelet image decomposition to investigate which frequency levels are most prevalent in the decoded output from each model. Image decomposition with wavelets looks something like this:

The top-left image has been downsized 2x from the original by removing the details with a wavelet transform (using Daubechies 1). The details left over in the other quadrants correspond to the high frequency content oriented to the vertical, horizontal, and diagonal directions. By computing wavelet decompositions of the conv-net outputs and comparing the normalized sums at each level, we should be able to get a good idea of where the information of the image resides. You can get an impression of the first level of wavelet decomposition for output images from the various models in the examples below:

And finally, if we calculate the normalized power for each level of wavelet decomposition we can see where the majority of the information of the corresponding image resides. The metrics below are the average of 100 autoencoded images from the test dataset.
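
One reasonable way to compute that per-level metric with PyWavelets is sketched below; it assumes square grayscale arrays (256x256 gives 8 levels with db1) and measures each level's share of the total energy in the coefficients.

```python
import numpy as np
import pywt

def normalized_wavelet_power(image, wavelet="db1", level=8):
    """Fraction of total coefficient energy at each level of a 2D wavelet decomposition.

    The returned array runs from the coarsest level (the approximation, i.e. the average
    gray level for a full decomposition) down to the finest details.
    """
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # coeffs[0] is the approximation; each later entry is a (horizontal, vertical, diagonal) tuple.
    energies = [np.sum(np.square(coeffs[0]))]
    for c_h, c_v, c_d in coeffs[1:]:
        energies.append(np.sum(np.square(c_h)) + np.sum(np.square(c_v)) + np.sum(np.square(c_d)))
    energies = np.asarray(energies, dtype=float)
    return energies / energies.sum()

# e.g. compare an original test image with an autoencoder reconstruction:
# print(normalized_wavelet_power(original), normalized_wavelet_power(reconstructed))
```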

In the plot, spatial frequencies increase from left to right as the decomposition level decreases. Level 8 refers to the 8th level of the wavelet decomposition, aka the average gray level in this case. The model using a U-Net architecture comes closest to recapitulating all the spatial frequencies of the original image, with the noticeable exception of a roughly 60% decrease in intensity at the very highest spatial frequencies.

I’d say the difference between the U-Net output and the original image is mostly down to a denoising effect. The atrous conv-net is not too far behind the U-Net in terms of spatial frequency fidelity, and the choice of model variant probably would depend on the end use. For example, there are some very small sub-organellar dot features that are resolved in the U-Net reconstruction but not the atrous model. If we wanted to segment those features, we’d definitely choose the U-Net. On the other hand, the atrous net would probably suffer less from over-fitting if we wanted to train for segmenting the larger mitochondria and only have a small dataset to train on. Finally, if all we want is to coarsely identify the cellular boundaries, that’s basically what we see in the autoencoder output from the fully convolutional network.

Hopefully this has been a helpful exercise in examining conv-net capabilities in a simple example. Open questions remain for this set of models: which model performs best on an actual semantic segmentation task? Does the U-Net rely too much on its skip connections?

I’m working with these models in a repository where I plan to keep notes and code for experimenting with ideas from the machine learning literature and you’re welcome to use the models therein for your own experiments.

Datasets from:

A. Lucchi, K. Smith, R. Achanta, G. Knott, P. Fua. Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features. IEEE Transactions on Medical Imaging, Vol. 30, Nr. 11, October 2011.

Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas, Pavel Tomancak, Volker Hartenstein. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biology, 2010.

Zebra: https://commons.wikimedia.org/wiki/Zebra#/media/File:Three_Zebras_Drinking.jpg

Relevant articles:

Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. https://arxiv.org/abs/1505.04597

Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv. https://arxiv.org/abs/1706.05587

[1] My first job in a research laboratory was to dig through soil samples with fine tweezers to remove roots. We don’t have robots to do this (yet), but I can’t imagine a bored undergraduate producing replicable results in this scenario, and the same goes for manual image segmentation or assessment. On the other hand, the undergrad will probably give the best results, albeit with a high standard deviation, as they are likely to have the most ambiguous understanding of the professor’s hypothesis and desired results of anyone in the lab.

[2] I am indeed reading A Deepness in the Sky.

[3] (o_o) / (^_^) / (*~*)

Journalistic Phylogeny of the Silicon Valley Apocalypse

For some reason, doomsday mania is totally in this season.

In 2014 I talked about the tendency of internet writers to regurgitate the press release for trendy science news. The direct lineage from press release to press coverage makes it easy for writers to phone it in: university press offices essentially hand out pre-written, sensationalist versions of recent publications. It’s not surprising that, with so much of the resulting material in circulation taking text verbatim from the same origin, it is possible to visualize the similarities by treating the articles as genetic sequences in a phylogenetic tree.

Recently the same sort of journalistic laziness reared its head in stories about the luxury doomsday prepper market. Evan Osnos at The New Yorker wrote an article describing the trend among Silicon Valley elites of buying up bunkers, bullets, and body armor; apparently they think we’ll all soon rise up against them following the advent of A.I. Without a press release to serve as a ready-made template, other outlets turned to reporting on the New Yorker story itself as if it were a primary source. This is a bit different from copying down the press release as your own, and the inheritance is not as direct. If anything, this practice is even more hackneyed. At least a press office puts out its releases with the intention that the text serve as material for coverage, so that the topic gets as much circulation as possible. Covering another outlet’s story as if it were a primary source, rather than writing an original commentary or rebuttal, is just a way to skim traffic off a trend.

In any case, I decided to subject this batch of articles to my previous workflow: converting the text to a DNA sequence with DNA writer by Lensyl Urbano, aligning the sequences with MAFFT and/or T-Coffee Expresso, and using the distances from the alignment to make a tree in Phyl.io. Here’s the result:

[Phylogenetic tree of the doomsday-prepper articles]

Heredity isn’t as clear-cut as it was when I looked at science articles: there’s more remixing in this case and we see that in increased branch distances from the New Yorker article to most of the others. Interestingly, there are a few articles that are quite close to each other, much more so than they are to the New Yorker article. Perhaps this rabbit hole of quasi-plagiarism is even deeper than it first appears, with one article covering another article about an article about an article. . .

In any case, now that I’ve gone through this workflow twice, the next time I’ll be obligated to automate the whole thing in Python.
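
A first pass at that automation might look like the sketch below. The two-bits-per-base mapping here is a made-up stand-in for DNA writer's actual encoding, and the alignment and tree-building steps would still be handed off to MAFFT and a tree viewer.

```python
# Hypothetical sketch: encode article text as nucleotide sequences and write a FASTA
# file that can be fed to MAFFT. This is NOT the DNA writer scheme, just an illustration.
BASES = "ACGT"

def text_to_dna(text):
    """Map each byte of the text to four bases (two bits per base)."""
    seq = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            seq.append(BASES[(byte >> shift) & 0b11])
    return "".join(seq)

def write_fasta(articles, path="articles.fasta"):
    """articles: dict mapping a short label (e.g. '3_newYorker') to the article's plain text."""
    with open(path, "w") as handle:
        for label, text in articles.items():
            handle.write(">%s\n%s\n" % (label, text_to_dna(text)))

# write_fasta({"3_newYorker": open("newyorker.txt").read()})
# then, at the shell:  mafft articles.fasta > aligned.fasta
```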

You can tinker with the MAFFT alignment, at least for a while, here:
http://mafft.cbrc.jp/alignment/server/spool/_out1701310631s24824093CAxLP69W2ZebokqEy0TuG.html

My tree:
((((((((((((1_bizJournals:0.65712,(3_newYorker:0.44428,13_breitbart:0.44428):0.21284):0.11522,10_vanityFair:0.77234):0.04207,6_offTheGridNews:0.8441):0.05849,17_EdgyLabs:0.87290):0.04449,14_cnbc_:0.91739):0.02664,2_guardian:0.94403):0.02047,16_RecodeDotNet:0.96451):0.02541,(7_qzDotCom:0.95494,15_npr:0.95494):0.03498):0.00361,8_theIETdotCom:0.99353):0.01310,18_PedestrianDotTV:1.00664:0.03785,((9_ukBusinessInsider:0.06443,12_yahoo:0.06443):0.96008,19_sundayMorningHerald:1.02451):0.01997):0.00953,11_wiredGoogleCatsOUTGROUP3:1.05401)

Sources:

https://www.theguardian.com/technology/2017/jan/29/silicon-valley-new-zealand-apocalypse-escape
http://uk.businessinsider.com/silicon-valley-billionaires-apocalypse-preppers-2017-1?r=US&IR=T
http://www.vanityfair.com/news/2017/01/silicon-valley-is-preparing-for-the-apocalypse
http://www.bizjournals.com/sanjose/news/2017/01/24/apocalypse-now-silicon-valley-elite-says-theyre.html
http://www.newyorker.com/magazine/2017/01/30/doomsday-prep-for-the-super-rich

https://finance.yahoo.com/news/silicon-valley-billionaires-preparing-apocalypse-202000443.html

https://eandt.theiet.org/content/articles/2017/01/apocalypse-2017-silicon-valley-and-beyond-worried-about-the-end-of-the-world/
http://www.offthegridnews.com/extreme-survival/50-percent-of-silicon-valley-billionaires-are-prepping-for-the-apocalypse/
https://qz.com/892543/apocalypse-insurance-reddits-ceo-venture-capitalists-and-others-in-silicon-valley-are-preparing-for-the-end-of-civilization/

https://www.wired.com/2012/06/google-x-neural-network/
http://www.breitbart.com/tech/2017/01/24/silicon-valley-elites-privately-turning-into-doomsday-preppers/
http://www.cnbc.com/2017/01/25/the-super-rich-are-preparing-for-the-end-of-the-world.html
http://www.npr.org/2017/01/25/511507434/why-some-silicon-valley-tech-executives-are-bunkering-down-for-doomsday
http://www.recode.net/2017/1/23/14354840/silicon-valley-billionaires-prepping-survive-underground-bunkers-new-yorker
https://edgylabs.com/2017/01/30/doomsday-prepping-silicon-valley/
https://www.pedestrian.tv/news/tech/silicon-valley-ceos-are-terrified-of-the-apocalyps/ba4c1c5d-f1c4-4fd7-8d32-77300637666e.htm
http://www.smh.com.au/business/world-business/rich-silicon-valley-doomsday-preppers-buying-up-new-zealand-land-20170124-gty353.html

A mimic without a model

Macroglossum stellatarum looks and behaves remarkably like a hummingbird, albeit without the “swordfighting” and high-pitched battle cries of their avian lookalikes. The selective advantage of mimicry is obvious in situations where the imitated organism is less palatable or more dangerous, or when said mimicry furthers the mimic’s own life cycle, but what if the apparent object of imitation is no longer found in the mimic’s range? Such is the case with the European hummingbird hawk-moth, which confuses birders in northern Europe in late summer. Hummingbirds are a purely New World group of birds, so what exactly are the European hummingbird moths gaining from mimicking a non-existent group of birds? Or, to put it another way, when is a mimic not a mimic?

If you ask your hipster friends you are sure to receive an explanation for why partaking in a trend can be a truly novel act, owing to some small esoteric twist or another. Macroglossum in the Old World may have undergone convergent co-evolution, rather than outright mimicry as the common name for these insects might suggest. Fossils of largely modern hummingbirds in Europe have been described dating to the Oligocene (about 30 million years ago). Add to that the apparent evolutionary footprint of significant pollination by hummingbirds seen in a number of Old World flowers, and it begins to look plausible that Macroglossum and other Old World humming-moths settled into a niche of pollinating long-stemmed, nectar-heavy and perch-free flowers, alongside but not dependent on hummingbirds. If your mouth-part is longer than the rest of your body combined you might as well use it, whether or not hummingbirds are currently trending in your area. A 30 cm proboscis never goes out of style*.

This digression was kindled by a few sunny afternoons spent in the company of beautiful hawk-moths in the Tuscan hills.


*I’m not an expert in long-tongued pollinators, and it’s not clear to me how much of a role mimicry and convergent evolution each may have played in European hummingbird moths. The extremely long proboscis of Xanthopan morgani and the correspondingly deep nectar placement of Angraecum sesquipedale is a famous example of co-evolution that had a strong impact on Darwin’s thinking. To my knowledge, there has never been a hummingbird with a 35 cm tongue, and in many ways hawk-moths may have pioneered the lifestyle of deep-seated nectarivory, before it was cool. The fossil records for hummingbirds and hawk-moths alike are rather spotty.


Xanthopan morgani with extended proboscis, from the Natural History Museum, London; photo by Wikipedia user Esculapio

Equally remarkable is the visual system. Behold those pseudopupils!


Transhelminthism:

If you want to find out if a digital nematode is alive, try asking it.

Fancy living in a computer? Contributors to the OpenWorm project aim to make life inside a computer a (virtual) reality. In recent years, various brain projects have focused funding on moonshot science initiatives to map, model and ultimately understand the human brain: the computer that helps humans to cognito that they sum. These are similar in feel to the human genome project of the late 1990s and early 2000s. Despite the inherent contradictions of the oft-trotted trope that the human brain is the “most complex thing in the universe,” it is indeed quite a complicated machine, decidedly more complex than the human genome. Understanding how it works will take more than mapping every connection, which is akin to knowing every node in a circuit but having no idea what each component is. A multivalent approach at the levels of cells, circuits, connections, and mind offers the most complete picture. OpenWorm coordinator Stephen Larson et al. aim to start by understanding something a little bit simpler: the determinate 302-neuron brain and accompanying body of Caenorhabditis elegans, a soil-dwelling nematode worm that has served as a workhorse in biology for decades.

Genome, Brain

The connectome, a neural wiring diagram of the worm’s brain, has been mapped. Simulation of the worm at the cellular level is an ongoing open-source software effort. The first human genome was sequenced only three years after the first C. elegans genome; a similar pace for full biological simulation in silico would mean that digital humans, or a reasonable facsimile, are possible within our lifetimes. At the point when these simulations of people are able to fool observers, will these entities be alive and conscious? Have rights? Pay taxes? If a digital person claims the validity of their own consciousness, should we take their word for it, or determine some metric for ascertaining the consciousness of a simulated person based on our own inspection? For answers to questions of existence and sapience we can turn to our own experience (believing as we do that we are conscious entities), and to the venerable history of these questions as discussed in science fiction.

Conversation with the chatbot CleverBot (a conversational precursor to intelligent software), 24 December 2014.

In the so-called golden age of science fiction characters tended to be smart, talented, and capable. Aside from an unnerving lack of faults and weakness, overall the protagonists were fundamentally human. The main difference between the audience and the actors in these stories was access to better technology. But it may be that this vision of a human future is comically (tragically?) myopic. Even our biology has been changing more quickly as civilisation and technologies develop. If we add a rate of technological advance that challenges the best-educated humans to keep pace, a speed-up of the rate of change in average meteorological variables, and human-driven selective pressure, the next century should be interesting to say the least. When those unobtainyl transferase pills for longevity finally kick in, generational turnover can no longer be counted on to ease adaptation to a step-change in civilisation.

Greg Egan (who may or may not be a computer program) has been writing about software-based people for over two decades. When the mind of a human is not limited to run on a single instance of its native hardware, new concepts such as “local death” and traveling by transmission emerge intrinsically. Most of the characters in novels from writers such as Egan waste little time questioning whether they will still exist if they have to resort to a backup copy of themselves. As in flesh-and-blood humans, persistence of memory plays a key role in the sense of self, but is not nearly so limited. If a software person splits themselves to pursue two avenues of interest, they may combine their experiences upon their reunion, rejoining as a single instance with a transiently bifurcated path. If the two instances of a single person disagree as to their sameness, they may decide to go on as two different people. These simulated people would be unlikely to care (beyond their inevitable battle for civil rights) whether you consider them to be alive and sapient or not, any more so than the reader is likely to disbelieve their own sapience.

Many of the thought experiments associated with software-based person-hood are prompted by a human perception of dubiousness in duplicity: two instances of a person existing at the same time, but not sharing a single experience, don’t feel like the same person. Perhaps as the OpenWorm project develops we can watch carefully for signs of animosity and existential crises among a population of digital C. elegans twinned from the same starting material. We (or our impostorous digital doppelgängers, depending on your perspective) may find out for ourselves what this feels like sooner than we think.

2014-12-29 – Leading comic edited for improved comedic effect

Why it always pays (95% C.I.) to think twice about your statistics


The northern hemisphere has just about reached its maximum tilt away from the sun, which means many academics will soon get a few days or weeks off to . . . revise statistics! Winter holidays are the perfect time to sit back, relax, take a fresh, introspective look at the research you may have been doing (and that which you haven’t), and catch up on all that work you were too distracted by work to do. It is a great time to think about the statistical methods in common use in your field and what they actually mean about the claims being made. Perhaps an unusual dedication to statistical rigour will help you become a stellar researcher, a beacon to others in your discipline. Perhaps it will just turn you into a vengefully cynical reviewer. At the least it should help you to make a fool of yourself ever-so-slightly less often.

First test your humor (description follows in case you prefer a mundane account to a hilarious webcomic): http://xkcd.com/882/

In the piece linked above, Randall Munroe highlights the low threshold for reporting significant results in much of science (particularly biomedical research), and specifically the way these uncertain results are over- and mis-reported in the lay press. The premise is that researchers perform experiments to determine whether jelly beans of 20 different colours have anything to do with acne. After setting their p-value threshold at 0.05, they find in one of the 20 experiments that there is a statistically significant association between green jelly beans and acne. I would consider the humour response to this webcomic a good first-hurdle metric if I were a PI interviewing prospective students and post-docs.

In Munroe’s comic, the assumption is that jelly beans never have anything to do with acne and that 100% of the statistically significant results are due to chance. Assuming that all of the other results were also reported in the literature somewhere (although not likely to be picked up by the sensationalist press), this would give the proportion of reported results that fail to reflect reality at an intuitive and moderately acceptable 0.05, or 5%.
Let us instead consider a slightly more lab-relevant version:

Consider a situation where some jelly beans do have some relationship to the medical condition of interest, say 1 in 100 jelly bean variants are actually associated in some way with acne. Let us also swap small molecules for jelly beans, and cancer for acne, and use the same p-value threshold of 0.05. We are unlikely to report negative results where the small molecule has no relationship to the condition. We test 10000 different compounds for some change in a cancer phenotype in vitro.

Physicists may generally wait for 3-6 sigmas of significance before scheduling a press release, but for biologists publishing papers the typical p-value threshold is 0.05. If we use this threshold, perform our experiment, and go directly to press with the statistically significant results, about 83.9% of our reported positive findings will be wrong. In the press, a 0.05 p-value will often be interpreted as “only a 5% chance of being wrong.” This is certainly not what we see here, but after some thought the error rate is expected and fairly intuitive. Allow me to illustrate with numbers.

As expected from the conditions of the thought experiment, 1% of the compounds, or 100 of them, have a real effect. Setting our p-value threshold at the widely accepted 0.05, we will also uncover, purely by chance, non-existent relationships between our cancer phenotype of interest and 495 of the compounds with no effect (0.05 * 9,900). If we assume the power of the test is 95% (complementary to the 5% false positive rate), we will pick up 95 of the 100 actual cases we are interested in. Our total positive results will be 495 + 95 = 590, but only 95 of those reflect a real association; 495/590, or about 83.9%, will be false positives.
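
The same arithmetic as a few lines of Python, in case you want to play with the prevalence, threshold, or statistical power:

```python
def false_discovery_rate(n_tests=10000, prevalence=0.01, alpha=0.05, power=0.95):
    """Fraction of 'significant' results that are false positives under these assumptions."""
    n_real = n_tests * prevalence          # 100 compounds with a real effect
    n_null = n_tests - n_real              # 9,900 compounds with no effect
    false_positives = alpha * n_null       # 495 spurious hits
    true_positives = power * n_real        # 95 real hits detected
    return false_positives / (false_positives + true_positives)

print("%.1f%% of reported positives are false" % (100 * false_discovery_rate()))  # 83.9%
```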

Such is the premise of a short and interesting write-up by David Colquhoun on false discovery rates [2]. The emphasis is on biological research because that is where the problem is most visible, but the considerations discussed should be of interest to anyone conducting research. On the other hand, let us remember that confidence due to technical replicates does not generally translate to confidence in a description of reality: the statistical confidence in the data from the now-infamous faster-than-light neutrinos from the OPERA detector (http://arxiv.org/pdf/1109.4897v4.pdf) was very high, but the source of the anomaly was instrumentation, and two top figures from the project eventually resigned after overzealous press coverage pushed the experiment into the limelight. Paul Blainey et al. discuss the importance of considering the effect of technical and biological (or, more generally, experimentally relevant) replicates in a recent Nature Methods commentary [3].

I hope the above illustrates my thought that a conscientious awareness of the common pitfalls in one’s own field, as well as those of the fields one interacts with closely, is important for slogging through the avalanche of results published every day and for producing brilliant work of one’s own. This requires continued effort in addition to an early general study of statistics, but I would suggest it is worth it. To quote [2]: “In order to avoid making a fool of yourself you need to know how often you are right when you declare a result to be significant, and how often you are wrong.”

Reading:

[1]Munroe, Randall. Significant. XKCD. http://xkcd.com/882/

[2] Colquhoun, David. An investigation of the false discovery rate and the misinterpretation of p-values. DOI: 10.1098/rsos.140216. Royal Society Open Science. Published 19 November 2014. http://rsos.royalsocietypublishing.org/content/1/3/140216

[3] Blainey, Paul, Krzywinski, Martin, Altman, Naomi. Points of Significance: Replication. Nat Meth (2014) 11.9 879-880. http://dx.doi.org/10.1038/nmeth.3091

Philaephilia?

Philaephilia n. Temporary obsession with logistically important and risky stage of scientific endeavour and cometary rendezvous.

Don’t worry, the condition is entirely transient

Rivalling the 7 minutes of terror as NASA’s Curiosity rover entered the Martian atmosphere, Philae’s descent onto comet 67P/Churyumov-Gerasimenko Wednesday as part of the European Space Agency’s Rosetta mission had the world excited about space again.

Comets don’t have the classic appeal of planets like Mars. The high visibility of Mars missions and moon shots has roots in visions of a Mars covered in seasonal vegetation and full of sexy humans dressed in scraps of leather, and little else. But comets may be much better targets in terms of scientific payoff. Comets are thought to have added water to early Earth, after the young sun had blasted the substance out to the far reaches of the solar system beyond the realm of the rocky planets. Of course, comets are also of interest for pure novelty: until Philae, humans had never put a machine down on a comet gently. Now the feat has been accomplished three times, albeit a bit awkwardly, with all science instruments surviving two slow bounces and an unplanned landing site. It is unfortunate that Philae is limited to only 1.5 hours of sunlight per 12-hour day, but there is some possibility that a last-minute attitude adjustment may have arranged the solar panels a bit more favourably.

So if Rosetta’s Philae lander bounced twice, rather than grappling the surface as intended, and landed in a wayward orientation where its solar panels are limited to only 12.5% of nominal sun exposure, how is the mission considered a success?

Most likely, the full significance of the data relayed from Philae via Rosetta will take several months of analysis to uncover. Perhaps some of the experiments will be wholly inconclusive and observational, neither confirming nor denying hypotheses of characteristic structure of comets. For example, it seems unlikely that the MUPUS instrument (i.e. cosmic drill) managed to penetrate a meaningful distance into the comet, and we probably won’t gain much insight concerning the top layers of a comet beyond perhaps a centimetre or so. In contrast, CONSERT may yield unprecedented observations about the interior makeup of a comet.

In science, failures and negative findings are certainly more conclusive than so-called positive results, and arguably preferable, despite the selective pressure for the latter in science careers and the lay press. An exception disproves the rule, but a finding in agreement with theory merely “fails to negate” said theory. For example, we now know better than to use nitrocellulose as a vacuum propellant. Lesson learned on that front.

In addition to a something-divided-by-nothing fold increase in knowledge about the specific scenario of attempting a soft landing on a comet, I’d suggest we now know a bit more about the value of autonomy in expeditions where the signal delay between mission control and operations rules out real-time feedback. Perhaps if Philae had been optimised for adaptability, it would have been able to maintain orientation to the comet surface and give Rosetta and scientists at home a better idea of its (final) resting place after detecting that the touchdown and grapple didn’t go through. Space science is necessarily cautious, but adaptive neural networks and other alternative avenues may prove useful in future missions.

I’ll eagerly await the aftermath, when the experimental and telemetry data have been further analysed. The kind of space mission where a landing sequence can omit a major step and still have operational success of all scientific instruments on board is the kind of mission that space agencies should focus on. The Rosetta/Philae mission combined key elements of novelty (first soft landing on and persistent orbiting of a comet), low cost (comparable to a few space shuttle missions), and robustness (the grapples didn’t fire, the lander bounced and got lost, science still occurred). Perhaps we’ll see continued ventures from international space agencies into novel, science-driven expeditions. Remember, the first scientist on the moon was on the (so far) final manned mission to Luna. Missions in the style of Rosetta may be more effective and valuable on all three of the above points, and are definitely more fundamental in terms of science achieved, than continuous returns to Mars and pushes for manned missions. In a perfect world where space agencies operate in a non-zero-sum funding situation along with all the other major challenges faced by human society, we would pursue them all. But realistically, Philae has shown that not only do alternative missions potentially offer more for us to learn in terms of science and engineering, but they can also enrapture the population in a transcendent endeavour. Don’t stop following the clever madness of humans pursuing their fundamental nature of exploring the universe they live in.