A skeptic over coffee: sick of lab meetings

This post brought to you by a dedicated community of human Rhinovirus (PDB model 1AYM).

Imagine the following dialogue between researchers:

Wayne the Brain: “Third one this week. ::Cough:: I am literally sick of lab meetings.”
Wankdorf: “Oh I feel ya. There are way too many lab meetings. It’s a real waste of time, but that’s the cost of pulling from so many different realms of expertise in interdisciplinary projects.”
Wayne the Brain: “No no no, I am literally sick of lab meetings. All the exposure is really taking a toll on my health.”
Wankdorf: “Why didn’t you say so?! Stay away, you purveyor of vile pestilence! ::cough::”

I hope, dear reader, that you spotted the root cause of their misunderstanding. Wayne (the Brain) was using “literally” in its original sense: he was attributing his illness to lab-meeting exposure while advertising his own condition as definitely infected and possibly contagious. Wankdorf (unsurprisingly) misinterprets the statement by applying the more colloquial definition of the term “literally.” It’s not clear whether infection of the second researcher could have been avoided and the spread of the disease slowed had they practised more effective communication, but that scenario is plausible given what we know.

Of course this is an extreme example, and the consequences may not always be so dire. The most frustrating part of the above exchange and subsequent misunderstanding is that neither participant was strictly wrong in the definition they assumed for “literally.” This word can now literally be used to mean “in the truest sense of the words” and also its exact opposite, and my brain literally imploded when I learned about the new definition.

If you don’t believe me, check out the definition in both the Cambridge and Merriam-Webster online dictionaries. I’ve screenshotted the definitions to preserve this embarrassment for posterity:

[Screenshot: Merriam-Webster definition of “literally”]

[Screenshot: Cambridge Dictionary definition of “literally”]

Language is dynamic; some (Wankdorf, etc.) would even say it is dynamical. Hence it doesn’t make you appear smarter to bore your friends by talking about Romans every time they say “decimate.” Language is constantly changing in response to the selective pressures of popular usage, subject to many factors as people and cultures interact.

As with many other examples of evolution, humans affect the way a language changes by taking note of, and modifying, the selective pressures they individually exert. The consequences may be particularly important in science, where English is the common tongue but not, in general, the first language of most practitioners. I expect that modern English will evolve to encompass multiple forms based on usage. Native speakers sat on the British Isles, laying in North America, and so on will continue to retain and invent complexity and idiosyncrasy, while international English will come to resemble a utilitarian version of Up-Goer Five English, paring off superfluous complexities while retaining the most effective elements to become as simple as possible, but no simpler. It’s possible that international English will even retain sarcasm.

Pop quiz: what are your favourite English-speaker idiosyncrasies used in this article?

Return to the Play Area

When life gives you lemons, go exploring

xkcd is celebrating Randall Munroe’s new book with a browser game. It doesn’t take long to guess that there’s a lot more to it than the small coin-gathering area the player starts out in.

After escaping the small coin maze and vaulting a barrier wall on the left…


Eventually our brave protagonist’s first foray came to an end: I got stuck in a statue of George Washington.

Why it always pays (95% C.I.) to think twice about your statistics

The northern hemisphere has just about reached its maximum tilt away from the sun, which means many academics will soon get a few days or weeks off to… revise statistics! Winter holidays are the perfect time to sit back, relax, take a fresh, introspective look at the research you have been doing (and that which you haven’t), and catch up on all that work you were too distracted by work to do. It is a great time to think about the statistical methods in common use in your field and what they actually mean about the claims being made. Perhaps an unusual dedication to statistical rigour will help you become a stellar researcher, a beacon to others in your discipline. Perhaps it will just turn you into a vengefully cynical reviewer. At the least it should help you make a fool of yourself ever-so-slightly less often.

First, test your humour (a description follows in case you prefer a mundane account to a hilarious webcomic): http://xkcd.com/882/

In the piece linked above, Randall Munroe highlights the low threshold for reporting significant results in much of science (particularly biomedical research), and specifically the way these uncertain results are over-reported and misreported in the lay press. The premise is that researchers perform experiments to determine whether jelly beans of 20 different colours have anything to do with acne. After setting their p-value threshold at 0.05, they find in one of the 20 experiments a statistically significant association between green jelly beans and acne. I would consider the humour response to this webcomic a good first-hurdle metric if I were a PI interviewing applicants for student or post-doc positions.

In Munroe’s comic, the assumption is that jelly beans never have anything to do with acne and that 100% of the statistically significant results are due to chance. Assuming that all of the other results were also reported in the literature somewhere (although not likely to be picked up by the sensationalist press), this would put the proportion of reported results that fail to reflect reality at an intuitive and moderately acceptable 0.05, or 5%.
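As a quick sanity check on the comic’s premise, here is a back-of-the-envelope sketch in Python. It assumes, as the comic does, that none of the 20 colours has a real effect, and additionally (my assumption, not stated in the strip) that the 20 experiments are independent; the 64% figure is my own arithmetic.

# Chance of the comic's outcome: at least one of 20 independent,
# truly-null jelly bean experiments crossing the p < 0.05 threshold.
alpha = 0.05      # significance threshold used in the comic
n_colours = 20    # number of jelly bean colours tested

p_at_least_one = 1 - (1 - alpha) ** n_colours
print(f"P(at least one spurious hit in 20 colours): {p_at_least_one:.0%}")  # ~64%

# If every result, positive and negative, made it into the literature,
# the expected fraction of reported results that are wrong stays at alpha:
print(f"Fraction of all reported results that are wrong: {alpha:.0%}")  # 5%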
Let us instead consider a slightly more lab-relevant version:

Consider a situation where some jelly beans do have some relationship to the medical condition of interest: say 1 in 100 jelly bean variants is actually associated in some way with acne. Let us also swap small molecules for jelly beans, and cancer for acne, and use the same p-value threshold of 0.05. We are unlikely to report negative results where the small molecule has no relationship to the condition. We test 10,000 different compounds for some change in a cancer phenotype in vitro.

Physicists may generally wait for 3–6 sigmas of significance before scheduling a press release, but for biologists publishing papers the typical p-value threshold is 0.05. If we use this threshold, perform our experiment, and go directly to press with the statistically significant results, 83.9% of our reported positive findings will be wrong. In the press, a 0.05 p-value is often interpreted as “only a 5% chance of being wrong.” That is certainly not what we see here, but after some thought the error rate is expected and fairly intuitive. Allow me to illustrate with numbers.

As expected from the conditions of the thought experiment, 1% of these compounds, or 100, have a real effect. Setting our p-value threshold at the widely accepted 0.05, we will also uncover, purely by chance, non-existent relationships between 495 of the compounds with no effect (0.05 × 9,900) and our cancer phenotype of interest. If we assume that the probability of failing to detect a real effect by chance is complementary to that of detecting a fake one (i.e. the power is 0.95), we will pick up 95 of the 100 actual cases we are interested in. Our total positive results will be 495 + 95 = 590, but only 95 of those reflect a real association: 495/590, or about 83.9%, will be false positives.
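For the sceptical, the same arithmetic as a minimal Python sketch, using the numbers assumed in this thought experiment (10,000 compounds, 1% true effects, a 0.05 p-value threshold, and the 0.95 power assumed above):

# False discovery rate for the small-molecule screen described above.
n_compounds = 10000
prevalence = 0.01   # fraction of compounds with a real effect (1 in 100)
alpha = 0.05        # p-value threshold (false positive rate per null compound)
power = 0.95        # assumed chance of detecting a real effect

real = prevalence * n_compounds        # 100 compounds with a real effect
null = n_compounds - real              # 9,900 compounds with no effect

false_positives = alpha * null         # 495 spurious "discoveries"
true_positives = power * real          # 95 genuine discoveries
total_positives = false_positives + true_positives   # 590 reported positives

fdr = false_positives / total_positives
print(f"False discovery rate: {fdr:.1%}")   # ~83.9% of positives are wrong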

Such is the premise of a short and interesting write-up by David Colquhoun on false discovery rates [2]. The emphasis is on biological research because that is where the problem is most visible, but the considerations discussed should be of interest to anyone conducting research. On the other hand, let us remember that confidence due to technical replicates does not generally translate to confidence in a description of reality: the statistical confidence in the data from the now-infamous faster-than-light neutrinos at the OPERA detector (http://arxiv.org/pdf/1109.4897v4.pdf) was very high, but the source of the anomaly was instrumentation, and two top figures from the project eventually resigned after overzealous press coverage pushed the experiment into the limelight. Paul Blainey et al. discuss the importance of considering the effect of technical and biological (or, more generally, experimentally relevant) replicates in a recent Nature Methods commentary [3].

I hope the above illustrates my point that a conscientious awareness of the common pitfalls in one’s own field, as well as in those with which one closely interacts, is important both for slogging through the avalanche of results published every day and for producing brilliant work of one’s own. This requires continued effort in addition to an early general study of statistics, but I would suggest it is worth it. To quote [2]: “In order to avoid making a fool of yourself you need to know how often you are right when you declare a result to be significant, and how often you are wrong.”

Reading:

[1] Munroe, Randall. Significant. xkcd. http://xkcd.com/882/

[2] Colquhoun, David. An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science (19 November 2014). DOI: 10.1098/rsos.140216. http://rsos.royalsocietypublishing.org/content/1/3/140216

[3] Blainey, Paul; Krzywinski, Martin; Altman, Naomi. Points of Significance: Replication. Nature Methods 11(9): 879–880 (2014). http://dx.doi.org/10.1038/nmeth.3091