## Why is there no confidence in science journalism?

In the so-called Anthropocene, meaningful participation in humanity’s trajectory requires scientific literacy. This is a necessity at the population level: it is not enough for a small proportion of select individuals to develop this expertise and apply it only to the avenues of their own interest. Rather, a general understanding and use of the scientific method in forming actionable ideas about modern problems is a requisite for a public capable of steering policy along a survivable route. As an added benefit, scientific literacy has a side effect that is hard to avoid: knowing one or two things for certain, and touching upon the numinous of the universe.

Statistical literacy is a necessary foundation for scientific literacy. Confusion about the meaning of terms such as “statistical significance” (compounded by non-standard usage of the word “significance” on its own) is widespread, so the import of these concepts is largely lost when scientific results are described in mainstream publications. Worse, it produces a jaded public that knows just enough of the jargon of science to twist it in support of their own predetermined, potentially dangerous, conclusions (e.g. because scientific theories can be refuted by contrary evidence, a given theory, no matter how well supported by existing data, can be ignored when forming personal and policy decisions).

I posit that a fair share of the responsibility for improving non-specialist scientific literacy lies with science journalists at all scales. The most popular science-branded media do little or nothing to impart a sense of the scientific method, the context and contribution of published experiments, or the meaning of the statistics underlying the claims. I suggest that a standardised language for describing scientific results is warranted, so that results and concepts can be communicated intuitively and without condescension, while still conveying the quantitative, comparable values used to form scientific conclusions.

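A good place to start (though certainly not a perfect one) is the uncertainty guidance published by the Intergovernmental Panel on Climate Change (IPCC). The IPCC reports benefit from translating the statistical concepts of confidence and likelihood into intuitive terms while (mostly) preserving their quantitative meaning. In the IPCC AR5 guidance on addressing uncertainty [pdf], likelihood statements are standardised as follows:

| Term | Likelihood of the outcome |
|---|---|
| Virtually certain | 99–100% probability |
| Very likely | 90–100% probability |
| Likely | 66–100% probability |
| About as likely as not | 33–66% probability |
| Unlikely | 0–33% probability |
| Very unlikely | 0–10% probability |
| Exceptionally unlikely | 0–1% probability |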
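To show how a writer might apply this scale, here is a small Python sketch that returns an AR5 likelihood term for a given probability. The ranges are nested (0.95 falls within both "likely" and "very likely"), so the function picks the narrowest range containing the value; that tie-breaking rule is an assumption of this sketch, not something the guidance note specifies, and the function name is hypothetical.

```python
# AR5 calibrated likelihood scale: (term, lower bound, upper bound) as probabilities.
AR5_LIKELIHOOD = [
    ("virtually certain", 0.99, 1.00),
    ("very likely", 0.90, 1.00),
    ("likely", 0.66, 1.00),
    ("about as likely as not", 0.33, 0.66),
    ("unlikely", 0.00, 0.33),
    ("very unlikely", 0.00, 0.10),
    ("exceptionally unlikely", 0.00, 0.01),
]


def likelihood_term(probability: float) -> str:
    """Return the narrowest AR5 likelihood term whose range contains `probability`.

    Choosing the narrowest (most specific) range is an assumed convention;
    the guidance note only defines the ranges themselves.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    candidates = [
        (upper - lower, term)
        for term, lower, upper in AR5_LIKELIHOOD
        if lower <= probability <= upper
    ]
    # The three broad ranges together cover [0, 1], so `candidates` is never empty.
    return min(candidates)[1]


if __name__ == "__main__":
    for p in (0.995, 0.95, 0.7, 0.5, 0.2, 0.05, 0.005):
        print(f"{p:>6} -> {likelihood_term(p)}")
```

Exact boundary values (such as 0.66) sit in two ranges of similar width, so a writer would still need to choose the wording there by judgement.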

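In the fourth assessment report (AR4), the guidance [pdf] roughly calibrated confidence statements to a chance of being correct:

| Confidence term | Chance of being correct |
|---|---|
| Very high confidence | At least 9 out of 10 |
| High confidence | About 8 out of 10 |
| Medium confidence | About 5 out of 10 |
| Low confidence | About 2 out of 10 |
| Very low confidence | Less than 1 out of 10 |

In terms of p-values, treating p loosely as the chance that results are due to coincidence (p = 0.10 = 10% chance), very high confidence corresponds to roughly p ≤ 0.10. The guidance also covered statistical tests producing other measures of confidence.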

Describing results by their confidence, rather than by the statistical significance normally reported, is probably more intuitive to most people. Few general readers readily distinguish between statistical significance (the results are unlikely to be due to chance) and meaningful significance (the results matter in some way). Moreover, conventions for statistical significance are not well established in the scientific literature and vary widely by field.

That being said, the AR4 guidance sets a low bar for very high confidence. Many scientific results are only considered reportable at a p-value of less than 0.05, roughly a 5% chance of being an experimental artifact due to coincidence, whereas the AR4 guidance attaches a statement of very high confidence to anything with less than a 10% chance of being wrong. Similarly, in my opinion a 5-in-10 chance of being correct hardly merits a statement of medium confidence. Despite these limitations, I think the guidance should merely have been updated to better reflect the statistical reality of confidence. It was a mistake for the AR5 guidance to switch to purely qualitative standards for conveying confidence, based on a matrix of evidence (limited, medium, robust) against agreement (low, medium, high), with the highest confidence in the top right (robust evidence, high agreement) and the lowest in the bottom left (limited evidence, low agreement).
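The comparison with the p < 0.05 convention can be checked with a few lines of arithmetic. This Python sketch pins each AR4 term to the boundary of its calibrated "chance of being correct" (an approximation, since the guidance says "at least", "about" or "less than"), takes one minus that value as the chance of being wrong, and compares it with the 0.05 threshold, which is itself only a common convention that varies by field. The names are hypothetical.

```python
# AR4 confidence terms, pinned to the boundary of their calibrated
# "chance of being correct" (approximating "at least"/"about"/"less than").
AR4_CHANCE_CORRECT = {
    "very high confidence": 0.9,
    "high confidence": 0.8,
    "medium confidence": 0.5,
    "low confidence": 0.2,
    "very low confidence": 0.1,
}

# A common, but field-dependent, threshold for reporting a result.
CONVENTIONAL_THRESHOLD = 0.05


def chance_wrong(term: str) -> float:
    """Chance of being wrong implied by an AR4 term (1 - chance of being correct)."""
    # Rounding hides floating-point noise such as 1 - 0.9 == 0.09999999999999998.
    return round(1.0 - AR4_CHANCE_CORRECT[term], 10)


if __name__ == "__main__":
    for term in AR4_CHANCE_CORRECT:
        wrong = chance_wrong(term)
        verdict = "meets" if wrong < CONVENTIONAL_THRESHOLD else "misses"
        print(f"{term:<22} ~{wrong:.0%} chance wrong; {verdict} the 0.05 threshold")
```

Even the top AR4 level corresponds to about twice the conventional 5% threshold, which is the gap the paragraph above complains about.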

Adoption (and adaptation) of standards like these by journalists in everyday usage could do a lot to improve the communication of science to a general readership. It would normalise field-specific technical jargon (e.g. sigma significance values in particle physics, p-values in biology) and reduce the need for daft analogies. Results described this way would allow meaningful comparison by interested but non-specialist audiences, and because the meaning is not dumbed down, readers with a little practice in statistics would be no less informed.
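As an example of the translation involved, a particle physicist's "n sigma" can be converted to a p-value using the tail of the standard normal distribution. The Python sketch below uses the one-sided convention usual for discovery claims in particle physics (5σ corresponds to about 2.9 × 10⁻⁷), with a two-sided option for comparison, and inverts the conversion by simple bisection. The function names are hypothetical.

```python
import math


def sigma_to_p(n_sigma: float, two_sided: bool = False) -> float:
    """Tail probability of a standard normal beyond n_sigma standard deviations."""
    tail = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))  # P(Z > n_sigma)
    return 2.0 * tail if two_sided else tail


def p_to_sigma(p: float, two_sided: bool = False, tol: float = 1e-10) -> float:
    """Invert sigma_to_p by bisection (sigma_to_p falls as n_sigma grows)."""
    upper_p = 1.0 if two_sided else 0.5  # p-value at 0 sigma
    if not 0.0 < p < upper_p:
        raise ValueError(f"p must lie strictly between 0 and {upper_p}")
    lo, hi = 0.0, 40.0  # at 40 sigma the tail underflows to 0.0 in double precision
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sigma_to_p(mid, two_sided) > p:
            lo = mid  # tail still too large: need more sigma
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} sigma -> one-sided p = {sigma_to_p(n):.3g}")
    print(f"p = 0.05 -> {p_to_sigma(0.05):.2f} sigma (one-sided)")
    print(f"p = 0.05 -> {p_to_sigma(0.05, two_sided=True):.2f} sigma (two-sided)")
```

On this common footing, the p < 0.05 threshold familiar from biology is only about 1.64σ one-sided (1.96σ two-sided), far short of the 5σ convention for a discovery in particle physics.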

Edited 2016/06/25 for a better title and to add a comic graphic. Source of the cover design file by Norman Saunders (Public Domain).
23 Aug. 2014: typo in first paragraph corrected:

. . . meaningful participation in participating in humanity’s trajectory. . .

References:

Michael D. Mastrandrea et al. Guidance Note for Lead Authors of the IPCC Fifth Assessment Report on Consistent Treatment of Uncertainties. IPCC Cross-Working Group Meeting on Consistent Treatment of Uncertainties, Jasper Ridge, CA, USA, 6–7 July 2010. <http://www.ipcc.ch/pdf/supporting-material/uncertainty-guidance-note.pdf>

IPCC. Guidance Notes for Lead Authors of the IPCC Fourth Assessment Report on Addressing Uncertainties. July 2005. <https://www.ipcc-wg1.unibe.ch/publications/supportingmaterial/uncertainty-guidance-note.pdf>

## A Phylogeny of Internet Journalism

While reading press coverage of the UW-Madison primate caloric restriction study for my essay, I kept getting déjà vu: I was coming across the same language over and over. Much of this was due to early coverage relying heavily on the University of Wisconsin-Madison press release, and to sites buying stories from each other, so I decided it might be informative to make a phylogenetic tree of the coverage. To do so, I took the text of the articles in the first two pages of Google News results for “wisconsin monkey caloric restriction”, converted the English text to DNA sequences, and built a phylogenetic tree from a multiple sequence alignment. I found a total of 27 articles on the CR study and included one unrelated outgroup, for a total of 28.

I used DNA Writer by Lensyl Urbano (CC BY-NC-SA) to convert the text of each article into a DNA sequence. This algorithm assigns each character a three-nucleotide sequence, much as our own genome specifies amino acids with three-letter codons. Unlike our own genetic code, Urbano’s tool is not degenerate: each character has exactly one corresponding three-letter code. With four bases (adenine, thymine, guanine, and cytosine) there is room for $4^3 = 64$ unique codes. For example, “I want to ride my bicycle” becomes

CTGAGCATGACTCTCTAGAGCTAGTGTAGCCACCTGTACCTAAGCACAGACAGCCATCTGTCAGACTCAATCCTA

The translation table and tool are available at http://earthsciweb.org/js/bio/dna-writer/.
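The character-by-character scheme is easy to sketch. The Python below is not Urbano's code: it uses only the 15 character-to-codon pairs that can be read off the example above, so it raises an error for any other character (the full table is at the address above). Lower-casing the input is an assumption, based on "I" and "i" both becoming CTG in the example.

```python
# Partial DNA Writer-style table, recovered only from the worked example
# "I want to ride my bicycle". It is NOT the tool's complete table.
PARTIAL_TABLE = {
    " ": "AGC", "a": "ACT", "b": "CAT", "c": "TCA", "d": "TAC",
    "e": "CTA", "i": "CTG", "l": "ATC", "m": "ACA", "n": "CTC",
    "o": "TGT", "r": "CAC", "t": "TAG", "w": "ATG", "y": "GAC",
}


def encode(text: str, table: dict = PARTIAL_TABLE) -> str:
    """Replace each character with its three-nucleotide code."""
    codons = []
    for ch in text.lower():  # assumption: the tool ignores case
        if ch not in table:
            raise KeyError(f"no code for {ch!r} in this partial table")
        codons.append(table[ch])
    return "".join(codons)


def decode(dna: str, table: dict = PARTIAL_TABLE) -> str:
    """Read the sequence back three bases at a time."""
    reverse = {code: ch for ch, code in table.items()}
    if len(reverse) != len(table):
        raise ValueError("two characters share a code; decoding is ambiguous")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(reverse[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = encode("I want to ride my bicycle")
    print(dna)
    print(decode(dna))
```

Because every character gets its own distinct code, the mapping can be inverted exactly (apart from the lost capitalisation).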

To build the alignments and trees I used MAFFT. The sequences derived from each article can be relatively long, and MAFFT handles long sequences well thanks to its use of the Fast Fourier Transform. MAFFT is available for download or through a web interface; I used the web interface, checking the Accurate and Minimum Linkage run options.
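MAFFT reads sequences in FASTA format, so the encoded articles have to be gathered into one FASTA file first. A minimal Python sketch of that step; the function name is hypothetical, the labels follow the numbering used in the tree below, and the sequences are placeholders rather than real article encodings.

```python
def to_fasta(records, width: int = 60) -> str:
    """Format (label, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for label, sequence in records:
        # Keeping labels free of whitespace is a precaution so that tree
        # programs do not truncate them; FASTA itself allows a description.
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"label must be non-empty with no whitespace: {label!r}")
        lines.append(f">{label}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder sequences, not the real article encodings.
    records = [
        ("2_UWMPressRelease", "CTGAGCATGACTCTCTAG"),
        ("3_NYTimes", "AGCTAGTGTAGCCACCTG"),
    ]
    print(to_fasta(records, width=9), end="")
```

With a local MAFFT install, the saved file could be aligned with `mafft --auto articles.fasta > aligned.fasta`; `--auto` lets MAFFT choose a strategy and is not necessarily equivalent to the Accurate and Minimum Linkage web options mentioned above.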

Once I had copied the tree in Newick format, I ran FigTree by Andrew Rambaut to generate a readable graphical tree. I had included an unrelated Scientific American article as an outgroup, and I chose the branch between that article and the group of press coverage of the UW macaque caloric restriction study as the root. This corresponds to the last common ancestor on a real phylogenetic tree.

The resulting tree contains some interesting clades. For example, ScienceDaily, esciencenews, and News-Medical, which essentially all just reproduced the UW-Madison press release, are grouped together with it. Another obvious group is the Tampa Bay Times and the Herald Tribune, which sourced the article from the New York Times and pared it down for their readers; both cluster with the New York Times itself.

Here is the tree in Newick format:

```
(((1_theScinder-:0.845,(((((((((((((((2_UWMPressRelease:0.0085,((4_escienceNews_UWM_:5.0E-4,5_ScienceDaily_UWPressRelease:5.0E-4):0.0,15_news-medical_UWM:5.0E-4):0.008):0.3115,26_aniNews:0.32):0.392,(14_natureWorldNews:0.7055,16_techTimes:0.7055):0.0065):0.006,25_expressUK:0.718):0.0025,20_hngn:0.7205):0.0195,(8_MedicalNewsToday:0.0,18_bayouBuzz_medicalNewsToday:0.0):0.74):0.0025,27_newsTonightAfrica:0.7425):0.047,(17_perezHilton:0.7805,(19_theVerge:0.6905,24_cbsLocalAtlanta:0.6905):0.09):0.009):0.0075,7_IFLS:0.797):0.007,21_seattlepi:0.804):0.006,12_nature:0.81):0.021,(6_yahooNews:0.0285,10_livescience:0.0285):0.8025):5.0E-4,((3_NYTimes:0.1875,11_HeraldTribune_NYT:0.1875):0.344,13_tampaBayTimes_NYT:0.5315):0.3):0.008,22_iol_dailyMail:0.8395):5.0E-4,9_healthDay/Philly_com:0.84):0.005):0.004,23_bbc:0.849):0.0245,28_OUTGROUPSciAmYeastyBeasties:0.8735);
```

. . .and this is a list of all the addresses for the articles I used and their labels on the tree: https://thescinder.com/pages/key-to-uwm-mac…logenetic-tree/
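The tree string above uses Newick notation (the same tree syntax that Nexus files embed), so it can be inspected without FigTree. Below is a minimal recursive-descent parser sketch, not part of MAFFT or FigTree, with hypothetical function names. It assumes unquoted labels, no comments and no whitespace inside the tree, all of which hold for this tree, and it lists the tips and the "cherries" (pairs of tips that are each other's closest relatives).

```python
import re


def parse_newick(text: str):
    """Parse a Newick string into nested (children, label, branch_length) tuples.

    Minimal sketch: assumes unquoted labels, no comments and no whitespace
    inside the tree string.
    """
    tokens = re.findall(r"[(),:;]|[^(),:;]+", text.strip())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def node():
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1
            children.append(node())
            while peek() == ",":
                pos += 1
                children.append(node())
            if peek() != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1
        label = ""
        if peek() not in (None, "(", ")", ",", ":", ";"):
            label = tokens[pos]
            pos += 1
        length = None
        if peek() == ":":
            length = float(tokens[pos + 1])  # e.g. "5.0E-4" -> 0.0005
            pos += 2
        return (children, label, length)

    root = node()
    if peek() != ";":
        raise ValueError("tree must end with ';'")
    return root


def leaves(tree):
    """Tip labels, left to right."""
    children, label, _ = tree
    if not children:
        return [label]
    return [tip for child in children for tip in leaves(child)]


def cherries(tree):
    """Pairs of tips that are each other's closest relatives."""
    children, _, _ = tree
    pairs = []
    if len(children) == 2 and not children[0][0] and not children[1][0]:
        pairs.append((children[0][1], children[1][1]))
    for child in children:
        pairs.extend(cherries(child))
    return pairs


if __name__ == "__main__":
    # The New York Times clade, copied from the tree above.
    clade = parse_newick(
        "((3_NYTimes:0.1875,11_HeraldTribune_NYT:0.1875):0.344,"
        "13_tampaBayTimes_NYT:0.5315);"
    )
    print(leaves(clade))
    print(cherries(clade))
```

Run on the full string, `leaves` returns all 28 labels, the outgroup is one of the root's two children, and `cherries` reports six sister pairs, including the escienceNews/ScienceDaily copies of the press release and the New York Times/Herald Tribune pair.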