Abstract

Statistics, the much-maligned discipline feared by graduate students, is usually far removed from biomedical scientists’ day-to-day concerns. Possible exceptions are grant writing, addressing seemingly nitpicking journal reviewer comments, and seeking expert advice when experimental results are not as expected. Although biostatisticians see themselves as scientists in their own right, other scientists often see them as service providers or, worse, as impediments to getting the job done. Along these lines, US National Cancer Institute statistician Lisa McShane remarked how a biomedical scientist might expect a request for data analysis help to proceed: “The statistician will probably tell me it will take six weeks to complete the analysis and might start asking questions about whether I carefully designed my study with appropriate controls and safeguards against confounding effects. Invariably the statistician will tell me that the sample size was insufficient and insist that I pre-specify the questions I am trying to address with my study. It seems like statisticians just throw roadblocks in the way.” 1
The idea that statisticians are road-blocking police who enforce unrealistic research standards derives from a combination of at least three factors. First, most biomedical scientists have received little formal training in statistics. Second, as a direct result of this, biomedical scientists often will consult statisticians only after experiments have been completed. Third, many biostatisticians have received little formal training in biology lab processes. Taken together, these factors create a situation leading to frustration for all involved. The biologist sees unrealistic demands based on unfamiliar arguments. The applied statisticians’ response is well captured in R. A. Fisher’s famous quote: “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.” 2 Biologists are thus perceived as uncooperative, uninterested, and uninformed. Statisticians are perceived as police officers or coroners. It’s no wonder the two rarely meet unless interaction is considered absolutely necessary. This is a loss, not only for the parties directly involved but also for the greater scientific community.
Eighty years later, Fisher’s pithy remark still resonates and highlights the underappreciated fact that forethought in study design and analysis is an integral part of statistics, and thus integral to a well-executed experiment. Poststudy statistical fixes for design flaws are inevitably inferior (usually grossly so) to proper design. Similarly, trolling data sets when initial analyses don’t yield desirable results has the primary distinction of generating an excessive number of false positives, even when done by an expert statistician. As the four articles in this SLAS Discovery special collection illustrate, statistical work is most effective when it is proactive and cooperative. All articles display the combination of theoretical knowledge and practical acumen that Fisher considered essential for applied data analysis.
The article by James Hanley 3 provides snapshots of a distinguished biostatistical career in key biomedical areas dating back to the days of mainframe computers. He starts his article even further back, discussing the current relevance of Big Data sets from the 14th-century “Black Death” plague epidemic in Europe and the 19th-century cholera outbreak in London. He describes how contemporaneous analysis flaws of those data persist today, which, when combined with modern computers and algorithms, “allow us to be more precisely and spectacularly wrong.” Although some of these common errors in reasoning (such as the Texas sharpshooter error) are obvious to anyone with basic statistical training, Hanley argues that they have led to a situation in which “not all ‘discoveries’ are for the good of patients: some are merely for the good of academics.” Given decades of warnings from methodologists and more recent empirical evidence from meta-research, 4 the persistence of basic statistical errors in biomedical research is damning. It must be asked at what point this type of “error” morphs into willfully ignorant misconduct or fraud.
As a counterpoint, Hanley describes the challenges of performing data analysis in modern contexts and how even experts can be led astray by theoretically well-understood but insidious statistical phenomena, such as regression to the mean. He also gives a personal account of a serious error avoided by the unglamorous activity of manually checking an unusual data point in a slower paced and lower throughput timeframe. In the current hypercompetitive and time-sensitive world of biomedical research, careful checking is often considered an unaffordable luxury. This is to the field’s detriment, as shown repeatedly, notably by the “off-by-one” error (among others) in generating gene lists for Duke University cancer trials. 5 Hanley concludes with simulations showing the pitfalls of dipping periodically into data sets to analyze ongoing experiments and how statistical laws do hold when analyses are done as theory prescribes. Taken together, Hanley’s recollections and simulations demonstrate the value of wedding best experimental practices with appropriate statistical methods.
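The pitfall of dipping periodically into accumulating data can be made concrete with a small simulation. The sketch below is not Hanley’s code; it is a minimal illustration under assumed settings (standard normal data with no true effect, a two-sided z-test at the 5% level, and an interim look every 10 observations up to 100). The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_false_positive_rates(n_sims=2000, n_max=100, look_every=10,
                                  z_crit=1.96, seed=0):
    """Compare false-positive rates with and without interim peeking.

    Every simulated experiment is drawn under the null hypothesis
    (mean 0, known standard deviation 1), so any rejection is a false positive.
    Returns (rate when stopping at the first "significant" look,
             rate when testing only once at n_max).
    """
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _ in range(n_sims):
        total = 0.0
        rejected_at_any_look = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            if n % look_every == 0:
                # z statistic for the running mean: mean * sqrt(n) = total / sqrt(n)
                z_interim = total / math.sqrt(n)
                if abs(z_interim) > z_crit:
                    rejected_at_any_look = True
        # Single pre-specified analysis at the planned sample size
        z_final = total / math.sqrt(n_max)
        peek_hits += rejected_at_any_look
        final_hits += abs(z_final) > z_crit
    return peek_hits / n_sims, final_hits / n_sims


if __name__ == "__main__":
    peeking, fixed = simulate_false_positive_rates()
    print(f"false-positive rate with repeated looks: {peeking:.3f}")
    print(f"false-positive rate with one final look: {fixed:.3f}")
```

Under these assumptions, the single pre-specified analysis should reject at close to the nominal 5% rate, whereas declaring success at any of the ten looks rejects considerably more often, which is the sense in which statistical laws hold only when analyses are done as theory prescribes.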
Mazoure et al. 6 contribute to the active area of algorithm development in high-throughput screening (HTS). Merging approaches from computer science, mathematics, and statistics, the authors use simulated and ChemBank empirical data to illustrate that systematic error (bias) can be either additive or multiplicative in complex ways with implications for measurement and analysis. They also provide a useful reminder of the inextricable connection between design and statistics by cautioning that bias correction methods are on firmer ground when “tested samples are randomly distributed within given HTS/HCS/SMM plates.” Randomization can be readily accomplished for custom screens with modern robotics and, as history has shown, is usually worth the effort. Randomization also improves manufactured product quality, as evidenced by Affymetrix’s move away from contiguous placement of same-gene probes on their early microarray chips to distributed placement throughout the chips. Although the HTS community has not embraced statistical reasoning to the same extent as gene expression microarray enthusiasts have, there are hopeful signs that this is changing.
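As a rough illustration of what randomly distributing tested samples within a plate can look like in practice, the following sketch assigns sample identifiers to wells of a standard 8 × 12 plate in random order. It is not taken from Mazoure et al.; the function name, the plate dimensions, and the sample labels are assumptions made for the example.

```python
import random
import string


def randomize_plate_layout(sample_ids, n_rows=8, n_cols=12, seed=None):
    """Assign each sample to a randomly chosen well on one plate.

    Returns a dict mapping well names (e.g. "A1") to a sample id,
    or to None for wells left empty.
    """
    wells = [f"{string.ascii_uppercase[r]}{c}"
             for r in range(n_rows) for c in range(1, n_cols + 1)]
    if len(sample_ids) > len(wells):
        raise ValueError("more samples than wells on the plate")
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    shuffled_wells = wells[:]
    rng.shuffle(shuffled_wells)
    layout = {well: None for well in wells}
    for sample, well in zip(sample_ids, shuffled_wells):
        layout[well] = sample
    return layout


if __name__ == "__main__":
    # Hypothetical compound labels; 80 samples leave 16 wells free for controls.
    samples = [f"cmpd_{i:03d}" for i in range(80)]
    layout = randomize_plate_layout(samples, seed=42)
    for well in ("A1", "A2", "H12"):
        print(well, layout[well])
```

Recording the seed alongside the layout keeps the randomization auditable, so that row, column, or edge effects found later can be checked against where each sample actually sat.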
The report by Moore et al. 7 describes original research in extracting maximum information from data-rich cell-based assays. Based on evidence of their importance in biological systems and their potential for medical applications, the authors focus on cell heterogeneity indices. They correctly point out that estimating statistical quantities that characterize the distribution of cell measurements prevents “loss of key information” relative to the more usual approach of using population-averaged readouts. Although this has been a long-recognized truism, it is only more recently that this approach has been put on more solid statistical footing. Indeed, the authors show that quantitative estimates of cell heterogeneity outperform graphics-based qualitative assessments of statistical distributions of cell measurements. They also present radar plots that, although likely unfamiliar to most scientists, show great promise to quickly and nontechnically convey detailed quantitative information on cell heterogeneity indices. The report will be of interest to scientists working with cell-based assays and to computational scientists who wish to become more familiar with the challenges of this exciting and burgeoning field.
Gubler et al. 8 describe the history and continuing development of Novartis’ Helios data management and analysis software system. The authors write of being guided throughout by best practices in the full spirit of reproducible research, with the agility to adjust as standards inevitably change over time. They describe the challenges of developing a single system that meets the demands of multiple technologies and data formats/throughput, multiple levels of user sophistication and needs, scaling up and down, the dizzying pace of new technologies and capacity, and users on three continents. They attribute much of their success to flexible generic coding from the beginning with the addition of motivated input from various stakeholders (an often-overlooked factor). The authors address hiccups in the data analysis pipeline that occur at this level of complexity and describe various solutions to the resulting threats to data validity, including modern robust statistical procedures, on-demand visualization, and facilitation of human intervention. The article nicely illustrates the advantages of a happy marriage between time-honored statistical principles and the automation needed for modern pharmaceutical data systems. It is a must-read for anyone interested in comprehensive, statistically principled, and robust software development.
We close with a less well-known Fisher quote: “Immensely laborious calculations on inferior data may increase the yield from 95 to 100 per cent. A gain of 5 per cent. of perhaps a small total. A competent overhauling of the process of collection, or of the experimental design, may often increase the yield ten or twelve fold, for the same cost in time and labour” 2 (italics added). To realize these gains, biomedical and statistical scientists need to develop working relationships, with sufficient knowledge of each other’s disciplines to, at the very least, “know what they don’t know.” The earliest days of microarray experiments, when statistical and biomedical scientists began to interact more extensively than they had previously, are instructive about how things can go wrong. For their part, many biomedical scientists erroneously claimed that the biological replication required by statisticians was unnecessary, and in any case was too expensive. Replicates began to be used when it was realized that forgoing them was a false savings that led to incorrect conclusions. At this point, however, some statisticians insisted on inappropriately strict false-positive (Type I error) correction for multiple testing that had been developed for low-throughput confirmatory experiments rather than for more exploratory high-throughput studies, a misconception that was corrected with the development of false discovery rate (FDR) methods. Although these mutual missteps were eventually resolved, there remain many examples of continuing misunderstandings.
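The difference between strict Type I error correction and FDR control can be seen in a short worked example. The sketch below compares a Bonferroni correction with the Benjamini-Hochberg step-up procedure, one widely used FDR method; the editorial does not name a specific procedure, so that choice is an assumption, and the p-values are invented purely for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypotheses whose p-value is at most alpha / m (family-wise control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control.

    Finds the largest rank k such that the k-th smallest p-value is at most
    (k / m) * alpha, then rejects the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject


if __name__ == "__main__":
    # Invented p-values standing in for ten tests from an exploratory screen.
    p_values = [0.001, 0.008, 0.012, 0.021, 0.03, 0.2, 0.5, 0.7, 0.9, 0.95]
    print("Bonferroni rejections:", sum(bonferroni_reject(p_values)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(p_values)))
```

With these invented values, Bonferroni retains only the smallest p-value, whereas Benjamini-Hochberg retains the three smallest, illustrating why the stricter correction was a poor fit for exploratory high-throughput studies with thousands of tests.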
So, what are the odds of increasing the number of productive relationships between biomedical and statistical scientists? A back-of-the-envelope calculation suggests that the occasional, non-project-related lunch could go a long way toward bringing the groups together.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
