
Abstract

Null hypothesis significance testing (NHST) provides an important statistical toolbox, but it is often abused and misinterpreted in a number of ways, with harmful consequences for the reliability and progress of science. Parts of the contemporary NHST debate, especially in the psychological sciences, are reviewed, and it is suggested that a new distinction between strongly, weakly, and very weakly anti-NHST positions is likely to bring added clarity to the debate.

Keywords

statistics, hypothesis testing

The Need for Nuance in the Null Hypothesis Significance Testing Debate
