About statistical significance, and the lack thereof

Abstract

Absence of statistical significance (i.e., p > 0.05) in a frequentist test comparing two samples is often taken as evidence that there is no difference, or that a treatment has no effect, on the measured variable. Such conclusions are often wrong, because a non-significant result may merely reflect a sample size too small to reveal an effect. Concluding that a treatment or condition has no meaningful effect requires an appropriate statistical approach. In frequentist statistics, a simple tool for this purpose is the ‘two one-sided t-tests’ (TOST) procedure, a form of equivalence test that relies on the a priori definition of the minimal difference considered relevant. In other words, the smallest effect size of interest should be established in advance. We present the principles of this test and give examples in which it allows correct interpretation of the results of a classical t-test whose null hypothesis is the absence of difference. Equivalence tests are also very useful for probing whether a significant result is biologically meaningful: when large samples are compared, both an equivalence test and a two-sample t-test (with no difference as the null hypothesis) can return significant results.
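As an illustration of the procedure described above, the following is a minimal Python sketch, not the authors' implementation. It assumes a Welch (unequal-variance) form of the two one-sided t-tests, computes the Student t distribution numerically with the standard library only, and uses invented measurements. The function names (`t_cdf`, `welch`, `difference_p`, `tost_p`) and the equivalence bound of 0.5 are assumptions chosen for the example.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """Student t CDF via Simpson's rule on the density (stdlib only)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + math.copysign(area, t)


def welch(a, b):
    """Mean difference, its standard error, and Welch degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return mean(a) - mean(b), math.sqrt(va + vb), df


def difference_p(a, b):
    """Classical two-sided test; null hypothesis: no difference."""
    diff, se, df = welch(a, b)
    return 2 * (1 - t_cdf(abs(diff) / se, df))


def tost_p(a, b, low, high):
    """TOST; null hypothesis: the true difference lies outside (low, high)."""
    diff, se, df = welch(a, b)
    p_lower = 1 - t_cdf((diff - low) / se, df)  # H1: difference > low
    p_upper = t_cdf((diff - high) / se, df)     # H1: difference < high
    return max(p_lower, p_upper)  # both one-sided tests must reject


# Hypothetical measurements (arbitrary units), not data from the article.
small_a, small_b = [10.0, 12.0, 8.0], [11.0, 9.0, 13.0]
tight_a = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
tight_b = [10.0, 10.1, 9.9, 10.2, 9.8, 10.1, 10.0, 9.9]
bound = 0.5  # assumed smallest effect size of interest, fixed in advance

for label, a, b in [("small", small_a, small_b), ("tight", tight_a, tight_b)]:
    print(label,
          "difference p =", round(difference_p(a, b), 3),
          "| equivalence p =", round(tost_p(a, b, -bound, bound), 3))
```

In this sketch, neither test is significant for the small, noisy samples, so that comparison is inconclusive rather than evidence of no effect. For the larger, tighter samples the classical test is non-significant while the equivalence test is significant, which supports the absence of a meaningful difference within the chosen bounds.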

Keywords

Equivalence test; effect size; animal behavior; statistical significance
