Abstract
When data analyses produce encouraging but nonsignificant results, researchers often respond by collecting more data. This may transform a disappointing dataset into a publishable study, but it does so at the cost of increasing the Type I error rate. How big of a problem is this, and what can we do about it? To answer the first question, we estimate the Type I error inflation based on the initial sample size, the number of participants used to augment the dataset, the critical value for determining significance (typically .05), and the maximum p value within the initial sample such that the dataset would be augmented. With one round of augmentation, Type I error inflation maximizes at .0975 with typical values from .0564 to .0883. To answer the second question, we review methods of adjusting the critical value to allow augmentation while maintaining p < .05, but we note that such methods must be applied a priori. For the common occurrence of post-hoc dataset augmentation, we develop a new statistic, paugmented, that represents the magnitude of the resulting Type I error inflation. We argue that the disclosure of post-hoc dataset augmentation via paugmented elevates such augmentation from a questionable research practice to an ethical research decision.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
