Abstract
In a recent article, André (2022) addressed the decision to exclude outliers using a threshold across conditions or within conditions and offered a clear recommendation to avoid within-conditions exclusions because of the potential for large false-positive inflation. In this commentary, I note that André’s simulations did not include the situation for which within-conditions exclusion has previously been recommended: when across-conditions exclusion would exacerbate selection bias. Examining test performance in this situation confirms the recommendation for within-conditions exclusion in such a circumstance. Critically, the suitability of exclusion criteria must be considered in relation to assumptions about data-generating mechanisms.
It is common practice in research to identify and remove unrepresentative responses. Doing so is challenging because there are many procedural decisions one can make in identifying them. Frequently, researchers have sensible rules for excluding unrepresentative responses, such as excluding participants who fail attention checks. Other times, they use the extremity of responses as an indicator of unrepresentativeness and exclude these observations, which are called “outliers.” Because the point of statistical inference is first to generalize from samples to parent populations, the imperative of representativeness is the foundation on which outlier exclusion is based (Aguinis et al., 2013). In a recent article, André (2022) helpfully addressed the decision to exclude outliers using a threshold across conditions or within conditions. André offered a clear recommendation to avoid within-conditions exclusions because of the potential for large false-positive inflation.
Critically, the suitability of exclusion criteria must be considered in relation to assumptions about data-generating mechanisms (i.e., the parent population and the manner or manners in which unrepresentative responses are generated). The correct approach to removing contamination by unrepresentative responses depends, of course, on how the sample is contaminated in the first place. Imagine there are two experimental conditions, control (A) and treatment (B), whose population means, μA and μB, researchers are interested in comparing. The null hypothesis is that their means do not differ, H0: μA − μB = 0. One would then draw samples to test this hypothesis. Consider a sample
In practice, however, researchers do not know that their samples are in fact drawn from only the parent distributions of interest. Rather, they may suspect that their samples are “contaminated” by observations that are unrepresentative of the populations of interest. That is, they have a sample
Meyvis and Van Osselaer (2018) recommended searching for responses that deviate exceptionally “from the cell mean” (p. 1162), and they explained their reasoning for within-conditions exclusions. Using a single threshold can “create a nonequivalence of participants between conditions (i.e., introducing a confound)” (p. 1163), which is otherwise known as introducing selection bias (Heckman, 1979). Under the null considered by André (2022), without contamination or with equal contamination, this risk of creating a nonequivalence does not exist, hence André’s rejection of within-conditions exclusion. However, when there is differential contamination extremity, this potential problem should lead one to consider within-conditions exclusion.
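The contrast between the two exclusion rules can be sketched in code. What follows is a minimal illustration under stated assumptions, not the simulation reported in this commentary: the z-score cutoff of 3, the cell size of 100, the per-observation contamination rate of 5%, and the contaminant distributions N(5, 1) and N(10, 1) are hypothetical choices, as are all function names. Representative observations in both cells are drawn from N(0, 1), so the true mean difference is zero and any nonzero average difference after exclusion reflects bias.

```python
import math
import random


def mean(xs):
    """Arithmetic mean using exact floating-point summation."""
    return math.fsum(xs) / len(xs)


def mean_sd(xs):
    """Return the sample mean and the sample standard deviation (n - 1)."""
    m = mean(xs)
    var = math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, math.sqrt(var)


def trim(xs, center, spread, cutoff):
    """Keep observations within `cutoff` spreads of `center`."""
    return [x for x in xs if abs(x - center) <= cutoff * spread]


def exclude_across(a, b, cutoff=3.0):
    """Across conditions: one threshold computed from the pooled sample."""
    m, s = mean_sd(a + b)
    return trim(a, m, s, cutoff), trim(b, m, s, cutoff)


def exclude_within(a, b, cutoff=3.0):
    """Within conditions: a separate threshold computed in each cell."""
    m_a, s_a = mean_sd(a)
    m_b, s_b = mean_sd(b)
    return trim(a, m_a, s_a, cutoff), trim(b, m_b, s_b, cutoff)


def draw_cell(rng, n, rate, contaminant_mean):
    """Draw n values: a contaminant with probability `rate`, else N(0, 1)."""
    return [
        rng.gauss(contaminant_mean, 1.0) if rng.random() < rate
        else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


def simulate_bias(reps=500, n=100, rate=0.05,
                  mean_a=5.0, mean_b=10.0, seed=1):
    """Average post-exclusion mean difference (A - B) under each rule.

    All default parameter values are hypothetical illustration choices.
    """
    rng = random.Random(seed)
    rules = {"across": exclude_across, "within": exclude_within}
    totals = {name: 0.0 for name in rules}
    for _ in range(reps):
        a = draw_cell(rng, n, rate, mean_a)  # milder contamination
        b = draw_cell(rng, n, rate, mean_b)  # more extreme contamination
        for name, rule in rules.items():
            kept_a, kept_b = rule(a, b)
            totals[name] += mean(kept_a) - mean(kept_b)
    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    print(simulate_bias())
```

Under these assumed settings, the pooled threshold tends to remove the more extreme contaminants from one cell while retaining the milder contaminants in the other, which is the nonequivalence described above. Changing `mean_a` and `mean_b` to the same value gives the equal-contamination case, in which that asymmetry disappears.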
Imagine two contaminated conditions, but the contamination in one condition is more extreme than in the other. For example, suppose one condition has contaminants drawn from
Assuming 5% contamination, within-conditions exclusion using a 3-

Fig. 1. The impact of across versus within exclusions with differential contamination extremity using a 3-
When it exacerbates selection bias, across-conditions exclusion increases the likelihood of misestimating the sign or magnitude of an effect. Within-conditions exclusion, on the other hand, generates a more dispersed distribution of
In fact, the increased bias of across-conditions exclusion also occurs when the representative observations are drawn under the alternative hypothesis,
Although André (2022) is correct to recommend a healthy skepticism of within-conditions exclusion, within-conditions exclusion is advisable when across-conditions exclusion is expected to exacerbate selection bias (e.g., with differential contamination extremity). The recommendation in this situation can be succinctly represented using weighted error rates computed from the simulation results (Maier & Lakens, 2022; Mudge et al., 2012). Figure 2 plots the weighted error rate under within- and across-conditions exclusion as a function of the relative cost of making a Type I versus Type II error. Conventionally, this ratio is implicitly set at 4 (5% Type I error rate, 20% Type II error rate, with a desired weighted error rate of 8%). At this ratio, within-conditions exclusion is just barely preferred to across-conditions exclusion. The scenario examined here is exactly the situation in which excluding outliers within conditions has been previously recommended (Cousineau & Chartier, 2010; Meyvis & Van Osselaer, 2018), and within-conditions exclusion dominates across-conditions exclusion as Type II errors become more costly relative to Type I errors (i.e., as the ratio decreases from 4). Note, however, that beyond these error rates, selection bias distorts the test statistics (Figs. 1a and 1b), which may be the more problematic issue. When outlier exclusion comes with the threat of differential selection, the benefits of within-conditions exclusion and risks of across-conditions exclusion are at their peak.
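The arithmetic behind the conventional weighting can be checked directly. The sketch below assumes the weighted error rate takes the form (c × α + β) / (c + 1), where c is the cost of a Type I error relative to a Type II error and the null and alternative are given equal prior probability. This form reproduces the 8% figure quoted above; the function name and signature are illustrative.

```python
def weighted_error_rate(alpha, beta, cost_ratio):
    """Weighted combination of Type I (alpha) and Type II (beta) error rates.

    cost_ratio is the cost of a Type I error relative to a Type II error.
    Assumes equal prior probabilities for the null and the alternative.
    """
    if cost_ratio < 0:
        raise ValueError("cost_ratio must be non-negative")
    return (cost_ratio * alpha + beta) / (cost_ratio + 1)


# Conventional case from the text: alpha = 5%, beta = 20%, ratio = 4.
conventional = weighted_error_rate(alpha=0.05, beta=0.20, cost_ratio=4)
print(round(conventional, 4))  # 0.08

# With equal costs (ratio = 1), the two error rates are simply averaged.
print(round(weighted_error_rate(0.05, 0.20, 1), 4))  # 0.125
```

Comparing exclusion rules then amounts to evaluating this quantity with each rule's estimated Type I and Type II error rates at a given cost ratio and preferring the rule with the smaller value.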

Fig. 2. Weighted error rates of across versus within exclusion with differential contamination extremity as a function of the relative cost of Type I and Type II errors. Using a 3-
To sum up, the decision to use within-conditions exclusion has often not been thoroughly scrutinized, and in many situations, within-conditions exclusion should not be used because of its tendency to inflate Type I error rates (André, 2022). Yet the choice of outlier-exclusion approach is conditional on the expected state of the world and research goals, and it is up to researchers to ensure all the necessary conditions are in place to support their approach. In line with André (2022), in this commentary, I assumed that there is suspected contamination, that outlying values are indicative of contamination, and that outliers should be removed (if any of these conditions is not met, outlier exclusion should be avoided altogether). When the researcher intends to remove outliers and it is only a question of deciding on an appropriate exclusion rule (across vs. within conditions), this commentary highlights the importance of also considering the nature of the expected contamination (and, more broadly, the nature of the data). Researchers must be conscious of the assumptions scaffolding the suitability of their exclusion approaches.
Supplemental Material
Supplemental material, sj-docx-1-amp-10.1177_25152459231186577 for The Appropriateness of Outlier Exclusion Approaches Depends on the Expected Contamination: Commentary on André (2022) by Daniel Villanova in Advances in Methods and Practices in Psychological Science
Acknowledgements
The author gratefully acknowledges support from the Open Access Publishing Fund administered through the University of Arkansas Libraries.
