Abstract

Funder and Ozer (2019) argued that small effects can have important implications in cumulative long-run scenarios. We certainly agree. However, some important caveats merit explicit consideration. We elaborate on the previously acknowledged importance of preregistration (and open-data practices) and identify two additional considerations for interpreting small effects in long-run scenarios: restricted extrapolation and construct validity.

Preregistration and Open Data

Interpreting small effects requires discrimination at two levels: first, discriminating reliable but small effects from null effects whose size has been inflated by analytic flexibility, and second, distinguishing between reliable but small effects that are and are not likely to cumulate to meaningfully predict or affect behavior. Analysis involves a series of decisions (e.g., covariate inclusion/exclusion, subscale analysis, outlier treatment) known collectively as the garden of forking paths (Gelman & Loken, 2014). Such researcher degrees of freedom can increase Type I error rates and inflate effect sizes (Simmons, Nelson, & Simonsohn, 2011). Preregistering analysis plans constrains these forking paths. Flexibility in analysis can inflate effects of all sizes, but the consequences of such inflation (and the associated risks of overinterpreting effects and overestimating the likelihood that they will cumulate to produce practically important outcomes) are perhaps greater if inflation nudges an effect size from below to above the cutoff for the “smallest effect size of interest” (rather than, e.g., from moderate to large). Studies finding small effects but lacking or deviating from preregistered analysis plans should be interpreted cautiously.

Further, although not a panacea for research bias, open-data practices help establish the robustness of small effects in two ways. First, they allow interested researchers to establish whether effects are contingent on highly contrived analysis strategies; such effects should be interpreted and generalized with caution. Second, they allow interested researchers to estimate the smallest effect size of interest by, for example, comparing the small effects of interest with nonsense relationships in the data (Orben & Przybylski, 2019).
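The inflation produced by the garden of forking paths can be made concrete with a small simulation. The Python sketch below is an illustrative example, not an analysis taken from the cited studies. It assumes a true null effect, two outcome measures correlated at .5, and a hypothetical menu of five analysis paths (primary outcome, alternative outcome, composite outcome, and a 2-SD outlier rule applied to each single outcome). It compares the false-positive rate of one fixed, preregistered path with that of an analyst who reports whichever path reaches p < .05. P values use a large-sample normal approximation rather than a t test, a simplification chosen to keep the sketch dependency-free.

```python
import random
from statistics import NormalDist


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def p_value(a, b):
    """Two-sided p for a difference in means (large-sample normal approximation)."""
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    z = (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def trim(xs, k=2.0):
    """One hypothetical outlier rule: drop values more than k SDs from the mean."""
    m, v = mean_var(xs)
    return [x for x in xs if abs(x - m) <= k * v ** 0.5]


def null_group(rng, n, rho):
    """Two outcome measures correlated at rho; both groups share one population."""
    dv1, dv2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        dv1.append(z1)
        dv2.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
    return dv1, dv2


def false_positive_rates(n_sims=1000, n=100, rho=0.5, alpha=0.05, seed=1):
    """Return (single-path rate, best-of-all-paths rate) under a true null."""
    rng = random.Random(seed)
    single = flexible = 0
    for _ in range(n_sims):
        a1, a2 = null_group(rng, n, rho)
        b1, b2 = null_group(rng, n, rho)
        a_avg = [(x + y) / 2 for x, y in zip(a1, a2)]
        b_avg = [(x + y) / 2 for x, y in zip(b1, b2)]
        paths = [
            p_value(a1, b1),              # the single, preregistered analysis
            p_value(a2, b2),              # alternative outcome measure
            p_value(a_avg, b_avg),        # composite outcome
            p_value(trim(a1), trim(b1)),  # outlier exclusion, outcome 1
            p_value(trim(a2), trim(b2)),  # outlier exclusion, outcome 2
        ]
        single += paths[0] < alpha
        flexible += min(paths) < alpha    # report whichever path "worked"
    return single / n_sims, flexible / n_sims


print(false_positive_rates())
```

Because the preregistered path is one of the options, the flexible rate can never fall below the single-path rate; how far it rises depends on the assumed number of paths and how strongly they are correlated.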

Restricted Extrapolation

To illustrate the importance of small effects, Funder and Ozer (2019) presented examples in which small effects accumulate to produce meaningful outcomes. First, they considered the small correlation between a baseball player’s performance on any individual at bat and batting average (Abelson, 1985). This context demonstrates the importance of small effects: Single trials cumulatively contribute to players’ batting averages and, subsequently, teams’ winning percentages. It also highlights some important caveats. First, the relationship between single at bats and batting average is the relationship between A and Ā (the mean of A) across a series of trials. Such measurement purity is uncommon in psychology, and increasing measurement noise decreases the predictive value of small effects. Second, one must be cautious when extrapolating small observed relationships to third variables. Although at-bat performance cumulatively relates to batting average (because batting average indexes cumulative performance over single at bats), batting average is, relatively speaking, not a particularly useful predictor of team performance. Players are valuable because they get on base: On-base percentage contributes twice as much as batting average to winning percentage (Hakes & Sauer, 2006). Small effects may cumulate to produce reliable relationships, but the predictive value of these relationships for third-variable outcomes in practical settings may remain low and, therefore, relatively unimportant.
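The cumulation in the batting example can be written down directly under simplifying assumptions that are not claims about real baseball data: each at bat is an independent Bernoulli trial whose success probability p (a player's true skill) is stable and varies across players with mean μ and variance σ²_p. For a single at bat, skill then explains r² = σ²_p / [μ(1 − μ)] of the variance in outcomes; for an average over n at bats it explains r² = σ²_p / (σ²_p + E[p(1 − p)] / n), where E[p(1 − p)] = μ(1 − μ) − σ²_p. The Python sketch below evaluates this expression for hypothetical values (μ = .270, σ_p = .025). The single-trial value is tiny, in the spirit of Abelson's (1985) paradox, but it grows steadily as trials accumulate.

```python
def r_squared_skill(n_trials, mean=0.270, sd_skill=0.025):
    """Proportion of variance in an n-trial batting average explained by true skill.

    Assumes independent at bats, each a Bernoulli trial with success probability p,
    where p varies across players with the given (hypothetical) mean and SD.
    """
    var_skill = sd_skill ** 2
    # Expected within-player binomial variance: E[p(1 - p)] = mean(1 - mean) - var(p).
    within = mean * (1 - mean) - var_skill
    # Averaging n trials shrinks the within-player (noise) variance by a factor of n.
    return var_skill / (var_skill + within / n_trials)


for n in (1, 10, 100, 500):
    print(f"{n:>3} at bats: r^2 = {r_squared_skill(n):.3f}")
```

This growth concerns the relationship between skill and its own cumulative index (A and Ā); it says nothing about how well that index predicts a third variable such as team wins.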

Similarly, a small relationship between ego depletion and having a “short fuse” in stressful conversations may predict the likelihood of disagreements during stressful conversations without necessarily translating to third-variable outcomes (e.g., marital friction; Funder & Ozer, 2019).

Even in causal relationships, small effects of A on B do not guarantee meaningful third-variable consequences. Relationships between second and third variables in causal chains are usually imperfect, which can shrink indirect effects to inconsequential levels. One must be cautious not to overstate the importance of small effects by extrapolating to unmeasured consequences: Small effects of A on B may cumulatively produce meaningful changes in B without implying meaningful effects on C. Further, as Funder and Ozer noted, current theorizing typically precludes robust predictions about whether small effects will cumulate in strength or consequences.
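The constraint that imperfect links place on indirect effects can be illustrated with the simplest possible case: a linear, standardized chain A → B → C with no direct path from A to C and no shared causes (assumptions real psychological systems rarely satisfy). Under those assumptions the implied A–C correlation is the product r_AB × r_BC, so a hypothetical r_AB = .10 and r_BC = .30 imply r_AC = .03. The Python sketch below computes the product and checks it against a simulated chain; the effect sizes are invented for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def implied_r_ac(r_ab, r_bc):
    """A-C correlation implied by a pure standardized chain A -> B -> C."""
    return r_ab * r_bc


def simulate_chain(r_ab, r_bc, n=100_000, seed=7):
    """Simulate the chain with unit-variance variables and return the sample r(A, C)."""
    rng = random.Random(seed)
    a_vals, c_vals = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = r_ab * a + math.sqrt(1 - r_ab ** 2) * rng.gauss(0, 1)
        c = r_bc * b + math.sqrt(1 - r_bc ** 2) * rng.gauss(0, 1)  # C depends on A only via B
        a_vals.append(a)
        c_vals.append(c)
    return pearson(a_vals, c_vals)


print(round(implied_r_ac(0.10, 0.30), 3))
print(round(simulate_chain(0.10, 0.30), 3))
```

Adding a direct A → C path or a confounder would change the product rule, so the sketch shows only the best case for transmission through a chain.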

Construct Validity

The example of batting averages and at-bat performance is almost unique. Batting performance is not operationalized through a proxy; it is a direct measure of the target construct. Thus, the example translates poorly to most psychological research, in which the constructs of interest are often latent (e.g., personality, memory). Even the study of observable behaviors (e.g., number of cigarettes smoked) often requires indirect measurement (e.g., self-report). Relationships between constructs and operationalized variables are imperfect, and when constructs are unobservable, the magnitude and direction of measurement error are unknowable. Thus, Funder and Ozer’s precondition that estimation is reliable may often be unverifiable. A small effect on an operationalized variable (e.g., r = .05) may be stronger or weaker than the effect on the construct of interest (Cohen, 1988, 1992). For example, manipulating a salient variable in tightly controlled experimental conditions may exaggerate effects relative to their magnitude in complex applied settings (Schäfer & Schwarz, 2019), demand artifacts may inflate effects (Sawyer, 1975), or effects on self-reported outcomes may overstate effects on actual outcomes. When true effects are weaker than small observed effects, they may be statistically equivalent to zero. Thus, small effects should be interpreted with greater caution as measurements become less direct and constructs more abstract.
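Both directions of the discrepancy between observed and construct-level effects can be sketched numerically. Spearman's correction for attenuation, r_true = r_obs / √(rel_x · rel_y), shows how purely random measurement error can hide a larger construct-level effect; it cannot correct systematic biases such as demand artifacts, which push observed effects upward. A two one-sided tests (TOST) equivalence procedure on Fisher's z then asks whether an effect lies significantly inside a band (−Δ, Δ) around zero, where Δ plays the role of the smallest effect size of interest. The reliabilities, sample size, and bound in the Python sketch below are hypothetical.

```python
import math
from statistics import NormalDist


def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction: construct-level r implied by an observed r,
    assuming purely random (classical) measurement error in both variables."""
    return r_obs / math.sqrt(rel_x * rel_y)


def equivalent_to_zero(r, n, bound, alpha=0.05):
    """TOST on Fisher's z: is r significantly inside (-bound, bound)?"""
    z_r, z_b = math.atanh(r), math.atanh(bound)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - alpha)
    # Both one-sided tests must reject for the effect to count as equivalent to zero.
    return (z_r + z_b) / se > crit and (z_b - z_r) / se > crit


# Random error (hypothetical reliabilities .70 and .60) can hide a larger effect.
print(round(disattenuate(0.05, rel_x=0.70, rel_y=0.60), 3))
# If bias had inflated an observed r, a weaker true effect may be equivalent to zero.
print(equivalent_to_zero(0.02, n=5000, bound=0.05))
print(equivalent_to_zero(0.05, n=5000, bound=0.05))
```

Which correction is appropriate depends on whether measurement error is random or systematic, which is exactly what cannot be verified when constructs are unobservable.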

Moreover, the relationship between at-bat performance and batting averages contains few plausible confounds. This is not true of most psychological constructs, and in complex systems, small effects may easily be produced or mitigated by uncontrolled confounds. For instance, in the ego-depletion example, people may hate their jobs because they are anxious or aggressive people, and these traits (not self-control depletion) may explain their short fuse. Alternatively, good communication may practically eliminate the small effect of ego depletion on disagreements.
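The confounding scenario can be simulated. In the hypothetical Python sketch below, a trait (standing in for, e.g., anxiety) raises both a depletion score and a short-fuse score, while depletion has no causal effect on the short fuse at all. The two scores nonetheless correlate at roughly .08, a small but reliable-looking effect, and the first-order partial correlation controlling for the trait, r_xy·z = (r_xy − r_xz · r_yz) / √[(1 − r_xz²)(1 − r_yz²)], falls to about zero. The effect sizes are invented for illustration; in real data the relevant confound is often unmeasured, so no such adjustment is available.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)
trait, depletion, fuse = [], [], []
for _ in range(100_000):
    t = rng.gauss(0, 1)                           # hypothetical confound (e.g., anxiety)
    trait.append(t)
    depletion.append(0.3 * t + rng.gauss(0, 1))   # depletion partly reflects the trait
    fuse.append(0.3 * t + rng.gauss(0, 1))        # short fuse reflects the trait, not depletion

r_df = pearson(depletion, fuse)
r_dt = pearson(depletion, trait)
r_ft = pearson(fuse, trait)
print(round(r_df, 3), round(partial_r(r_df, r_dt, r_ft), 3))
```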

Concluding Remarks

Small effects may have important cumulative impacts over long runs. However, the boundary conditions for such cumulative effects are poorly understood. Preregistration and open data will likely improve the verifiability of small effects, but the ability to reliably determine the practical importance (i.e., cumulative consequences) of small effects remains underdeveloped. When plausible uncontrolled confounds exist, indirect measurement is employed, or constructs of interest are latent, researchers should be cautious in interpreting the practical importance of small effects. Similarly, researchers should limit interpretations to the available data rather than extrapolating small effects to unobserved third variables.

Transparency

Action Editor: Alexa Tullett

Editor: Daniel J. Simons

Author Contributions

The two authors contributed equally to all aspects of this work.

ORCID iDs

James D. Sauer

Aaron Drummond

References

Abelson, R. P. (1985). A variance explanation paradox: When a little is a lot. Psychological Bulletin, 97, 129–133.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2, 156–168. doi:10.1177/2515245919847202

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102, 460.

Hakes, J. K., & Sauer, R. D. (2006). An economic evaluation of the Moneyball hypothesis. Journal of Economic Perspectives, 20(3), 173–186.

Orben, A., & Przybylski, A. K. (2019). The association between adolescent well-being and digital technology use. Nature Human Behaviour, 3, 173–182.

Sawyer, A. G. (1975). Demand artifacts in laboratory experiments in consumer research. Journal of Consumer Research, 1, 20–30.

Schäfer, T., & Schwarz, M. A. (2019). The meaningfulness of effect sizes in psychological research: Differences between sub-disciplines and the impact of potential biases. Frontiers in Psychology, 10, Article 813. doi:10.3389/fpsyg.2019.00813

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.