Attention to Authenticity: An Essential Analogue to Focus on Rigor and Replicability

Abstract

January 1, 2020, was a momentous day for me: it was the start of my 4-year term as Editor-in-Chief of Psychological Science, the flagship journal of the Association for Psychological Science (APS). As former Editor-in-Chief Robert Kail (2007-2011) said of serving in the role, it was the start of “drinking from a firehose.” Well beyond the workload, the role has been an honor, a challenge, a source of deep humility, and a vantage point on the field of psychological science that few have the privilege to occupy.

I assumed the role of Editor-in-Chief from D. Stephen (Steve) Lindsay, who served in the role for five years (2015-2019), including one year as Acting Editor-in-Chief after Eric Eich (2012-2015) left the position early. As Steve reflected in his “swan song” editorial (Lindsay, 2019), a major focus of his term was responding to the replication crisis that gripped our field, starting in earnest with the Open Science Collaboration (2015) report of large-scale failures to replicate findings from articles in a number of psychology journals, including Psychological Science. Steve’s final reflection was on the journal’s progress in addressing the replication crisis. Three of the most visible actions, initiated by Eric Eich and fully realized by Steve Lindsay, were calling for more comprehensive statistical reporting; eliminating word-count restrictions on Method and Results sections (while maintaining them for introductory and discussion material) so that authors could fully report their approaches and their findings; and introducing badges in recognition of open science practices, including publicly sharing data and materials and preregistering studies.

When I became Editor-in-Chief, I inherited the fruits of the labors of my predecessors. The trends on which Steve commented (Lindsay, 2019) continued in my term. Submissions to the journal now routinely include information such as power analyses, effect sizes, confidence intervals, and so forth. Authors of accepted articles also are required to produce a “clean” statcheck report (http://statcheck.io/; akin to a spell and grammar check for statistical reporting) prior to production of their articles for publication. Although some may miss the short, pithy reports for which Psychological Science historically was known, there has been an increase in average article length from just over 9 pages in 2014 to 14.4 pages in 2022 (A. Drew, personal communication), reflecting more complete descriptions of methods and results. We also have seen a substantial rise in the number of articles eligible for open science badges. As of 2022 (the most recent year for which full data are available), 69% of articles published in the journal were awarded an Open Data badge, 55% an Open Materials badge, and 43% a Preregistered badge. Overall, fully 80% of articles published in the journal were awarded at least one of the available badges and 32% were awarded all three. These trends reflect positive momentum toward achieving the highest possible standards of publication in the journal, at least as indexed by adoption of open science practices.

I am a strong proponent of efforts to increase transparency and openness in the field of psychological science broadly and in Psychological Science specifically. These efforts permit greater confidence that what is reported in articles truly represents what was done and what was found, and they stand to improve the replicability of the research we conduct and publish. Yet just as we cannot rest on our laurels regarding progress in open science practices, we should not overlook a looming “crisis” that stands to render all of those efforts moot. As I opined in Bauer (2023), the crisis is one of generalizability and authenticity. One of the major goals of psychological research is to build knowledge that applies beyond the specific circumstances under which it was discovered. In my own area of memory research, for example, we aim to generate broad principles of how memory works, not just of how memory performance unfolded in one sample of participants drawn from a specific population, who were tasked with remembering a specific set of words or pictures. The goal of generating generalizable knowledge is especially important when the research is intended to apply beyond the laboratory: to clinics, workplaces, schools, and courts, to name a few.

Unfortunately, many practices that are common in psychological science research limit the generalizability of the data we collect. In some cases, they may even prevent us from addressing the important questions that motivated the work in the first place. These practices lessen the authenticity of the work we do and, if left unchecked, threaten to render our discipline irrelevant and obsolete. This threat is every bit as existential as that posed by practices that limit the rigor and replicability of our research. We can achieve the highest levels of rigor and full direct replicability, but unless our rigorous and replicable work informs the human condition, it has meaning only to those of us who conduct it.

What do I mean by “authenticity”? Something is authentic when it is genuine or when it is a “faithful imitation of an original” (Merriam-Webster). Applied to research, this means that a study is authentic when it is conducted on a sample to which the result is intended, or claims, to apply; when it defines constructs in an unbiased manner, such that the definition applies to the entire sample and thus to the target population; when its stimuli capture the richness of the phenomenon under study; and when its description of the findings adequately captures not only “average” behavior but individual variability as well.

Many of the practices typically followed in psychological research undermine authenticity in one way or another:

(a) We test participants from MTurk or Prolific, or rely on some other crowdsourced sample, and draw conclusions about “people.” We draw conclusions about “people” after testing samples drawn most typically from the United States and, overwhelmingly, from Western, Educated, Industrialized, Rich, and Democratic (WEIRD; Henrich et al., 2010) countries. In many cases, the samples consist of students in undergraduate psychology classes, even when the focus of study is a clinical syndrome, such as depression, or a behavior disorder, such as conduct disorder. Such samples also are predominantly White and female. These recruitment techniques virtually ensure nonrepresentative samples. To claim that they produce findings that generalize to “people” is a risk to authenticity.

(b) We rely on country-level data and assume that the measured characteristics apply to all of the individuals in the geographic area sampled, attributing agentic or independent attitudes to individuals in Western cultures, for example, and communal or interdependent attitudes to individuals in Eastern cultures. We extend to everyone who endorses a political party or claims a religious affiliation the entire set of beliefs and attitudes of that group. These 10,000-foot observations capture psychologically significant variability, yet overreliance on them ignores the reality that every individual has, to some degree, characteristics of both their own group and the contrasting group (Markus & Kitayama, 2010). And even when we actually measure (rather than assume) the attitudes and behaviors of those in the groups of interest, we often administer surveys and questionnaires without first establishing measurement equivalence across groups. These practices virtually ensure overgeneralization of research findings. When they go unacknowledged, they are a risk to authenticity.

(c) We measure complex psychological constructs with only one or a small number of items on a survey and assume we have captured their essence. We use “imagine that . . .” vignettes—often only one per issue or category—without grounding them in behavior. We use highly simplified, even impoverished, stimuli and conclude that the behavior in response to them is reflective of how multi-faceted cognitive or social processes unfold in everyday experience. These practices are attractive for their efficiency and internal validity. Yet as noted by West and colleagues (2022), “. . . by prioritizing control over context, researchers may unwittingly sacrifice critical aspects of the original phenomena and risk reifying abstractions that do not generalize beyond a simplified setting” (p. 67). When the limitations these practices impose go unrecognized, they are a risk to authenticity.

(d) We draw conclusions based on mean levels of performance, often ignoring the individual variability around them. Based on means alone, we may conclude that two groups or conditions differ, yet overlook substantial overlap in their distributions, even in cases in which a subset of participants in one sample behaved very similarly to those in the other. In such cases, the conclusion that the two experimental groups (or conditions) differ in their means may be accurate from a statistical standpoint, but it does not provide useful information that supports generalization. And even when we recognize the variability about the mean, we often take no steps to understand it and may not even comment on it in discussing the findings. To ignore the range of observations in favor of concentrating on the mean, and to “pass” on efforts to understand the sources of variability, is a risk to authenticity.

The purpose of noting these practices is not to discourage research efforts, and certainly not to deflect attention from important efforts to increase the rigor and replicability of our research. Rather, just as former Editor-in-Chief Eric Eich called for “business not as usual” in his editorial response to the replication crisis (Eich, 2014), I call for “business not as usual” in research practices such as those just noted, in order to diminish threats to authenticity. Efforts to increase the rigor and replicability of our research are critical to the continued health and well-being of the discipline of psychological science. Without rigor and replicability, we have nothing more than sand sifting through our fingers. Yet a single-minded focus on that side of the coin risks neglecting its equally important flip side: authenticity. If our tests are rigorous and replicable but lack authenticity, then we have the same problem of sifting sand.

We can celebrate evidence that the work published in Psychological Science seemingly has not fallen prey to irrelevance. As one unobtrusive index of our relevance, I note that the journal’s Journal Impact Factor in 2021 (the most recent year for which the data are available) was a healthy 7.29, rising above the previous peak of 6.13 set in 2017. Clearly, people are seeing the research published in the journal and are using it to inform their own work. As another index, full-text downloads of the journal rose from 1.3 million in 2018 to 2.7 million in 2022, indicating that the content we publish is reaching a very large number of readers.

I would like to think that these positive trends in the impact of Psychological Science were aided by two initiatives I undertook during my term to increase authenticity. First, I required authors to provide a Statement of Relevance for their work. These 150-word statements are intended to convey to a broad audience the question(s) addressed in the research, why they are interesting, the “bottom line” findings of the work, and why those findings are important. Evidence that the statements have an impact on authors themselves comes from a message from an author to former Peer Review Coordinator Ami Getu: “Funny thing about that statement of relevance. I was deep into the submission process a couple weeks ago when I discovered the statement of relevance policy. I had trouble coming up with the end. Figuring out what to say led me to overhaul my General Discussion. It opened it up and definitely improved it” (anonymous source; 13 August 2020). The second initiative was to highlight some articles in each issue of the journal under the heading “Psychology in the Public Eye.” The heading signals that the work under it is especially likely to inform the human condition, and thus to be of particularly broad interest and significance. In many months, I had difficulty limiting my selection to only two or three articles to feature in this section. These are signs of a healthy journal, one that takes authenticity seriously.

So if Psychological Science already takes authenticity seriously, why this missive on inauthentic research practices? There are two primary reasons. The first is that the research published in the journal is not necessarily representative of the population of psychological research conducted, nor is it even representative of the research submitted to the journal. Having seen roughly 6,000 submitted articles over the past 4 years, I can say with confidence that practices that undermine authenticity are alive and well in our field. This means that researchers are devoting enormous effort and expense to studies that fall short of the ideal of addressing interesting and important research questions, and important societal issues, through research that is genuine or a faithful imitation of an original. By highlighting the threats to authenticity, I intend to encourage all of us to reflect on our approaches to our science, ferret out inauthentic practices, and take actions to increase authentic ones.

The second reason to draw attention to threats to authenticity in psychological research is that we cannot rest on our laurels but must constantly, and consciously, work to do better. I recognize that I am calling for this action in an environment with significant headwinds: distrust of scientists and scientific findings, lessening public and governmental support for higher education (the site of much of our research), ongoing assaults on tenure, threats to academic freedom, a shrinking funding stream, and changes in the academic publishing model whose full effects have yet to be felt, to name only a few. One potential response to these pressures is the “MTurkification” of our field (Anderson et al., 2019): a focus on questions that are easy to ask and easy to “answer” using online data collection, crowdsourcing, existing datasets, and other labor- and money-saving devices. Conducting authentic research requires that we push back against these forces by asking challenging questions, in appropriate samples, with psychologically probative measures, and that we resist the temptation to draw overbroad conclusions that fail to capture the true nature of the behaviors we seek to understand. In other words, we must resist the siren’s call of quick data collection using instruments that barely scratch the surface of complex psychological constructs, and of sweeping conclusions that seemingly have no limits on their generalizability. If we are to keep the journal and the field healthy and vibrant, we must strive to conduct work that is not only rigorous and replicable, but also authentic.

In my efforts to attract and publish authentic tests of interesting hypotheses, I have been aided by many individuals. I extend my sincere gratitude to the Senior and Associate Editors who have worked to secure high-quality reviews of submissions and to make informed decisions about them. Working with this group of dedicated and insightful individuals is an aspect of this job that I will sorely miss. I thank the Statistical and Open Science Advisors who have aided in our efforts to ensure analytic integrity and adherence to the open science practices associated with the award of badges. I thank members of the Editorial Board for providing reviews (sometimes in very short timeframes) and serving as ambassadors for the journal. And of course, I thank the occasional Guest Editors and the many ad hoc reviewers for their exceptionally important contributions to the journal and the field. All of this effort has been expertly curated by Elaine Walker (Chair, APS Publications Committee) and members of the APS staff, including former members Ami Getu (Peer Review Coordinator) and Brian Winters (Managing Editor), and current members Becca White (Peer Review Manager), Aime Ballard-Wood (Chief Operating Officer/Deputy Director), and Robert Gropp (Chief Executive Officer/Executive Director). From among this group, I extend special thanks to Amy Drew (Director of Publications) for providing sage advice (no pun intended) just when I needed it the most, and generally helping to keep the journal (and me) on track and moving in the right direction. Finally, here’s wishing Incoming Editor-in-Chief Simine Vazire enormous success in her term, and an “E-ticket” ride in the process!

Patricia J. Bauer
Outgoing Editor-in-Chief

References

Anderson, C. A., Allen, J. J., Plante, Quigley-McBride, Lovett, & Rokkum, J. N. (2019). The MTurkification of social and personality psychology. Personality and Social Psychology Bulletin, 45, 842–850. https://doi.org/10.1177/0146167218798821

Bauer, P. J. (2023). Generalizations: The grail and the gremlins. Journal of Applied Research in Memory and Cognition, 12(2), 159–175. https://doi.org/10.1037/mac0000106

Drew, A. (personal communication, November 21, 2023).

Eich, E. (2014). Business not as usual. Psychological Science, 25(1), 3–6. https://doi.org/10.1177/0956797613512465

Henrich, Heine, S. J., & Norenzayan (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–83. https://doi.org/10.1017/S0140525X0999152X

Lindsay, D. S. (2019). Swan song editorial. Psychological Science, 30(12), 1669–1673. https://doi.org/10.1177/0956797619893653

Markus, H. R., & Kitayama (2010). Cultures and selves: A cycle of mutual constitution. Perspectives on Psychological Science, 5(4), 420–430. https://doi.org/10.1177/1745691610375557

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716. https://doi.org/10.1126/science.aac4716

West, K. L., Soska, K. C., Cole, W. G., Han, Hoch, J. E., Hospodar, C. M., & Kaplan, B. E. (2022). From description to generalization, or there and back again. Behavioral and Brain Sciences, 45, Article e37, e1: 67–69. https://doi.org/10.1017/S0140525X21000522