Sage Journals: Discover world-class research

Abstract

Background

Research is computationally reproducible when independent analysts can use the underlying data to reproduce the original results. Some research in psychology is computationally reproducible, although much is not.

Objective

To assess the computational reproducibility of research published in Teaching of Psychology (ToP), Psychology Learning and Teaching (PLaT), and Scholarship of Teaching and Learning in Psychology (SoTL-P).

Method

We identified key claims in 60 papers published with open data in ToP, PLaT, and SoTL-P between 2017 and 2025. We then sought to reproduce 101 results supporting these claims.

Results

We exactly reproduced 73 of the 101 results. For a further 23 results, the substantive interpretations of our results matched those of the original researchers. The majority of the reproductions were performed relatively quickly by a psychology undergraduate with 2 years of relevant experience.

Conclusion

Psychology learning and teaching research appears more computationally reproducible than research in other psychology sub-fields. However, reproducibility is undermined by several factors, including a lack of open data and code.

Teaching Implications

Computational reproducibility activities can be incorporated into undergraduate research methods classes, effectively address many research methods learning outcomes, and help students develop both subject-specific and transferable employability skills.

Keywords

analytic reproducibility open science statistics education research methods employability meta-science meta-psychology

Research is computationally reproducible when independent analysts can use the underlying data to reproduce the original results. Computational reproducibility relies on the availability of raw data and plays a central role in research transparency. Limited empirical data suggest that some psychology research is computationally reproducible, although much is not.

For example, Hardwicke et al. (2018) attempted to reproduce a “subset of ‘relatively straightforward and substantive’” (p. 9) results in 35 papers with “in principle reuseable” (p. 6) data published in Cognition between 2014 and 2017. They independently reproduced every target result in 31% of these papers. After receiving assistance from the original researchers, their success rate rose to 63%. The remaining 37% of the papers contained results that Hardwick and colleagues could not reproduce. When Hardwicke et al. (2021) applied the same methods to 25 papers published in Psychological Science in 2014 and 2015, they achieved similar outcomes. Specifically, in 36% and 60% of the papers, the target results were fully reproducible before and after the original researchers’ assistance, respectively. In both studies, the discrepancies observed between original and reproduced results did not affect the main conclusions stated in the affected papers. However, in the 2021 study, Hardwicke and colleagues noted three “decision errors” (e.g., original and reproduced p-values on different sides of a stated or presumed alpha level).

More recently, Crüwell et al. (2023) sought to reproduce all the results in all 14 research papers published in the April 2018 issue of Psychological Science. Predictably, when the threshold for full reproducibility was raised (i.e., from some to all results), the number of papers meeting this threshold dropped. Only one of the 14 papers was “exactly reproducible,” while another three were “essentially reproducible” (p. 513; i.e., containing only minor deviations from the reproduced results). Crüwell and colleagues attributed the partial and nonreproducibility to several factors, including insufficient documentation, typographical or copy-and-paste errors, unclear reporting, and the absence of data-wrangling or analysis code. While the availability of code should make reproduction easier, it does not guarantee success. Obels et al. (2020) illustrated this when they were able to reproduce the main results in just 58% of 36 registered reports published with both data and code. Finally, and more positively, Artner et al. (2021) successfully reproduced 163 (70%) of 232 “key statistical claims” (p. 527) in 46 papers published in Psychology and Aging, the Journal of Abnormal Psychology, and Experimental and Clinical Pharmacology. However, 18 of these statistical claims became reproducible only when Artner and colleagues deviated from the original researchers’ descriptions of their analyses. Additionally, 7% of the original statistically significant results became nonsignificant on reproduction.

In the current study, we assessed the computational reproducibility of research published in Teaching of Psychology (ToP), Psychology Learning and Teaching (PLaT), and Scholarship of Teaching and Learning in Psychology (SoTL-P). The rationale for this was threefold. First, nothing is currently known about the computational reproducibility of psychology learning and teaching research. Second, researchers currently working in this field are often responsible for training the researchers of tomorrow. Assuming they are following the guidelines published by accrediting bodies such as the American Psychological Association (2023) and the British Psychological Society (BPS, 2024), reproducibility should be a topic in their syllabi. If their research results are mostly computationally reproducible, the present study will speak to their legitimacy as scientist-practitioner-educators. Where their work is lacking, we can offer suggestions for improvement that should filter into their teaching. Third, computational reproducibility activities present excellent learning opportunities for students (see Allen et al., 2025b, for a structured example). ToP, PLaT, and SoTL-P are potentially rich sources of open data for such activities. This is because (a) they tend to publish research based on undergraduate-level statistics; and (b) in re-analyzing open learning and teaching datasets, students can learn about effective study practices alongside learning about open science, research methods, and statistics. An understanding of the computational reproducibility of research published in these journals can help educators develop and frame such activities. Finally, beyond this rationale, we wanted to reflect on the benefits and challenges of introducing computational reproducibility activities into the undergraduate classroom.

Method

Design

We conducted this preregistered (https://osf.io/efhnp) observational study in two phases. In Phase 1, we identified key claims in 60 empirical papers published with open data in ToP (n = 27), PLaT (n = 16), and SoTL-P (n = 17) between 2017 and 2025. In Phase 2, we sought to reproduce results supporting these claims. The School of Psychological Science Research Ethics Committee at the University of Bristol approved this study prior to its commencement (reference: 26628).

Sample

On May 29, 2025, we extracted a list of 1,039 empirical papers published since January 1, 2015, in ToP, PLaT, and SoTL-P. We then rapidly screened these papers for data availability statements indicating that the data underpinning the research were “open” (i.e., available without restrictions). This screening process generated a list of 103 papers reporting results that were potentially computationally reproducible. From these, we randomly sampled 60 (with one replacement due to data in a proprietary format we were unable to open or convert), with publication dates ranging from early 2017 to mid-2025.

Positionality

AA conducted this study as an 8-week BPS-funded research assistantship prior to entering the third year of the BSc Psychology at the University of Bristol. PA supervised the assistantship and worked with AA on the design and execution of this study. PA was “hoping for the best but expecting the worst” regarding the reproducibility of a field he has spent over 10 years working in and advocating for. Although there remains room for improvement, PA was pleasantly surprised by the results of this study.

Procedure

Phase 1: Claim Identification

In replication and computational reproduction studies, various methods have been used to identify “central” or “key” claims in published research (e.g., Artner et al., 2021; Camerer et al., 2018). They are all somewhat subjective. We followed a protocol based on the SCORE Collaboration (2024). Specifically, AA developed a claim trace for each paper (see https://osf.io/52unq) comprised of four levels: title, statement from the abstract, prediction in the main text, and supporting result. When multiple results supported the claim (n = 41), the result with the largest effect size was selected. This is illustrated for Kelly et al. (2025) in Table 1.

Table 1.
Claim Trace for Kelly et al. (2025).

Level Extract from Paper

1. Title “Who gets remembered? Disparities in recognition of early eminent psychologists…” (p. 1)

2. Abstract “Recognition significantly benefited early eminent White male psychologists compared to women and psychologists of color.” (p. 1)

3. Prediction “… recognition rates are expected to be highest for early eminent White men compared to other groups… (p. 4)

4a. Possible results “White men … had significantly higher recognition scores than men of color…, z = 12.36, p < .001, r = .47, and women of color…, z = 12.36, p < .001, r = .43.” (p. 7)

4b. Selected result “z = 12.36, p < .001, r = .47.” (p. 7)

After developing each claim trace, AA prompted ChatGPT (GPT model 4o) to repeat this process. If the two claim traces matched (n = 45), the selected result was progressed to Phase 2. If there was a mismatch, PA independently developed a claim trace. If this matched either of the previous claim traces (n = 7), the matched result was progressed to Phase 2. If all three differed (n = 8), we asked a colleague to arbitrate by selecting between the three claim traces. They did this without knowing who (or what) had produced each claim trace. Their selection was then progressed to Phase 2.

In the 41 instances where there were multiple results supporting the claim, a second result was randomly selected from among these for reproduction. For example, in Kelly et al. (2025), the second selected result was “z = 12.36, p < .001, r = .43” (p. 7).

Phase 2: Attempted Reproduction

If the code was available, AA executed it. If this produced errors that could not be resolved through troubleshooting and/or the results did not match those in the corresponding paper, AA sought advice from PA, and the outcomes were recorded. If the code was unavailable and the relevant statistics had been taught in the first 2 years of the BSc Psychology, AA attempted a manual reproduction in SPSS or JASP. When reproduction was unsuccessful, AA first sought advice from ChatGPT and then, if necessary, from PA before the outcomes were recorded. If the relevant statistics were unfamiliar to AA, he first consulted relevant instructional material on YouTube, then, if necessary, Coolican (2024), followed by ChatGPT, and finally PA. The outcomes following these processes are documented at https://osf.io/vqxpj. We coded a result as “reproducible” when there was an exact match (after rounding, where applicable) between the values we calculated and those in the original publication. If there were any differences, the result was coded as “not reproducible.”

It should be noted that, despite our initial cynicism, we made all reproduction attempts in “good faith.” Successful reproduction was approached as a challenge that we were motivated to overcome. Some reproductions were achieved quickly (e.g., < 15 min), whereas others took considerable time (e.g., 3–4 hr before either achieving success or reluctantly giving up). Finally, to prevent possible fixation on incorrect solutions, PA's attempts at reproduction were made blind to the methods AA had tried or the results he had achieved.

Deviations From Preregistration

First, when developing the claim traces, we observed that a single claim was often supported by multiple results. In these situations, we first selected the result with the largest effect size, and then randomly selected a second result supporting the same claim. Thus, for 41 of the 60 papers, we sought to reproduce two results. Second, we relaxed the time limits we had pre-registered for troubleshooting code and for each reproduction attempt. This was to feel comfortable that any reproduction failures were likely due to factors other than insufficient time. Third, in all situations where executing code did not lead to reproduction success for AA, PA was consulted. This is consistent with the approach taken for manual reproduction. Finally, rather than using the documentation for AA's unsuccessful reproduction attempts as a starting point for his own efforts, PA's reproduction attempts were made blind to AA's.

Results

Across all 60 studies, 43 (71.7%) of the initially selected results were reproducible. Of these, the majority (30 results) were achieved by AA with nothing more than his undergraduate lecture notes and some perseverance. YouTube (five results), ChatGPT (one result), and finally PA (seven results) were also useful when the lecture notes were lacking. For 41 studies, we also attempted to reproduce a second result. Of those attempts, 30 (73.2%) were successful (26 by AA, followed by an additional four by PA). In 27 (65.9%) of the 41 studies, we were able to reproduce both results, and in 7 (17.1%), we were unable to reproduce either result.

Despite our best efforts, we were unable to exactly reproduce 28 (27.7%) of the results we examined. We categorized these results as not reproducible due several reasons, including (a) probable rounding issues (six results); (b) the use of incorrect degrees of freedom (two results); (c) probable copy-and-paste errors (two results); (d) numerical differences between our results and those in the published papers which were usually modest, and which we were unable to explain (15 results); (e) unspecified exclusion criteria (two results); and (f) a dataset that we were unable to make sense of (one result). For results affected by (a), (b), and (d), there were no substantive differences between the interpretation of our results and those reported by the original authors. In the study affected by (c), the original authors’ descriptions of their results were consistent with ours; however, the statistics they reported were not. They corresponded to different effects within the same overarching analysis.

Discussion

The present study aimed to assess the computational reproducibility of research published in ToP, PLaT, and SoTL-P. Of the 101 results in the 60 papers we focused on, 73 (72.3%) were exactly reproducible. For many of the others, the differences between our results and those in the original publications were negligible. Indeed, if we lowered the threshold for “reproducible” to include results that differed due to rounding issues or incorrect degrees of freedom, our reproduction success rate would rise to over 80%. We suspect it would rise even further if we engaged with the authors of the papers containing results we were unable to reproduce. When Hardwicke et al. (2018, 2021) did this, their success rate doubled.

On the surface, these results are pleasing. For context, they are comparable with those of Artner et al. (2021), who successfully replicated 70% of “key statistical claims” (p. 527) in 46 papers published in four psychology journals. Furthermore, our findings are considerably more positive than those of Hardwicke et al. (2018, 2021), Crüwell et al. (2023), and Obels et al. (2020), who used a range of methods to assess computational reproducibility and achieved success rates of no higher than 63%. However, there is still room for improvement. Only 10% of the papers we initially screened were accompanied by open data, which provided us with the opportunity to assess their computational reproducibility. It is reasonable to speculate that this 10% may be more reproducible than the 90% that were not open to such scrutiny. Although we understand that open data is not always possible, we find it difficult to believe that genuine restrictions (e.g., ethics, commercial sensitivities, national security, etc.) prevented even close to 90% of the authorship teams from preparing and making publicly available deidentified versions of their data files before publication. There are several free services available to support this (most notably, the Open Science Framework; https://osf.io), and many universities also maintain institutional data repositories. Furthermore, as we know from previous research (Gabelica et al., 2022), “available on request” most commonly means “not available.” Thus, the primary factor limiting the computational reproducibility of psychology learning and teaching research is a lack of open data.

Among papers for which data are available, there is no reason why the computational reproducibility rate should not be 100%. In our study, it was not. Factors that made successful (and attempted but ultimately unsuccessful) reproduction unnecessarily difficult included the absence of code; buggy and/or incomplete code; the absence of a codebook or intuitively named variables and (where applicable) value labels; vague or incompletely described data wrangling and analyses; and datasets in languages other than the language of the associated paper. Thus, we recommend that psychology learning and teaching researchers provide complete, verified, and reproducible code and a codebook; give variables meaningful names and labels that correspond to those used in the paper; and if not providing code, ensure that the data wrangling and analyses, as well as the calculations for any statistics for which there are multiple accepted versions (e.g., some measures of effect size) are completely and precisely described (in an online supplement if necessary). Furthermore, one dataset in our initial sample needed replacement because it was in a proprietary format we were unable to read. Therefore, we further recommend that psychology learning and teaching researchers convert data to an open format (e.g., CSV) and ensure it is readable by common data analysis packages prior to upload. Many of these recommendations are also reflected in the FAIR principles (see Wilkinson et al., 2016), which should be incorporated into the editorial policies of all psychology learning and teaching journals.

Implications for Teaching Reproducibility

This project was completed as a BPS-funded undergraduate summer research assistantship. Consistent with this, one of our aims was to reflect on the pedagogical potential and challenges of integrating authentic computational reproducibility activities into undergraduate syllabi. Such activities align with several learning outcomes specified by the BPS (2024), including critical thinking, managing and analyzing different types of quantitative data, using specialist software, and awareness of open science and reproducibility issues in research. When complemented with exercises like those developed by McCarley and Soicher (2022), computational reproducibility activities give students hands-on experience with the full range of benefits and challenges associated with reproducibility as both producers and consumers of research. They give students opportunities to explore what “best practice” can look like by observing where published research falls short, which they will carry forward into subsequent studies and, where applicable, capstone projects. As many of today's psychology students will be tomorrow's psychological scientists, such opportunities are vital to improving the future quality of psychological science. Furthermore, computational reproducibility activities help reinforce the point that mistakes are commonplace, even among experienced researchers (see Breznau et al., 2025, for a vivid example in the context of computational reproducibility research), and that a clear audit trail is vital for identifying and correcting these. Alongside multiverse studies (see Allen et al., 2025a), they also provide opportunities to discuss analytic alternatives and their implications. We did not attempt to judge the appropriateness of the data processing techniques and statistics used in the 60 studies, but within a classroom context, computational reproducibility activities afford students detailed knowledge of what researchers did, which can naturally lead to conversations about what they could have done. Resources like Allen and Fielding (2025) could help inform these conversations. Finally, computational reproducibility activities can be situated within any subject area. However, situating them within psychology learning and teaching research allows students to learn about effective study practices alongside learning about open science, research methods, and statistics. In our experience, the statistics used in this field of research are generally undergraduate-friendly and, on average, reproducible.

On average, it took AA less than an hour to reach a conclusion regarding each initially selected result, followed by an additional 5–10 min for the second result, where applicable. Using only notes developed during his first 2 years of the BSc Psychology, his reproduction success rate was 52%. This was increased to 61% with a modest amount of further research. Notably, PA, with over 25 years of relevant experience (as a student, teacher, and researcher), could only increase the overall success rate to 72%. This suggests that, with appropriate scaffolding, computational reproducibility activities can be successfully deployed in undergraduate research methods classes. We would recommend that educators seeking to do this start with one or two results based on recently taught statistics, which are known to be reproducible with a modest amount of effort. Then, as students’ confidence builds, they can be given results that require greater effort, along with results that are not reproducible. Finally, students can be challenged to reproduce results of personal interest that they source from the primary literature.

Challenges like this can bridge a pedagogic gap by bringing students closer to authentic research practices. For Bauer et al. (2023), they represent a “great opportunity” (p. 121) for students to experience the work of a practicing scientist. Of course, not all psychology students are training for a career in research. Computational reproducibility activities can also help students develop a range of transferable skills, which are valued in a wide range of industries. These include analytical thinking, adaptability, integrity, flexibility with new systems, and familiarity with a range of software (Naufel et al., 2019). Explicitly connecting these skills to classroom activities is known to improve student outcomes (Miller & Favelle, 2022).

Limitations and Future Research

In this study, we established two benchmarks for assessing the quality of psychology learning and teaching research. The first is the proportion of studies that are sufficiently open to permit assessment of their computational reproducibility, which is disappointingly low. The second is the proportion of that open research that is computationally reproducible, which appears encouragingly high, though still short of ideal. However, substantial gaps remain in our understanding of the computational reproducibility of this field.

First, in most cases, we avoided attributing differences between our results and those reported in the original papers to errors made by the original researchers. Despite our best efforts, some differences could reflect mistakes on our part. Furthermore, although we were able to identify likely causes for certain differences (e.g., rounding or incorrect degrees of freedom), many remained unexplained. Future researchers could address these uncertainties by seeking the assistance of the original authors. When Hardwicke et al. (2021) did this, their reproducibility success rate increased dramatically.

Second, identifying key claims was often more ambiguous than anticipated. In many instances, there were several reasonable possibilities that we could have pursued. Moreover, when multiple results supported a single claim, our decision to select the largest effect, followed by a second randomly chosen effect, was arbitrary. Future work could overcome these limitations by examining a larger set of claims and/or results within each study. Researchers might, for example, adopt the approach of Crüwell et al. (2023), who sought to reproduce all the results reported in a single issue of Psychological Science. This, along with coding other aspects of each study (e.g., design, authorship team composition, statistics used, etc.), would allow the identification of factors that predict computational reproducibility and highlight where researchers should be most vigilant. For instance, are results supporting key claims more reproducible than peripheral results? Does the experience of an authorship team influence the reproducibility of their results? Are some statistics or techniques more reproducible than others? Some of these factors (e.g., the complexity of the statistical techniques used) may also help to explain why our reproducibility success rate was higher than that observed in previous research.

Third, only 10% of the research papers we initially screened were accompanied by open data and therefore eligible for inclusion in our sample. Within the context of psychology learning and teaching research, future researchers should explore the extent to which data described as ‘available on request’ are in fact available, and whether the results reported in the papers accompanying these data are reproducible.

Finally, we have proposed that authentic computational reproducibility activities can be successfully integrated into undergraduate research methods classes and briefly outlined potential strategies for doing this. However, these strategies have not been tested. Future researchers should, therefore, explore how to best implement and evaluate the effectiveness of such activities. Particular attention should be paid to the experiences of students who are less enthusiastic about research methods and statistics and who receive less support than AA received in the present study.

Conclusion

In this study, we were able to exactly reproduce 73 (72.3%) of 101 results reported in 60 papers published in ToP, PLaT, and SoTL-P between 2017 and 2025. An undergraduate student achieved the majority of these reproductions with access to the range of resources and experiences typical for someone approximately midway through an undergraduate degree in psychology. These results demonstrate two main points. First, psychology learning and teaching research is comparatively computationally reproducible, although there is clearly still room for improvement. We have provided several recommendations for researchers in this field to help them improve the reproducibility of their future work. Second, computational reproducibility activities can effectively address many undergraduate research methods and statistics learning outcomes and can help students develop both subject-specific and transferable employability skills. We have outlined these skills and suggested strategies that educators can explore for introducing authentic computational reproducibility activities into their undergraduate research methods syllabi.

Level	Extract from Paper
1. Title	“Who gets remembered? Disparities in recognition of early eminent psychologists…” (p. 1)
2. Abstract	“Recognition significantly benefited early eminent White male psychologists compared to women and psychologists of color.” (p. 1)
3. Prediction	“… recognition rates are expected to be highest for early eminent White men compared to other groups… (p. 4)
4a. Possible results	“White men … had significantly higher recognition scores than men of color…, z = 12.36, p < .001, r = .47, and women of color…, z = 12.36, p < .001, r = .43.” (p. 7)
4b. Selected result	“z = 12.36, p < .001, r = .47.” (p. 7)

Footnotes

Acknowledgements

We would like to acknowledge Dr. Dorota Bednarek for her role in securing the funding for this research and her support throughout this project, Dr. Marin Dujmović for his assistance with the initial data extraction and screening processes, and the MSc students who helped with the initial screening. This research was funded through the British Psychological Society's Undergraduate Research Assistantship Scheme.

ORCID iDs

Ali Almuhanna

Peter J. Allen

Author Contributions

AA: conceptualisation, methodology, investigation, formal analysis, data curation, writing–original draft, and writing–review and editing. PA: conceptualisation, methodology, validation, writing–original draft, and writing–review and editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded through the British Psychological Society's Undergraduate Research Assistantship Scheme.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Open Practices

For publishing their material, Almuhanna and Allen received badges for open data, open materials and preregistered.

Transparency and Openness Statement

This study was preregistered at https://osf.io/efhnp. The raw data underpinning this research are available at .

References

Allen

P. J.

Abraham

Bleach

Hill

Mahmud

Wild

Krause

A. E.

(2025a). “Anyone could do this!”: Multiverse as an undergraduate psychology capstone project. PsyArXiv. https://doi.org/10.31234/osf.io/fhb6q_v1

Allen

P. J.

Fielding

J. L.

(2025). How to choose the right statistical analysis for your research aim or hypothesis. Sage Research Methods. https://doi.org/10.4135/9781036217372

Allen

P. J.

Madan

Basu

Rattan

Savani

(2025b). Learn how to run, interpret, and report a multiple regression in R using data from Madan et al. (2019). Sage Research Methods. https://doi.org/10.4135/9781036216498

American Psychological Association. (2023). APA guidelines for the undergraduate psychology major (version 3.0). http://www.apa.org/ed/precollege/undergrad/index.aspx

Artner

Verliefde

Steegen

Gomes

Traets

Tuerlinckx

Vanpaemel

(2021). The reproducibility of statistical results in psychological research: An investigation using unpublished raw data. Psychological Methods, 26(5), 527–546. https://doi.org/10.1037/met0000365

Bauer

Breznau

Gereke

Höffler

J. H.

Janz

Rahal

R.-M.

Rennstich

J. K.

Soiné

(2023). Teaching constructive replications in the behavioral and social sciences using quantitative data. Teaching of Psychology, 52(1), 117–123. https://doi.org/10.1177/00986283231219503

Breznau

Rinke

E. M.

Wuttke

Adem

Adriaans

Akdeniz

Alvarez-Benjumea

Andersen

H. K.

Auer

Azevedo

Bahnsen

Bai

Balzer

Bauer

P. C.

Bauer

Baumann

Baute

Benoit

Bernauer

Nguyen

H. H. V.

(2025). The reliability of replications: A study in computational reproductions. Royal Society Open Science, 12(3), Article 241038. https://doi.org/10.1098/rsos.241038

British Psychological Society. (2024). Standards for the accreditation of undergraduate, conversion and integrated Master’s programmes in psychology. https://www.bps.org.uk/accreditation/education-providers

Camerer

C. F.

Dreber

Holzmeister

T.-H.

Huber

Johannesson

Kirchler

Nave

Nosek

B. A.

Pfeiffer

Altmejd

Buttrick

Chan

Chen

Forsell

Gampa

Heikensten

Hummer

Imai

Isaksson

(2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z

10.

Coolican

(2024). Research methods and statistics in psychology (8th ed.). Routledge.

11.

Crüwell

Apthorp

Baker

B. J.

Colling

Elson

Geiger

S. J.

Lobentanzer

Monéger

Patterson

Schwarzkopf

D. S.

Zaneva

Brown

N. J. L.

(2023). What’s in a badge? A computational reproducibility investigation of the open data badge policy in one issue of Psychological Science. Psychological Science, 34(4), 512–522. https://doi.org/10.1177/09567976221140828

12.

Gabelica

Bojčić

Puljak

(2022). Many researchers were not compliant with their published data sharing statement: A mixed-methods study. Journal of Clinical Epidemiology, 150, 33–41. https://doi.org/10.1016/j.jclinepi.2022.05.019

13.

Hardwicke

T. E.

Bohn

MacDonald

Hembacher

Nuijten

M. B.

Peloquin

B. N.

deMayo

B. E.

Long

Yoon

E. J.

Frank

M. C.

(2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: An observational study. Royal Society Open Science, 8(1), Article 201494. https://doi.org/10.1098/rsos.201494

14.

Hardwicke

T. E.

Mathur

M. B.

MacDonald

Nilsonne

Banks

G. C.

Kidwell

M. C.

Hofelich Mohr

Clayton

Yoon

E. J.

Henry Tessler

Lenne

R. L.

Altman

Long

Frank

M. C.

(2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), Article 180448. https://doi.org/10.1098/rsos.180448

15.

Kelly

A. E.

Emter

A. K.

Duncan

Laurin

J. N.

(2025). Who gets remembered? Disparities in recognition of early eminent psychologists and implications for belonging in psychology. Scholarship of Teaching and Learning in Psychology, Advance online publication. https://doi.org/10.1037/stl0000444

16.

McCarley

Soicher

(2022). Making research reproducible. Society for the Teaching of Psychology. https://teachpsych.org/page-1603066#stats

17.

Miller

L. M.

Favelle

(2022). Emphasizing transferable skills in undergraduate cognitive psychology is associated with higher grades. Teaching of Psychology, 51(3), 291–297. https://doi.org/10.1177/00986283221083867

18.

Naufel

K. Z.

Spencer

S. M.

Appleby

Richmond

A. S.

Rudman

Van Kirk

Young

Carducci

B. J.

Hettich

(2019). The skillful psychology student: How to empower students with workforce-ready skills by teaching psychology. Psychology Teacher Network, 29(1). https://www.apa.org/ed/precollege/ptn/2019/03/workforce-ready-skills

19.

Obels

Lakens

Coles

N. A.

Gottfried

Green

S. A.

(2020). Analysis of open data and computational reproducibility in registered reports in psychology. Advances in Methods and Practices in Psychological Science,3(2), 229–237. https://doi.org/10.1177/2515245920918872

20.

SCORE Collaboration. (2024, January 23). Systematizing confidence in open research and evidence (SCORE) (Version 4). https://doi.org/10.31235/osf.io/46mnb

21.

Wilkinson

M. D.

Dumontier

Aalbersberg

I. J.

Appleton

Axton

Baak

Blomberg

Boiten

J.-W.

da Silva Santos

L. B.

Bourne

P. E.

Bouwman

Brookes

A. J.

Clark

Crosas

Dillo

Dumon

Edmunds

Evelo

C. T.

Finkers

Mons

(2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, Article 160018. https://doi.org/10.1038/sdata.2016.18

The Computational Reproducibility of Psychology Learning and Teaching Research

Abstract

Background

Objective

Method

Results

Conclusion

Teaching Implications

Keywords

Method

Design

Sample

Positionality

Procedure

Phase 1: Claim Identification

Phase 2: Attempted Reproduction

Deviations From Preregistration

Results

Discussion

Implications for Teaching Reproducibility

Limitations and Future Research

Conclusion

Footnotes

Acknowledgements

ORCID iDs

Author Contributions

Funding

Declaration of Conflicting Interests

Open Practices

Transparency and Openness Statement

References