Abstract
Since the replication crisis came to widespread attention in psychology, there has been significant progress in reforming research practices to be more open and reproducible. However, the integration of these principles into teaching—particularly assessment—has lagged behind. Although many open educational resources have been developed to support the teaching of open science, fewer efforts have focused on embedding open science into how students are assessed. In this commentary, we address that gap by offering a series of practical, scalable strategies for integrating open and reproducible science into psychology education through assessment. We argue that to normalize open science, students must not only learn about its principles but also be assessed on their understanding and application of them. Drawing on examples from our undergraduate and postgraduate programs and the wider literature, we outline a range of assessment strategies aligned with curriculum standards and pedagogical evidence. These include incorporating preregistration and registered reports, evaluating reproducibility through code and data submission, engaging students in peer review and code review, and integrating open-science concepts into essays, multiple-choice exams, and final dissertations. We highlight that even small changes at the course level can promote open science and that educators should approach implementation flexibly, recognizing it as a continuum rather than a binary shift. We also stress the need to avoid framing open science as overly technical or inaccessible, which may discourage student engagement. By embedding open and reproducible practices into assessment design, educators can support the development of critical, ethical, and transparent future scientists.
Since the replication crisis in psychology came to popular attention (Open Science Collaboration, 2015), much work has been done to improve the research practices of psychological scientists to be more open and reproducible (e.g., Alessandroni & Byers-Heinlein, 2022). Work on integrating these practices into the teaching of psychology has been slower but has gained ground in recent years (Pownall, Azevedo, et al., 2023). For example, the Framework for Open and Reproducible Research Training (FORRT; 2025) has developed resources to provide a pedagogical infrastructure and support the teaching and mentoring of open and reproducible science. These resources include a range of lesson plans, teaching materials, and activities made available as open educational resources (OERs; Pownall et al., 2024).
However, despite the wealth of OERs available for teaching material and content, there is currently much less written about how open-science practices can be incorporated into how educators assess students, not just what they teach them. Students are assessment focused (Kusurkar et al., 2023), a trait that is partially driven by the nature of modern higher education in which many face high assessment loads while being time-poor because of the need to undertake paid employment (Hill et al., 2024). In addition, there is now greater recognition that assessment is for learning rather than of learning. Therefore, if educators consider open and reproducible science important, it is not enough to teach students about it; educators must also assess students’ open-science knowledge and skills and ensure that what they do as teachers aligns with what they say. For example, there is often a mismatch in that academics talk about the benefits of preregistration and registered reports but ask their students to write a report following the traditional publishing workflow and structure.
A programmatic approach to integrating open science into the curriculum is arguably the most effective from a strategic perspective. However, this takes strong leadership with the vision to recognize and implement the changes needed, departmental buy-in and upskilling, and time and resources. In the School of Psychology and Neuroscience at the University of Glasgow, we transformed our curriculum across all levels and programs to ensure open science was at the core of all our practice; for example, we teach all students reproducible data skills using R (PsyTeachR, 2025). Despite the benefits of a programmatic approach, our key message in this article is that even small changes at course and assignment level still progress the aim of advancing open and reproducible science as the standard. Just like the adoption of open-science practices in research, their implementation in teaching should be viewed as a continuum (McKiernan et al., 2016) or a buffet (Bergmann, 2023) of choices, depending on one’s resources and agency as an educator.
For this commentary, we use “open science” as an umbrella term for practices that increase the transparency and reproducibility of research, including preregistration and registered reports, data and code sharing, and replication-focused designs. Our emphasis is on assessment strategies that foster these aspects of open and reproducible practice rather than on other important dimensions, such as open-access publishing or citizen science. We provide pragmatic advice and concrete examples to help educators embed the principles of open and reproducible science into how students are assessed. Although our focus is on the teaching of psychology, in line with our direct experience, the suggestions are adaptable to any data-based science program. To accompany the commentary, a range of example assessments with briefs and rubrics are available at https://osf.io/hzptj/.
Assessing Knowledge and Critical Thinking About Open Science as an Integrated Part of the Curriculum
Specific modules/individual courses in a program that focus on open science and research integrity allow students to engage deeply with the issues, and there are many OER syllabi available (e.g., FORRT, 2025). However, we argue that for open science and reproducibility to become an accepted scientific standard, students must develop a conceptual understanding of open science, its rationale, and current debates as an integrated part of their curriculum and assessment rather than this knowledge being siloed into single modules. This ethos aligns with curriculum standards proposed by the American Psychological Association (APA; 2023), the British Psychological Society (BPS; 2024), and the International Collaboration on Undergraduate Psychology Outcomes (Nolan et al., 2024), all of whom have included teaching about open science and reproducible practices in their updated standards and expectations for undergraduate psychology programs. That is, the most effective way of making open and reproducible science the norm is by teaching reproducibility from day one (Dogucu, 2025) instead of reserving it for advanced courses later in a degree.
For example, in our introductory first-year undergraduate psychology class, we teach students about the replication crisis, questionable research practices, and scientific fraud as part of their core introductory research methods and history of psychology lectures and associated reading (e.g., Diener & Biswas-Diener, 2025). Importantly, questions on these topics are included as part of their standard end-of-semester multiple-choice exam. Consequently, acquiring knowledge about open science is awarded the same status as acquiring knowledge about any other aspect of psychology.
In a similar vein, introductory modules or modules that cover conceptual and historical issues in psychology that are assessed by an assignment such as an essay, literature review, or presentation (Heilmayr, 2024) can be used to promote deeper engagement. Topics can include the history of the replication crisis, the role of transparency in science, or the merits and limitations of registered reports. The high-profile and sensationalist nature of some cases, such as those of Dan Ariely and Francesca Gino (e.g., Lewis-Kraus, 2023; Stern, 2023), and the public-health and funding implications of the cases of Brian Wansink (e.g., Oransky, 2018) and Andrew Wakefield (e.g., Deer, 2011) also provide topics that many students find engaging to research and write about. It is critically important that as educators we do not prime students, consciously or unconsciously, to believe that learning about open science and reproducibility is too technical or boring.
Finally, for higher-order assessments that are typically completed by honors level or postgraduate students, such as a critical review of published research, in addition to critiquing a study based on theory or standard methodological details, students can be asked to assess whether an article follows open-science principles, for example, whether the article has open data and code or is reproducible. In this way, rather than making the assessment all about open science, the application of the conceptual knowledge students have acquired at lower levels (e.g., through their multiple-choice exam) can be integrated and applied as part of an assignment assessing their broader analytical skills.
Assess the Scientific Process and Transparency
Developing skills related to open and reproducible research is often reserved for research methods and statistics courses, typically built around training students to write a traditional research report. Students have access to the data as they write up an introduction, method, results, and discussion to describe their study. However, like published research, this focuses on the outcome rather than the process, and students may be tempted to fit their hypothesis to their data. Initiatives such as preregistration and registered reports (Kathawalla et al., 2021) have emphasized valuing rigor over outcome, and using these formats as assessment types can help reinforce and reward the scientific process and transparency.
Implementing preregistration as an assessment can draw on templates from the OSF (2026) that may be adapted to suit the level of one’s students and the specifications of the assessment; both quantitative and qualitative templates are available. Preregistration can be implemented as the full assessment, but if students are required to submit a complete research report, they can also submit a preregistration as a midpoint exercise. Previously, in our MSc Psychology Conversion program, students submitted a group preregistration in Week 7 to outline their research question and key methods and then, following feedback on the preregistration submission, individually submitted a traditional full report once they had access to the data set. Likewise, in disciplines in which the final dissertation (also known as an undergraduate thesis or capstone project) includes an independent research project, students may submit a preregistration informally to their supervisor for formative feedback or as a lower-weighted component instead of a proposal.
Going beyond preregistration, students can also be asked to complete a registered report instead of a traditional research report. When students are learning how to write research reports, they do not need to repeatedly write full reports; instead, slowly building up each stage can help scaffold the research process. In our first-year undergraduate program, students in their second semester write a Stage 1 registered report to outline an evidence-based research question, hypothesis, and rationale in addition to a method section for a study designed to address that research question. Students do not learn inferential statistics until second year so that first-year teaching can prioritize the development of foundational data skills and the ability to construct an evidence-informed research question over p values and statistical significance. Consequently, we constrain the technical complexity of the method section and analysis plan to match their current skills. At this stage, students specify their primary and secondary hypotheses in plain language, define the key variables and inclusion or exclusion criteria, describe the planned design and procedure, and indicate the descriptive summaries and figures they expect to produce without committing to specific inferential models or formal power analyses. This structure introduces the central principles of registered reports, particularly planning analyses in advance of seeing the data and valuing rigor over outcome, while keeping the statistical demands appropriate for early-stage learners. Likewise, in the second semester of our MSc Psychology Conversion program, depending on their mode of study, students prepare a qualitative research proposal (face-to-face program) or a portfolio (online distance-learning program) as their group submission. For the proposal, this includes justifying the methodological approach and sampling decisions, and the portfolio includes creating a focus-group schedule with six questions.
Although both formats promote transparency in the planning and design of qualitative research, the two cohorts engage with data differently in the second part of the assessment: Face-to-face students collect primary data for their individual report, and online students are provided with secondary data to analyze.
The importance of the scientific process can also be reinforced through assessments that ask students to peer review a preprint or published article. Our school has previously used peer review of preprints to develop critical-evaluation skills by outlining the academic peer-review process and format and getting students to complete a review of a curated list of preprints (McAleer et al., 2018). The use of preprints is important because students often have the impression that published research is beyond criticism. Relatedly, the importance of curation will depend on one’s target cohort: Instructors of more introductory courses will need to choose the articles carefully to match students’ current knowledge and understanding, whereas more advanced courses could provide more freedom and autonomy for students to select their own preprint. Alternatively, Boyle et al. (2023) suggested using public preregistrations as an opportunity for students to demonstrate their critical-evaluation skills. Students can be asked to write an open-essay-style peer review or, for more structure, to evaluate the source using a rubric modeled on the process of a specific journal.
Finally, as a specialized form of peer review, students can conduct code reviews in programs in which programming is included as a taught skill in the curriculum. This could be a specific graded assignment, or it could be a formative opportunity in which students review each other’s code. For example, in our MSc Psychology Conversion program, we invite students in the final week of the course to review each other’s R code that they have been developing for their Stage 2 research report (Bartlett et al., 2025). This is a formative activity in which they complete a checklist with items such as whether the code runs and whether the code is readable. Alternatively, this could be a summative assignment in which students evaluate code prepared by the lecturer or from a curated selection of publicly accessible code. As with the peer-review task, depending on their level and autonomy, students can be asked to complete an open review or a more structured rubric.
Reproducibility and Verification as Assessable Skills
Reproducibility and verification are core principles of open science (Hardwicke et al., 2018). To train the new generation of researchers, it is vital to ensure that students not only are exposed to such practices from the start of their training but also understand that they are integral to doing science better. Incorporating reproducibility practices into assessments makes it a requirement (Dogucu, 2025) rather than a wish and helps frame the concepts as a fundamental part of the research process. For example, in our curriculum, we introduce reproducible data-skills training using R (Nordmann, 2025) in the first year of the undergraduate degree. We continue to build on these skills throughout all 4 years, providing students with a scaffolded approach to develop and refine these skills through repeated exposure to reproducible workflows.
A relatively straightforward option to integrate reproducibility is to require students to submit their code and data alongside any written empirical-research report. This practice not only enforces good research habits but also ensures that their results are reproducible and verifiable. In environments such as R, this can be achieved through R Markdown (.Rmd) or Quarto (.qmd) files, in which students can integrate code, output, and write-up into a single reproducible document (Baumer et al., 2014). For point-and-click software, such as SPSS, the submission could include syntax files or detailed procedural printouts. Reproducibility can be explicitly assessed by checking whether the document “knits” without errors, whether the output matches the values reported in the written report, and how clearly the code is commented.
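The first two checks described above (does the analysis run from start to finish, and do the recomputed values match those in the written report) can be sketched in a few lines. The example below is a minimal illustration only: it uses Python rather than the R Markdown workflow described in this commentary, and the function name `check_submission`, the script contents, and the variable `mean_rt` are hypothetical names invented for the sketch.

```python
import runpy
import tempfile
from pathlib import Path


def check_submission(script_path, reported, decimals=2):
    """Run a submitted analysis script and compare its results to the report.

    Returns a dict with 'runs' (did the script execute without error?) and
    'mismatches' (names whose recomputed value differs from the reported one).
    """
    try:
        # Execute the script in a fresh namespace, loosely analogous to knitting.
        namespace = runpy.run_path(str(script_path))
    except Exception:
        # Any error means the analysis is not reproducible as submitted.
        return {"runs": False, "mismatches": sorted(reported)}

    mismatches = []
    for name, reported_value in reported.items():
        recomputed = namespace.get(name)
        # Compare at the precision used in the written report.
        if recomputed is None or round(recomputed, decimals) != round(reported_value, decimals):
            mismatches.append(name)
    return {"runs": True, "mismatches": sorted(mismatches)}


# Hypothetical student script that computes a mean reaction time.
student_code = """
from statistics import mean
reaction_times = [512, 498, 530, 475, 505]
mean_rt = mean(reaction_times)
"""

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "analysis.py"
    script.write_text(student_code)
    # The 'reported' dict stands in for values copied from the written report.
    result = check_submission(script, reported={"mean_rt": 504.0})

print(result)
```

A check of this kind covers only whether the code runs and whether the numbers agree; how clearly the code is commented still requires a marker’s judgment.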
Beyond quantitative reports, systematic reviews allow students to submit their search protocol in the form of a PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) statement (Page et al., 2021) that details transparently why the review was done and the exact methodology undertaken. Whether replication and reproducibility are epistemologically, ontologically, or methodologically compatible with the goals of qualitative research is the subject of much debate (Pownall, 2024), but in cases in which they align with such considerations, such as projects that undertake a codebook analysis, students can be asked to submit, for example, their NVivo codebook export. These criteria can be built into the marking rubric or intended learning outcomes that help support students’ understanding that transparency and reproducibility are key to analytical competence.
A second opportunity to assess reproducibility is through the use of verification reports that aim to reproduce the results as described in a publication using open data (e.g., “Author Guidelines for Verification Reports,” n.d.). In our graduate curriculum, students complete an MSc Statistics and Research Design course in which they are presented with an extract from a published article and open data and tasked with reverse-engineering the analyses to reproduce the extract (Bartlett, 2025). This assessment can be adapted to the capabilities and level of the student cohort by selecting extracts depending on the data-analysis skills they have currently developed.
A third way to assess reproducibility is by evaluating students’ data-handling and data-wrangling skills. As part of our first-year (Nordmann, 2025) and second-year (Mahrholz & Kuepper-Tetzel, 2025) data-skills training, students complete low-weighted assessments throughout the year. We provide students with an instructions file and ask them to complete a range of tasks using an .Rmd template. For example, we give students a summary table or figure, and they are tasked to recreate it from raw or partially processed data. We mark these assessments using computer-assisted marking via the assessr package (Barr, 2024) by knitting their submitted .Rmd files to assess reproducible data skills at scale; class sizes range from 150 to 700. This approach allows instructors to assess students’ ability to identify underlying data structures, implement reproducible workflows, and critically think about how variables are prepared for analysis or visualization. These repeated low-weighted data-skills assessments reinforce reproducible research practices and provide the opportunity to tailor them to the student cohort by creating the tasks and data that are relevant for them to complete.
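To make the “recreate this summary table from raw data” task concrete, the sketch below shows one way such a task and its automated check could look. It is an illustration under stated assumptions, not a description of how the assessr package works: it is written in Python rather than R, and the data, the column names, and the functions `summarise` and `mark_table` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarise(rows, group_key, value_key, decimals=1):
    """Group raw rows and return {group: (n, mean)}, rounded for comparison."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: (len(v), round(mean(v), decimals)) for g, v in sorted(groups.items())}


def mark_table(student_table, target_table):
    """Return the proportion of target rows that the student's table reproduces."""
    correct = sum(student_table.get(g) == cell for g, cell in target_table.items())
    return correct / len(target_table)


# Hypothetical raw data given to students.
raw = [
    {"condition": "control", "score": 4},
    {"condition": "control", "score": 6},
    {"condition": "treatment", "score": 7},
    {"condition": "treatment", "score": 9},
    {"condition": "treatment", "score": 8},
]

# Hypothetical target table shown in the instructions: (n, mean) per condition.
target = {"control": (2, 5.0), "treatment": (3, 8.0)}

# A student's answer would be whatever their own code produces from `raw`.
student = summarise(raw, "condition", "score")
print(mark_table(student, target))
```

Scoring each row of the table separately allows partial credit, which may suit low-weighted, repeated assessments of this kind.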
Dissertations and Project-Based Open Science
For data-based science programs, students often complete an independent research “capstone” project that they write up as a dissertation/thesis. This is typically the final stage in the learning journey and requires students to demonstrate all the skills they have developed across the program. The dissertation therefore provides a key opportunity to assess students on reproducible research practices and principles of open science.
In the development stages of a dissertation, students often complete a formative or summative proposal to outline their idea and get feedback from their supervisor. Instead of a proposal that is not directly included in the final dissertation, students can complete a preregistration (Blincoe & Buchert, 2019) or write a Stage 1 registered report before they start data collection or data analysis (see above). Pownall, Pennington, et al. (2023) showed that completing a preregistration in preparation for a dissertation increases a student’s understanding of open-science concepts. These skills can be reflected in the assessment criteria through professional skills, recognizing the planning, design, and ethical considerations independent of the final written dissertation.
For the topic of the dissertation, students can use primary or secondary data and complete the steps appropriate for each format, for example, using a preregistration template specific for primary data that the student will collect or contribute to or a template aimed at secondary data to focus on the analysis plan. Secondary data provide students with the opportunity to address a novel research question with data that already exist, such as reusing data from an article’s OSF page or alternative data repository. This has the benefit of avoiding research waste if data already exist to address a student’s research question (Creaven et al., 2023). Regardless of whether students use primary or secondary data, the dissertation assesses their subject and research-methods knowledge.
For primary data, direct replications also provide an opportunity to conduct an independent research project on an undervalued type of project. Clarke et al. (2024) estimated that just 0.2% of articles in top-ranking journals are direct replications. There are initiatives, such as the Collaborative Replications and Education Project (CREP; Wagge et al., 2019), that collate suitable articles to replicate and maximize the value of a project by engaging in team science and contributing data to a wider pool (Creaven et al., 2023). A direct replication is assessed the same as any project that collects primary data, but it has the added benefit of reinforcing principles of open science to value replication studies. This also does not need to wait until the student’s dissertation; one can reinforce the value of replication studies when teaching students about research reports in earlier courses. For example, our third-year undergraduates complete a quantitative research project in which a lecturer supports a class of 30 students to collectively perform a direct replication from data collection to write-up.
At the end of the dissertation process, students can demonstrate open-science principles by submitting their data and code to their supervisor or making them publicly accessible. In the absence of team-science approaches, dissertations are often limited because of the time and resources available to a single student. In our department, some supervisors organize students into groups and collect data for a larger project to pool resources, but each student addresses their own specific research question to write up for their dissertation. In isolation, a dissertation may not be publishable, but by openly sharing data, individual students can build on the project in future or contribute as teams of researchers, like CREP, to create an overall more informative study.
Summary
In this commentary, we provide concrete examples of how open and reproducible science can be advanced through integrating open principles into the assessment design. Our aim was to inspire assessment redesign at course and program level regardless of whether one has recently taken over a course or just wants to refresh one’s assessment approach away from repeated research reports. Even if educators have control over only a single course, they can adapt these examples to the needs and expectations of their students. Departments may also articulate explicit program-level learning outcomes for open and reproducible science and evaluate these longitudinally through premeasures and postmeasures, such as the Open Science Concept Inventory (Markant & Galati, 2023).
As noted, there is little research on the effects of specific open-science-aligned assessments, and future work should examine their impact on students’ learning and longer-term engagement with transparent research practices. However, although such work would strengthen the evidence base, it is not a prerequisite for implementation because alignment with contemporary standards for open and reproducible science already provides a strong pedagogical rationale for their adoption.
Finally, although we advocate a flexible “buffet” approach to implementation, we do not view open-science training as a matter of personal preference. Drawing on recent accreditation and guideline documents (APA, 2023; BPS, 2024; Nolan et al., 2024), we suggest that accredited psychology programs should include a summatively assessed, fully reproducible analysis workflow and require students to engage with open-science principles in their capstone project or dissertation through preregistration or transparent sharing of materials, data, and code. These elements provide a baseline that is consistent with existing standards while leaving room for programs to tailor additional open-science teaching and assessment to their local context.
Footnotes
Transparency
Action Editor: David A. Sbarra
Editor: David A. Sbarra
Author Contributions
