Abstract
Computer simulation has become increasingly complex and widely applied across different domains. However, the reproducibility and replicability (R&R) of simulation models remain limited. Despite recent improvements, independent reproduction or replication of simulation experiments is rare in the literature. This paper provides an overview of the state of research on R&R in simulation, highlights recent developments, and discusses key concepts and future perspectives. It first examines how R&R has been viewed, approached, and evaluated and then outlines typical challenges and defining characteristics of R&R. Emerging opportunities are also discussed in light of community-driven practices, artificial intelligence, and quantum computing. Given the significant role of simulation in modern science, this paper argues that R&R studies of simulation are valuable research outputs and should be regarded as an integral and equally important part of scientific progress. R&R should be explicitly addressed and embedded into modelling and simulation practices, and supported by stronger community efforts. Researchers engaging in these efforts face substantial challenges, including those related to recognition and rewards, methodology, and scalability, many of which are under-researched.
1. Introduction
Many computer simulation models have been developed for science. It is widely recognised that simulation-based experiments, like other types of experiments, should uphold reproducibility, a fundamental tenet of science.1–3 Although, in principle, research should be reproducible, there is growing concern among scientists that only a limited fraction of published research can be reproduced, a situation sometimes called the “reproducibility crisis.”2–5 A Nature survey 6 revealed that more than 70% of responding researchers have tried and failed to reproduce another scientist’s experiments, and more than half of them have failed to reproduce their own experiments. Protzko et al. 7 reported high replicability of novel social-behavioural findings, but the article itself was retracted due to methodological concerns. While many influential publications in different domains could not be reproduced by independent researchers,8,9 non-reproducible publications seem to have been cited more often than reproducible ones, and this difference in citation does not change after the publication of the failure to reproduce or replicate.10,11 These findings raise alarms about the sustainability of science. Despite improvements, many challenges in reproducible research persist.12–15
In the field of modelling and simulation (M&S), reproducibility and replicability, abbreviated as R&R hereafter, are a grand challenge in both industry and academia. 16 The R&R of simulation models is presumably even poorer than that of traditional (laboratory or field) experiments.17,18 Computer simulation is increasingly popular in many scientific disciplines. However, computer models and experiments, especially dynamic stochastic simulations, are rarely reproduced or replicated by independent researchers.16,19,20
A few R&R studies of simulation models exist. For example, Jalali et al. 21 assessed 1613 articles that applied simulation modelling as a core method in health policy and epidemiology. They found that almost half of the articles did not report model details. A more in-depth evaluation of 100 of those articles showed that seven of the 26 evaluation criteria were satisfied by more than 80% of the articles. Only about 2% of these articles provided modelling code and discussed reproducibility. Zhang and Robinson 22 searched six prominent journals for articles focused on agent-based modelling (ABM). Of the 348 articles retrieved, only nine aimed to replicate an existing model partially or entirely, indicating that R&R studies are scarce. Riehl et al. 18 assessed more than 11,000 simulation studies from the 15 most renowned transportation journals. About 5% of the studies provided some form of repository, and most offered “content of a rather mediocre level of usefulness.” Bajracharya and Duboz 23 and Donkin et al. 24 showed that different implementations of the same conceptual model on different modelling platforms can give significantly different results. These studies underline the necessity of R&R studies in simulation for producing more reliable results and understanding the potential pitfalls of simulation-based studies. In general, recent studies have called attention to R&R as a significant challenge in simulation-based research. Many published models and experiments lack sufficient documentation or accessibility for independent verification and validation. 18 The challenges for R&R of simulation models appear to be more persistent than those for traditional experiments.17,25
With the overarching goal of fostering stronger community efforts towards R&R simulation models, this paper aims to articulate the associated values, benefits, and challenges, as identified in the existing body of literature. The paper is partly based on Huang. 26 In this extended version, we identify the characteristics of R&R research and emerging opportunities, and call for sustained and systematic efforts to advance R&R practices in the M&S field by promoting research and collaboration on this critical topic.
The rest of the paper starts with a brief review of the terminology used for R&R. Researchers in different domains do not have to align their usage of the terms, but they should be aware of the differences so that relevant work can be found in the literature across domains and disciplines. This is followed by a discussion of different views and opinions on R&R, types of evaluation, and recent developments. We then outline the challenges, characteristics, and opportunities presented by emerging technologies in relation to the R&R of simulation-based research. The paper argues that, given the current complexity and widespread use of diverse simulation models, researchers who wish to engage in these efforts need to address the R&R of simulation explicitly and operationally within existing modelling practices, an endeavour for which many social, conceptual, methodological, and scalability issues remain under-researched.
2. Reproducibility and replicability in simulation
There is no consensus in the scientific literature on what reproducibility is or should be.4,6 Closely related terms include replicability and repeatability, which have been used in different domains. They have long been used to refer to the general concept of one experiment or study confirming the results of another by repeating the existing research in different ways. 27 However, within this general concept, the literature has not yet converged on a terminologically consistent conceptual framework. 27 Hence, the language of research reproducibility is non-standard. 4 Different scientific disciplines and institutions may use the terms in inconsistent or sometimes even contradictory ways; some use them interchangeably.2,6,22,27–29
Repeatability often implies that researchers can repeat the calculations of their own study and obtain the same results under the same setup.3,18,30 This paper focuses on R&R, namely the ability of other researchers to obtain consistent results when pursuing similar aims. (This “other researcher” may well be oneself in the future, when the details of one’s own study have become opaque owing to insufficient reporting and archiving.) In M&S, the term replications also refers to repeated runs of a simulation model with different seeds of the random-number generator but otherwise identical model configurations; these runs are known as independent replications of simulation. 31 For clarity, in this paper, reproducibility narrowly refers to the “computational reproducibility” of simulation models. We distinguish R&R in M&S as follows, based on NAPress 27 and Hinsen. 32
Reproducibility is obtaining identical results using the same input data, computational steps, methods, code, and conditions of simulation-based analysis.
Replicability is obtaining consistent results across simulation-based studies aimed at answering the same scientific question, each of which has obtained its own data and/or uses different code.
According to this distinction, simulation studies that use the same core conceptual model but different computational implementations, such as those reported by Zhang and Robinson, 22 Luijken et al., 28 and Edmonds and Hales, 33 are replication studies, regardless of whether they used the same input data.
Modellers commonly distinguish model conceptualisation from model implementation. The former, or a conceptual model, is often a textual, mathematical and/or diagrammatic description of model characterisation and processes of interaction, based on a real system of modelling interest. 17 A conceptual model is traditionally not executable; thus, it may be ambiguous about how model outputs are computed from inputs. 34 The latter, also called model operationalisation or simply a simulation model, is a computational formalisation of a conceptual model as an executable computer programme that produces numerical output when the implemented model is executed (i.e. run). 17
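To make the distinction concrete, the sketch below uses a hypothetical conceptual model (a symmetric random walk whose output is the final position) and two implementations written purely for illustration: `walk_original` stands in for published code, and `walk_reimplemented` for an independent re-implementation that consumes its random numbers differently. Rerunning the original code with the same seed is a reproduction and should give identical output; running the re-implementation is a replication, whose individual runs differ but whose output distribution should be consistent with the original's.

```python
import random
import statistics

# Hypothetical conceptual model: a symmetric random walk of n_steps unit steps
# (+1 or -1 with equal probability); the output is the final position.


def walk_original(n_steps: int, seed: int) -> int:
    """Stand-in for the original code: one random.random() draw per step."""
    rng = random.Random(seed)
    position = 0
    for _ in range(n_steps):
        position += 1 if rng.random() < 0.5 else -1
    return position


def walk_reimplemented(n_steps: int, seed: int) -> int:
    """Stand-in for an independent re-implementation: one random bit per step,
    mapped {0, 1} -> {-1, +1}, so the random stream is consumed differently."""
    rng = random.Random(seed)
    return sum(2 * rng.getrandbits(1) - 1 for _ in range(n_steps))


# Reproduction: same code, same input, same seed -> identical result.
assert walk_original(1000, seed=42) == walk_original(1000, seed=42)

# Replication: different code -> individual runs generally differ, but both
# samples should be consistent with mean 0 and standard deviation sqrt(1000) ~ 31.6.
original = [walk_original(1000, seed=s) for s in range(500)]
replica = [walk_reimplemented(1000, seed=s) for s in range(500)]
print(statistics.mean(original), statistics.stdev(original))
print(statistics.mean(replica), statistics.stdev(replica))
```

Because both implementations follow the same conceptual model, only distributional or relational agreement can be expected between them, not identical numbers.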
Reproducing or replicating simulation studies aims to demonstrate that a computer simulation experiment’s results are repeatable and were not exceptional cases.19,34 Without verifying the claimed results through R&R simulation experiments, it is possible that published findings were incorrect due to, for example, programming errors, mistakes in the reporting or analysis of results, or misrepresentation of the simulation experiment. 20 Consistent results from model reproductions or replications can build confidence in the simulation mechanism used. 35
3. R&R research: why and why not
Views on the need for R&R of simulation studies differ across research areas and with personal preferences; these views are equally interesting, or at least open to debate. We briefly discuss them in this section.
3.1. Practicality and workload
Reproducibility can be hard or practically impossible to achieve for certain research areas, such as military and critical infrastructure, and for sensitive topics in medicine and other industries due to business interests, security, privacy, or ethical reasons, among others. These simulation applications often require non-disclosure agreements or are subject to laws or regulations that limit or prevent the publication of information or software necessary to reproduce the simulation.16,18,36
Making simulations (and their results) more R&R demands dedicated, and sometimes extensive, time and effort from the original researchers. According to a survey (N = 87) by Riehl et al., 18 major constraints include time, legal issues, lack of confidence to share material in its current state, lack of knowledge, and quality concerns. Many of those reasons also hold true for independent researchers or teams who aim to reproduce or replicate the work. As some put it, “publication is already a gruelling process, why would we increase our workload.” 36
Currently, there is often no direct reward for researchers who make the effort to achieve more R&R simulation, nor any consequence for those who do not. For example, the publication impact analysis by Riehl et al. 18 showed that, while simulation studies received more citations than studies that did not use simulations, no significant difference in citations was found between simulation studies with and without repositories, or between those with repositories of higher and lower quality. There have been few, or at the very least insufficient, incentives to undertake what researchers view as additional work to make their simulation studies more open and accessible. 36
3.2. Concerns and benefits
There are critical voices regarding R&R research. For example, Drummond 37 raises concerns about the strong influence the reproducible research movement is having on which papers get published; in addition, widening the responsibilities of peer review adds extra workload for reviewers and does not recognise the broad role the scientific community plays (at the post-publication stage) in determining the value of an idea. Drummond 38 states that reproducible research in some fields also requires open-source code, which reflects a narrow interpretation of how science works; the effort necessary to meet this aim, and the general attitude it engenders, would serve none of the research disciplines well. In an article in the Proceedings of the National Academy of Sciences, Fanelli 39 argues that the rapidly growing scientific literature uncritically endorses a new “science is in crisis” or “reproducibility crisis” narrative, which is not only empirically unsupported and unreliable but also counterproductive and might foster cynicism and indifference in younger generations. Leonelli40,41 advocated for scientific pluralism and provided paradoxical real-life examples in which the principles of Open Science (OS) and reproducible research clashed with responsible research measures and practices. The author further argued that unless relevant policies embrace a more sophisticated understanding of epistemic diversity, they risk acting as a reactionary force that reinforces conservatism and increases inequity among researchers, given differences in power, resources, and visibility.
Regardless of the views or where the truth lies, clearly, not all models need to be, or can be, made R&R. Making them so reflects a value that some researchers place on good science and, to date, is often a voluntary community service. For these researchers, including the authors of this paper, reproducing and/or replicating a computational model can contribute to the scientific community in many ways that cumulatively consolidate science. For example, commonly discussed benefits in the literature include developing shared understanding; obtaining an improved sense of the accuracy, reliability, robustness, and range of plausibility of model results; and gaining empirical evidence to compare models.4,22,28,34,42
Besides those, many M&S researchers can readily relate to the frustrating scenario in which a new team member (whether an incoming PhD student or even one’s future self) discovers that earlier simulation studies are not reproducible. Hence, R&R in simulation arguably matters first and foremost within research teams that seek continuity in their own work. In addition, R&R studies are not only a way to advance science but also a pedagogical tool, as they offer rich and complementary learning experiences for doctoral education. 43
3.3. Model reuse and interoperability
In addition, reproducing and/or replicating a computational model can be a good way to have a first assessment of the reusability of reported simulation models.30,36 Reusability generally refers to the degree to which research artefacts such as code, data, models, and documentation can be used again in different studies, contexts, or applications.44–46 It extends beyond reproducibility to emphasise the sustainability and adaptability of scientific outputs. Reusability is a dimension in the FAIR principles (Findable, Accessible, Interoperable, Reusable) of OS. 45 A reusable simulation model or data set is well-documented, openly accessible, and sufficiently modular to be integrated, modified, or extended for new research purposes.
Simulation studies often have two main types of audiences: methodological researchers and applied researchers. 47 The former reads a study to gain an overview of a method’s uses, limitations, and potential improvements. The latter reviews a study with the main aim of applying the method or result to their own research problem. The reuse of simulation models can benefit both types of researchers.
The knowledge and data embodied by the simulation model are available to be utilised by model reusers as a tool to advance their own research agenda. 34 Model reuse also enables testing the broader parameter space of existing simulations. 24 It can further facilitate multi-modelling and hybrid simulation, that is, combining different models and modelling paradigms for the analysis of complex systems.48–50 That said, reuse of research components and results, a concept closely related to R&R and OS, also requires a careful procedural approach: it should be treated with caution, as good practices can only emerge from diverse, context-dependent interpretations and responsible implementation. 41
An important factor enabling effective model reuse and collaboration across different research contexts is interoperability, which ensures that simulation models, data, and software can be integrated, exchanged, and executed across diverse platforms and tools. Interoperability is a critical enabler of R&R in simulation. The ability of different simulation platforms, software tools, and data formats to communicate seamlessly allows researchers to reproduce results across heterogeneous systems. Standardised data formats, modular software architectures, and open APIs facilitate the integration of components from diverse sources, reducing the risk of errors and inconsistencies. By promoting interoperability, researchers can more easily share models, validate results, and extend previous work, thereby accelerating scientific progress and enhancing transparency.
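As a minimal sketch of how a shared interface and a self-describing output format can support interoperability, the code below assumes a hypothetical `SimulationModel` protocol (a `name` attribute and a `run(params, seed)` method) and a hypothetical JSON record layout; neither is an established standard. Any implementation that satisfies the protocol can be run by the same harness, and the exported record can be read by tools written in other languages.

```python
import json
import random
from typing import Protocol


class SimulationModel(Protocol):
    """Hypothetical minimal interface shared by interchangeable implementations."""
    name: str

    def run(self, params: dict, seed: int) -> list[float]: ...


class LogisticGrowth:
    """Example implementation: discrete logistic growth with Gaussian noise."""
    name = "logistic-growth"

    def run(self, params: dict, seed: int) -> list[float]:
        rng = random.Random(seed)
        n, trajectory = params["n0"], []
        for _ in range(int(params["steps"])):
            trajectory.append(n)
            n += params["r"] * n * (1 - n / params["k"]) + rng.gauss(0, params["noise"])
        return trajectory


def run_experiment(model: SimulationModel, params: dict, seeds: list[int]) -> str:
    """Run any conforming model and export a JSON record that states the
    model name, the parameters, and the seed of every run next to its output."""
    record = {
        "model": model.name,
        "parameters": params,
        "runs": [{"seed": s, "output": model.run(params, s)} for s in seeds],
    }
    return json.dumps(record, indent=2)


params = {"n0": 10.0, "r": 0.5, "k": 100.0, "noise": 1.0, "steps": 5}
print(run_experiment(LogisticGrowth(), params, seeds=[1, 2]))
```

Swapping in another implementation, for example one written on a different platform, requires only that it expose the same `name` and `run` signature.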
4. Evaluation of R&R in simulation
For M&S, it is particularly beneficial to evaluate the R&R of a simulation model from two aspects: the method (i.e. the computational procedure of the simulation) and the simulated results. This distinction is in line with the “method reproducibility” and “result reproducibility” discussed by Goodman et al. 4 The former means that the computer simulation is methodologically reproducible or replicable in theory and practice; the latter means that the simulated results can be quantitatively reproduced or replicated using the same computational method. Clearly, method R&R precedes (and is necessary for) result R&R.
4.1. Method R&R
When the original simulation model (i.e. code and input data) is accessible to independent researchers, the method R&R can be directly evaluated. This typically means that independent researchers follow the simulation’s computational procedure, as closely as possible – in the case of reproduction, using the same code, tools, and data – based on the workflows described in the original documentation and publication. If the original model is not accessible, the simulation has to be replicated, that is, assessed by newly implementing the simulation model based on the original publication and the model conceptualisation discussed within. Replicating a simulation, of course, can also be done independently when the original code and data are accessible. The actions needed to evaluate method R&R may appear straightforward; however, there are many associated challenges, which are briefly discussed later in the paper.
4.2. Result R&R
Once method R&R has been established, if the reproduced or replicated simulation generates outputs sufficiently similar to those of the original model, the reproduction or replication as a whole can be considered successful. 19 Result R&R, that is, the quantitative measurement of the similarity of results, is not straightforward either. 51 For example, Muradchanian et al. 52 reported difficulties in comparing multiple frequentist and Bayesian measures because there is no established standard for the metrics to use. Another complicating factor in the comparison is the differing levels of publication bias. 52 A broad categorisation of result R&R is provided by the so-called “standards of equivalence” or “replication standards.” These are referenced in several studies17,19,22 and first appeared in the replication work of Axtell et al. 42 The three general categories of model equivalence (from strict to loose) are numerical, distributional, and relational equivalence, summarised as follows. 42
Numerical equivalence (or identity) refers to the reproduction of exactly the reported results. It is typically not expected for stochastic simulations unless the random seeds are specified.
Distributional equivalence is established by showing that two studies produce distributions of results that cannot be distinguished statistically, that is, a statistical test does not reject the null hypothesis that both sets of results come from the same distribution.
Relational equivalence means that two models can be shown to have the same internal relationship among their results (i.e. inputs, parameters, and outputs). For example, two models show that a particular output variable is a quadratic function of time, or that a measure on a population decreases monotonically with the population size. This is the least demanding comparison, but for some theoretical purposes, it may suffice.
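The three standards can be operationalised in many ways. The sketch below is one minimal, dependency-free possibility, assuming the outputs of each model are available as lists of numbers: exact comparison for numerical equivalence, a two-sample Kolmogorov-Smirnov (KS) test with its asymptotic critical value (1.358 for a significance level of 0.05) for distributional equivalence, and, for relational equivalence, a check that both models' outputs move in the same direction across an ordered sequence of inputs. The function names and the particular relational check are illustrative choices, not the procedure used by Axtell et al. 42

```python
import math
import random


def numerically_equivalent(a, b):
    """Numerical equivalence: the outputs are identical, element by element."""
    return list(a) == list(b)


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest vertical gap between the empirical
    CDFs of samples a and b (tied values are stepped over together)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def distributionally_equivalent(a, b, c_alpha=1.358):
    """Distributional equivalence: the KS test does not reject the null hypothesis
    that a and b come from the same distribution (asymptotic critical value;
    c_alpha = 1.358 corresponds to alpha = 0.05)."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) <= c_alpha * math.sqrt((n + m) / (n * m))


def relationally_equivalent(ya, yb):
    """Relational equivalence (one simple check): for outputs ordered by the same
    increasing input, both models change in the same direction at every step."""
    def sign(v):
        return (v > 0) - (v < 0)
    return all(sign(ya[k + 1] - ya[k]) == sign(yb[k + 1] - yb[k])
               for k in range(len(ya) - 1))


rng_a, rng_b = random.Random(1), random.Random(2)
original = [rng_a.gauss(0, 1) for _ in range(1000)]
replica = [rng_b.gauss(0, 1) for _ in range(1000)]
print(numerically_equivalent(original, replica))       # different seeds
print(distributionally_equivalent(original, replica))  # same underlying distribution
```

Note that failing to reject the null hypothesis is weaker evidence than a formal equivalence test; a study that needs a positive claim of equivalence would have to choose its test and tolerance accordingly.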
The separation of method and result R&R is particularly useful for simulation studies because R&R studies of complex models can be divided into two stages that are more manageable. The simulation method itself can be examined and tested or replicated first for methodological soundness, which is an important contribution of a simulation study. This stage also verifies the alignment of the conceptual model with the computational model and the experimental scenarios (and conditions). It tests, with the stated computational workflows and steps, whether the computational method can be executed as intended. In the second stage, the results are compared, which is typically performed in a traditional (non-simulated) replication study. For a simulation study of stochastic models, the numerical equivalence is expanded to distributional and relational equivalences, which are often more realistic and reasonable expectations depending on the particular goal of the individual simulation study.
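The two-stage logic can be sketched as a small evaluation harness. The model, its parameters, the reported value, and the tolerance below are all hypothetical placeholders; for brevity, stage 2 only compares a reported mean within a tolerance, whereas a real study would apply whichever equivalence standard it targets.

```python
import random
import statistics


def replicated_model(params, seed):
    """Hypothetical replicated model: exponentially distributed outputs."""
    rng = random.Random(seed)
    return [rng.expovariate(params["rate"]) for _ in range(params["n_outputs"])]


def method_check(model, params, seed):
    """Stage 1 (method R&R): the model executes as documented and returns
    outputs of the expected form. Returns (passed, outputs_or_error)."""
    try:
        outputs = model(params, seed)
    except Exception as exc:  # e.g. missing parameter, wrong type, crash
        return False, repr(exc)
    passed = (len(outputs) == params["n_outputs"]
              and all(isinstance(v, float) for v in outputs))
    return passed, outputs


def result_check(outputs, reported_mean, tolerance):
    """Stage 2 (result R&R): the summary statistic lies within the tolerance."""
    return abs(statistics.mean(outputs) - reported_mean) <= tolerance


def evaluate(model, params, seeds, reported_mean, tolerance):
    """Run stage 1 for every seed; only if all runs pass, compare results."""
    pooled = []
    for seed in seeds:
        passed, outputs = method_check(model, params, seed)
        if not passed:
            return "method R&R failed: " + str(outputs)[:80]
        pooled.extend(outputs)
    if result_check(pooled, reported_mean, tolerance):
        return "successful"
    return "result R&R failed"


# With rate 0.5, the expected mean of the outputs is 1 / 0.5 = 2.0.
print(evaluate(replicated_model, {"rate": 0.5, "n_outputs": 1000}, range(10), 2.0, 0.1))
```

Keeping the stages separate means a failed reproduction can be attributed either to the method (the model does not run as described) or to the results (it runs but disagrees).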
5. Recent developments and types of R&R studies
As research artefacts, simulation models should be R&R and potentially reusable by the broader scientific community. Establishing such properties not only enhances the credibility of individual studies but also supports the cumulative advancement of computational knowledge, including that of methodological adaptation and refinement. To this end, encouraging researchers to conduct R&R studies and to report both the successes and challenges plays a critical role in strengthening the evidence base, making improvements, and fostering transparency in simulation research. 22 Recently, there has been a marked increase in scholarly attention devoted to R&R across computational and simulation-based research. The alarming “reproducibility crisis” identified in several disciplines has motivated the development of domain-specific standards, methodological frameworks, and reproducible workflows.3,53–55
The simulation community has conducted more studies now explicitly addressing the need for structured reporting and model sharing. For example, the RepliSims project 28 conducted replications of eight highly cited simulation studies and highlighted the key enablers of replicable statistical simulation. Monks et al. 56 introduced the STRESS documentation guidelines to strengthen the reporting of experimentation and results. Grimm et al. 57 updated the ODD (overview, design concepts, and details) protocol to promote consistent description of ABMs and enhance their reproducibility and comparability.
Notably, the computational biology and in silico medicine communities have been active in tackling R&R challenges.54,58,59 Relevant works include, for example, MIRIAM (minimum information required in the annotation of models);60,61 MIASE (minimum information about a simulation experiment); 62 SED-ML (simulation experiment description markup language); 63 COMBINE standards and formats; 64 and projects on the VPH (virtual physiological human) 65 and in silico oncology. 66 Moreover, Knapp et al. 67 provided 10 simple rules for successful cross-disciplinary collaborations in computational biology. Ziemann et al. 68 introduced five pillars of computational reproducibility, which emphasise practices such as literate programming, code version control, and persistent data sharing to enhance reproducibility in bioinformatics and beyond.
Initiatives, such as Zhu et al.’s 69 integrated framework for assessing computational reproducibility, further elaborate on the importance of transparent workflows and code-data coupling, and the need for more holistic approaches to reproducible simulation-based science. Collectively, the increased efforts in R&R reflect a growing consensus that advancing reproducibility requires not only conceptual agreement but also concrete methods, workflows, and infrastructure tailored to the specific characteristics of simulation research. Within the growing body of work, three main types of research can be identified as shown in Table 1.
Table 1. Three types of reproducibility and replicability (R&R) studies in simulation.
The first type, empirical evaluation of R&R, comprises studies that computationally reproduce or replicate published simulation models and associated results.28,70 Such a study typically evaluates one or a few published simulation-based studies, using either the original or new model code and/or data and closely following the original experimental design and workflow. These studies are typically expensive to scale up due to the time and effort required.
The second type, assessment of R&R readiness, refers to works that examine published simulation studies using indicators such as the openness of the original model code and data, the clarity of workflows, the quality and coverage of the original reporting and documentation, and whether the original publication was frequently cited or the model was reused in other publications.18,21 Unlike Type I, this type of work does not directly evaluate R&R by executing the models and data. However, an indicator-based assessment can often cover a greater number of model publications, and its results are insightful in revealing the R&R potential of the original studies. It is also useful for scoping and prioritisation, enabling researchers to screen a large body of simulation-based studies and identify those that are promising for more detailed R&R investigations.
The third type of work encompasses diverse original research on simulation R&R or reviews of those works. They may be based on works of the two previous types. However, instead of focusing on computationally reproducing or replicating individual original simulations or assessing their R&R readiness, the third type of work aims to improve the R&R of simulation practice in general.56,69,71,72 Such work may address, for example, theory building; the development of methods, guidelines, and tools; and social and (research) culture studies, community engagement, and policy analysis, with the aim of operationally and structurally improving R&R practices.
Even with recent developments and achievements, all three types of R&R studies are significantly lacking given the high number of computational simulation studies published in the literature and the increased importance of R&R of computational science in research. Such studies form the evidence base of R&R simulation-based research, provide theoretical and methodological foundations for more R&R practices, and have the potential to motivate and engage more researchers in intellectual exchange and debates on this important topic.
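The indicator-based assessment described above for the second type of study (assessment of R&R readiness) can be organised as a scored checklist that ranks publications for follow-up. The indicators, weights, rating scale, and threshold below are hypothetical and would need to be justified in any real assessment; the sketch only shows how such indicators can support scoping and prioritisation.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical readiness indicators recorded for one simulation study."""
    title: str
    code_open: bool
    data_open: bool
    workflow_described: bool
    seeds_reported: bool
    documentation: int  # assessor's rating: 0 = none, 1 = partial, 2 = complete


# Hypothetical weights for the binary indicators.
WEIGHTS = {"code_open": 3, "data_open": 2, "workflow_described": 2, "seeds_reported": 1}
MAX_SCORE = sum(WEIGHTS.values()) + 2  # plus the maximum documentation rating


def readiness(pub: Publication) -> float:
    """Weighted share of satisfied indicators, between 0 and 1."""
    score = sum(w for field, w in WEIGHTS.items() if getattr(pub, field))
    return (score + pub.documentation) / MAX_SCORE


def shortlist(pubs: list[Publication], threshold: float = 0.6) -> list[str]:
    """Rank by readiness and keep studies promising enough for a detailed
    (Type I) reproduction or replication."""
    ranked = sorted(pubs, key=readiness, reverse=True)
    return [p.title for p in ranked if readiness(p) >= threshold]


corpus = [
    Publication("Study A", True, True, True, False, 1),
    Publication("Study B", False, False, False, False, 0),
    Publication("Study C", True, True, True, True, 2),
]
print(shortlist(corpus))  # ['Study C', 'Study A']
```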
6. Challenges in reproducing or replicating a simulation study
Different domains and disciplines are often distinct in nature, and so are their models, which may differ, for example, in their levels of abstraction and detail and in the stochastic characterisations used to represent uncertainty. 17 Unlike many physical science and engineering domains, systems with a less clear or agreed-upon “ground truth,” such as social systems or value systems, have many degrees of freedom and levels of uncertainty in model conceptualisation. In addition, technological advances and social development form a “seamless web,” in which the connected parts substantially shape and reshape one another. 73 This systems perspective shapes how we develop models. Compare, for example, the models of Shrestha et al., 74 used for hardware development in the automotive industry and health care, with those of Balkan et al. 50 for estimating infection risk, Wagenaar et al. 75 for neonatal care interventions, or Sundaram et al. 76 for incorporating justice considerations in energy transitions. The conceptualisation of the latter types of models is often subject to disparate or even inconsistent interpretations. As a result, R&R studies of such models by independent researchers are particularly hard, even when the original code and data are available.
Many studies report that communication (sometimes extensive personal interaction) between the researchers and the original author(s) of the work helps simulation R&R studies.17,22,34 However, when such communication is not possible, for example, when the original researchers have left the field or are not reachable, an R&R study must rely on the original published materials. In such cases, typical challenges reported in the literature concern two categories of source materials: (1) the reporting of the original simulation studies and (2) the documentation, corresponding models, and associated data (if available). These challenges, briefly summarised in the following paragraphs, impede the process and the comparison of results when empirically reproducing or replicating a simulation study.20,22,33,77 We do not include contributing factors such as increased project complexity, publication biases, lack of incentives and funds, and legal and regulatory issues. These are systemic or structural and are discussed by Alston and Rick, 2 Baker, 6 Begley et al., 78 and Munafo et al. 79 Rather than giving a detailed account of the challenges, we outline noteworthy ones in the literature.
Many reproducibility challenges are caused by incomplete or ambiguous reporting and documentation of simulation studies.17,22,56,80 It is not uncommon for the original work to lack sufficient information on the conceptual model, the computational model, and/or the experimental conditions. Studying and understanding the conceptual model is often the most significant step, 22 and ambiguity in communicating a model and its experimental conditions can result in varying interpretations, including of assumptions, mathematical processes, and mechanisms.17,22 When simulation models include stochastic processes, the experimental conditions, including the seeds, are often not explicitly reported or lack sufficient detail, which makes numerical or distributional equivalence difficult to achieve and leaves the method’s strengths and limitations poorly understood.22,81 Sometimes, the documents or online resources referred to in the original publication cannot be retrieved.
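One way to avoid unreported seeds is to derive every replication's seed from a single base seed and to save all seeds, the random-number generator used, and the parameters in a machine-readable manifest archived with the results. The sketch below assumes a placeholder stochastic model (`simulate`, the mean of exponentially distributed service times) and a hypothetical manifest layout; with the manifest alone, a reproducer can rerun every replication and check for numerical equivalence.

```python
import json
import random


def simulate(params, seed):
    """Placeholder stochastic model: mean of n exponential service times."""
    rng = random.Random(seed)
    return sum(rng.expovariate(params["rate"]) for _ in range(params["n"])) / params["n"]


def run_with_manifest(params, base_seed, n_replications):
    """Run independent replications and record everything needed to rerun them."""
    seed_source = random.Random(base_seed)
    seeds = [seed_source.randrange(2**32) for _ in range(n_replications)]
    results = [simulate(params, s) for s in seeds]
    manifest = {
        "base_seed": base_seed,
        "replication_seeds": seeds,
        "rng": "Python random.Random (Mersenne Twister)",
        "parameters": params,
    }
    return results, manifest


def rerun_from_manifest(manifest):
    """Reproduce all replications from the manifest alone."""
    return [simulate(manifest["parameters"], s) for s in manifest["replication_seeds"]]


results, manifest = run_with_manifest({"rate": 1.0, "n": 100}, base_seed=2024, n_replications=5)
saved = json.dumps(manifest)  # e.g. written to a file archived with the results
assert rerun_from_manifest(json.loads(saved)) == results
```

Recording the generator matters as much as the seeds: the same seed fed to a different generator or library version may produce a different stream.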
The availability of the original study’s source code and associated data is helpful for R&R.22,80 However, openness alone is often insufficient for ensuring reproducibility. 30 Many challenges can arise from the available model software and data, as well as the hardware, regarding their executability and comprehensibility.3,82 Software includes, for example, the required simulation environments (a.k.a. platforms), code libraries or toolkits, and the programming languages used for model implementation, as well as those for data management, processing, and analysis. The choices often have a strong influence on how models can be represented and interact, and they may yield different outcomes. Cross-platform and cross-language replications, as well as the portability and consistency of different algorithms, workflows, and their performance constraints, can be sources of significant variability between model results.3,24,30
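Because software and hardware choices can change results, a reproducer benefits from knowing the exact environment of the original runs. The sketch below records a few environment details with Python's standard `platform`, `sys`, and `importlib.metadata` modules and lists the differences between two such snapshots. Which fields to record is an assumption of this sketch; a real study would also pin the versions of all dependencies, for example in a lock file or container image.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Record interpreter, operating system, hardware, and package versions.
    Packages that are not installed are recorded as None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "float_info_epsilon": sys.float_info.epsilon,
        "packages": versions,
    }


def environment_differences(original, reproduction):
    """Return {field: (original value, reproduction value)} for every differing field."""
    return {
        key: (original.get(key), reproduction.get(key))
        for key in sorted(original.keys() | reproduction.keys())
        if original.get(key) != reproduction.get(key)
    }


snapshot = environment_snapshot(["numpy", "simpy"])
print(json.dumps(snapshot, indent=2))
```

Comparing the original and the reproduction snapshots with `environment_differences` gives a first list of suspects when results diverge.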
Sometimes, the published reporting and documentation are explicit, but the source code or data are incorrect or inconsistent with the descriptions.17,22 Such inconsistencies can be introduced when translating the conceptual model into the model implementation (or vice versa), or when aligning the report with the conceptual model or implementation. 17 They may also result from versioning issues, for example, when the report was updated (e.g. during the reviewing process) but the code base or documentation was not, or was only partially updated.
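One lightweight safeguard against such drift, assuming the code and parameter files are available, is to quote a cryptographic fingerprint of the exact model inputs in the report, so that readers can check whether the archived code matches the reported version. The sketch below uses SHA-256 from Python’s standard `hashlib`; the function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint_contents(contents: dict[str, bytes]) -> str:
    """SHA-256 digest over named byte strings, independent of dict order."""
    digest = hashlib.sha256()
    for name in sorted(contents):
        data = contents[name]
        encoded_name = name.encode("utf-8")
        # Length prefixes keep (name, content) boundaries unambiguous.
        digest.update(len(encoded_name).to_bytes(8, "big"))
        digest.update(encoded_name)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def fingerprint_files(paths: list[Path]) -> str:
    """Fingerprint files by base name and content (base names assumed unique)."""
    return fingerprint_contents({p.name: p.read_bytes() for p in paths})


if __name__ == "__main__":
    # Hypothetical model source as reported vs. as later archived.
    reported = fingerprint_contents({"model.py": b"RATE = 0.9\n"})
    archived = fingerprint_contents({"model.py": b"RATE = 0.95\n"})
    print(reported == archived)  # False: the archived code differs from the reported version
```

Any edit, rename, addition, or removal of a fingerprinted file changes the digest, so a mismatch signals that report and code have diverged, though not where.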
Despite ongoing progress towards more transparent reporting and documentation, significant challenges remain in achieving R&R in simulation research. Overcoming them will require time and effort to adhere to reporting standards, as well as (research) cultural, methodological, and infrastructural commitments to advance a truly R&R simulation science.
7. Characteristics of R&R simulation
Reproducing and replicating simulation studies form the methodological foundation for ensuring the credibility of simulation results. Although they share overlapping goals, each emphasises different aspects of transparency, verification, and validation in research practice.
On the one hand, reproducible simulation research requires documentation comprehensive enough to enable an exact reconstruction of workflows. Achieving reproducibility depends on open and sustained access to all components of the computational process, including source code, data sets, software dependencies, and computational environments. 81 Key reproducibility features include the following:
Source code availability: All code used for simulations (including scripts, algorithms, and randomness control) should be archived and available in stable repositories, for example through public version-controlled platforms such as GitHub, GitLab, or institutional repositories with persistent identifiers.
Data set availability: All data used for simulations should be available in stable repositories. Details about the metadata, variable definitions, distributions, and sampling procedures should also be explained to ensure the complete specification of the data set, for instance using standardised data documentation formats or data descriptors.
Environment set-up description: A detailed description of the development and test environment should be provided, including software versions, hardware configurations, dependencies, and parameter settings. This can be facilitated by environment management tools such as Docker, Conda, pip, or virtual machines, which allow the computational environment to be explicitly specified and reconstructed.
Workflow specification: Each analytical step, from pre-processing through the experimental steps to post-processing, should be clearly explained and traceable, either through literate programming tools (e.g. Jupyter Notebooks) or through workflow management tools.
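Several of these features can be captured automatically at run time. The sketch below is a minimal Python example using only the standard library: it records the interpreter version, operating system, versions of listed packages, parameter settings, and seed in a JSON-serialisable manifest that can be archived with the outputs. The field names and the dependency list are assumptions for illustration; such a manifest complements, but does not replace, a container image or an environment file.

```python
import json
import platform
import sys
from importlib import metadata


def run_manifest(packages: list[str], parameters: dict, seed: int) -> dict:
    """Collect a machine-readable record of the run environment and settings.

    Packages that are not installed are recorded as None rather than
    silently omitted, so gaps in the environment description stay visible.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "byteorder": sys.byteorder,
        "packages": versions,
        "parameters": parameters,
        "seed": seed,
    }


if __name__ == "__main__":
    manifest = run_manifest(
        packages=["numpy", "simpy"],  # hypothetical dependency list
        parameters={"arrival_rate": 0.9, "service_rate": 1.0},
        seed=2024,
    )
    print(json.dumps(manifest, indent=2))
```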
On the other hand, replicable simulation research extends validation beyond the original implementation and data set. It tests the generalisability of the approach, that is, whether a result holds under different implementation contexts, such as alternative programming languages, platforms, or modelling frameworks. Key characteristics for replicability include the following:
Experimental design: A detailed explanation of the model and experiments should be provided, including the model logic, equations, assumptions, and conceptual structure. This may be supported by formal model descriptions (e.g. the ODD protocol for ABMs), diagrams, or mathematical specifications that allow independent reimplementation.
Data set information: Metadata and variable definitions should be clearly documented. In addition, it is advisable to deposit the data in stable repositories to allow future teams to access and reuse the same data sets, or to generate equivalent data sets when replication relies on synthetic or simulated data.
Specification of stochastic processes: Details about random-number seeds, distributions, and sampling procedures should be explained so that the structure of uncertainty is understood and can be independently reproduced using alternative implementations.
Limitations and constraints: The domain context, limitations, and constraints should be explained to indicate where replication may fail due to computational or conceptual constraints, for instance when results depend on specific hardware architectures, restricted software, or unavailable real-world data.
Workflow transparency: The overall workflow and analytical steps should be clearly described to ensure that others can understand, evaluate, and replicate the study, for example through high-level workflow diagrams or platform-independent protocols.
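When a stochastic model is reimplemented with a different generator or language, distributional rather than numerical equivalence is usually the realistic target. The minimal Python sketch below illustrates such a check: two independently written exponential samplers (one using the standard library’s `expovariate`, the other using the fact that (Z₁² + Z₂²)/2 is exponentially distributed with rate 1 for independent standard normal Z₁, Z₂) are compared with the two-sample Kolmogorov-Smirnov statistic against its large-sample critical value. The sampler names, sample sizes, and significance level are illustrative assumptions, and a non-significant result indicates consistency, not proof of equivalence.

```python
import math
import random


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n: int, m: int, alpha: float = 0.01) -> float:
    """Asymptotic (large-sample) critical value for the two-sample KS test."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def sample_a(rate: float, n: int, seed: int) -> list[float]:
    """Implementation A: the standard library's exponential sampler."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


def sample_b(rate: float, n: int, seed: int) -> list[float]:
    """Implementation B: (Z1^2 + Z2^2) / (2 * rate) with standard normal Z1, Z2."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2) / (2 * rate)
            for _ in range(n)]


if __name__ == "__main__":
    n = 5_000
    original = sample_a(1.0, n, seed=11)
    replica = sample_b(1.0, n, seed=12)     # different algorithm and seed
    wrong = sample_a(2.0, n, seed=13)       # deliberately different model
    crit = ks_critical_value(n, n)
    print(f"A vs B:          D = {ks_statistic(original, replica):.4f} (critical {crit:.4f})")
    print(f"A vs wrong rate: D = {ks_statistic(original, wrong):.4f}")
```

The deliberately mis-specified model gives a statistic far above the critical value, which shows how such a check can flag a replication that diverges from the original specification.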
Table 2 summarises the essential reporting and documentation categories and features for R&R in simulation.
Reporting and documentation categories and features for reproducibility and replicability.
The study design of reproducible simulation shall be accurately described so that an independent researcher can understand the hypothesis, overall goal, variables, and set-up, and rerun the analysis using the same code and input data. For replicable research, researchers shall go beyond describing the original study set-up and specify the conditions under which a new team could independently recreate the study with potentially new data. The study design should, therefore, describe the population, methodology, data collection protocols, and criteria for generating equivalent or similar data sets.
The workflow of reproducible simulation should be accurately documented to enable exact repetition, with all code, scripts, and software versions fully disclosed. For replicable simulation, the workflow must outline the procedures for analysis, but the emphasis shifts towards generality, ensuring that a new implementation (with possibly different data or context) following the described protocol can address the same scientific question.
In addition, for both reproducibility and replicability, research integrity and consistency between the reporting, documentation, code, and data are essential. Clear and well-defined conceptual models, transparent methodological descriptions, appropriate use of theoretical frameworks, and thorough validation of results should be ensured to maintain rigour and R&R.
8. Emerging opportunities for R&R
Emerging opportunities are transforming R&R in simulation-based research, offering pathways to greater transparency, quality, and community impact. To fully realise these benefits amid increasing model complexity and diversity, R&R must be systematically embedded into simulation workflows and supported by stronger, collaborative community efforts.
8.1. Open science and community-driven practices
Advances in OS practices, such as the use of Free and Open Source Software (FOSS), open code repositories, and data sharing platforms, have lowered barriers to sharing and reusing simulation models. 83 Public repositories for code, models, and simulation data sets (e.g. GitHub, Zenodo, and the Open Science Framework) enable transparent access to the full modelling pipeline. Initiatives now promote not only the publication of model code alongside articles but also the packaging of workflows into user-friendly web applications, improving accessibility for researchers and practitioners beyond technical specialists. Containerisation and environment specification tools, such as Docker and Conda, enable the recreation of computational environments, mitigating issues arising from software version incompatibilities. Community repositories and FAIR data principles are also increasingly standard, helping ensure that code, data, and documentation remain available for assessment and reuse. 45
Another key trend is the co-development of community-driven standards for documenting and sharing simulation studies. 84 This includes formal reporting guidelines for simulation design, experimental protocols, parameter definitions, and software environments. 46 Community platforms (e.g. Streamlit-based apps) facilitate collective input and code reviews, encouraging reusable and robust model development. Tools that enable standardised archiving, provenance tracking, persistent identifiers, and reproducibility checklists help embed good practices across the research lifecycle.
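Parts of a reproducibility checklist can be automated. The sketch below is a minimal Python example that checks an artefact’s top-level file names against a small checklist. The checklist items and the file names that satisfy them (for instance `CITATION.cff` from the Citation File Format, or `environment.yml` for Conda) are illustrative assumptions, and a presence check confirms only that an item exists, not that its content is adequate.

```python
from pathlib import Path

# Hypothetical checklist: each item maps to file names that would satisfy it.
# Adjust to the conventions of the relevant community, journal, or repository.
CHECKLIST = {
    "documentation": ["README.md", "README.rst", "README.txt"],
    "licence": ["LICENSE", "LICENSE.md", "LICENCE"],
    "environment": ["environment.yml", "requirements.txt", "Dockerfile"],
    "citation metadata": ["CITATION.cff"],
}


def check_items(file_names: set[str]) -> dict[str, bool]:
    """Report which checklist items are satisfied by the given file names."""
    return {item: any(name in file_names for name in options)
            for item, options in CHECKLIST.items()}


def check_directory(root: Path) -> dict[str, bool]:
    """Apply the checklist to the files directly inside `root`."""
    return check_items({p.name for p in root.iterdir() if p.is_file()})


if __name__ == "__main__":
    for item, ok in check_directory(Path(".")).items():
        print(f"[{'x' if ok else ' '}] {item}")
```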
8.2. Utilising AI and software engineering best practices
Given the complexity and diversity of simulation models, R&R should not be an afterthought. Instead, workflows should be built for transparency from the onset using literate programming, containerised environments, version control, and modular code organisation. Research teams should adopt clear protocols for documenting model provenance, tracking changes, and linking code, data, and outputs.
As an emerging technology, artificial intelligence (AI) presents practical opportunities to enhance R&R in simulation. Frameworks for embedding R&R in simulation can employ AI to support, automate, or semi-automate known best practices at each modelling stage. For example, AI-driven tools can assist in tasks such as workflow documentation, standardisation of naming conventions, and generation of metadata for models and outputs. 85 AI can also assist in benchmarking and testing reproducibility, generating synthetic data for replication, and supporting intelligent search and annotation of simulation resources. 86
Generative AI, which can be regarded as a specific and increasingly influential subclass within the broader landscape of machine learning (ML) and AI, can further assist in automating complex procedures or in searching the literature for best practices and emerging standards. Adaptive and generative AI methods combine self-optimising model behaviour with the ability to synthesise new data, models, or workflows. In simulation, adaptive techniques can continuously calibrate models based on feedback from simulation outputs, improving accuracy and robustness over time.
In the literature, recent studies have begun to explore specific uses of generative AI in simulation contexts.85,87,88 For example, Alshareef et al. 89 demonstrate how generative AI can assist in transforming activity and flow-based diagrams into executable simulation models. Ghaffarzadegan et al. 90 introduce generative ABM, integrating large language models with agent-based simulations to enhance model construction. In addition, Jackson et al. 88 propose a framework that uses generative AI and natural language processing to automatically generate simulation models of logistics systems from verbal descriptions. These studies collectively illustrate the potential of AI to support and accelerate simulation model development.
However, the outputs of AI, especially generative models, are inherently unpredictable and may contain errors, inconsistencies, or biases. It is, therefore, imperative that researchers independently verify any AI-generated content or analyses before integrating them into simulation studies. The limits of AI with regard to R&R should also be explicitly recognised: AI can assist and augment reproducible workflows but cannot replace careful methodological design, documentation, or independent validation. Within these limits, ML and AI can also aid in parameter sensitivity analysis, uncertainty quantification, and anomaly detection, helping to make model outputs more robust and reproducible.
In addition, embedding model-driven engineering (MDE) into simulation modelling, especially in conjunction with AI, represents a major opportunity to systematise R&R within complex computational workflows. While MDE is not new, its integration with AI-assisted automation and OS frameworks can redefine how simulation studies are designed and performed, and can provide knowledge-based constraints on, and verification of, AI-based results. By defining explicit models and transformations, MDE facilitates traceability from conceptual designs to executable simulations, and its systematic application remains highly relevant for R&R.91,92 Figure 1 illustrates an AI-assisted simulation modelling workflow that embeds MDE and OS.

An AI-assisted simulation modelling workflow.
Integrating MDE within simulation workflows helps ensure that models are modular, parameterised, and version-controlled, enabling consistent replication across different computational environments. Moreover, MDE complements AI-assisted practices by providing well-defined model artefacts that AI tools can analyse, validate, or optimise. For example, AI algorithms can automatically check consistency between model layers, suggest optimisations, or detect discrepancies in model transformations, while MDE provides a structured representation of simulation logic and dependencies.
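A simple, rule-based version of such a consistency check can be written without any AI component, and it illustrates the kind of well-defined artefact that AI-assisted tools could build on. In this minimal Python sketch, a hypothetical platform-independent specification declares each parameter with a unit and a valid range, and the check reports parameters that are missing, undeclared, or out of range in an implementation. The class and function names are assumptions for illustration, not the API of any MDE tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """A parameter declared in a (hypothetical) platform-independent model."""
    name: str
    unit: str
    lower: float
    upper: float


def check_consistency(spec: list[ParameterSpec],
                      implementation: dict[str, float]) -> list[str]:
    """List discrepancies between a model specification and an implementation."""
    issues = []
    declared = {p.name: p for p in spec}
    for name in sorted(set(declared) - set(implementation)):
        issues.append(f"missing in implementation: {name}")
    for name in sorted(set(implementation) - set(declared)):
        issues.append(f"undeclared in specification: {name}")
    for name, value in sorted(implementation.items()):
        p = declared.get(name)
        if p is not None and not (p.lower <= value <= p.upper):
            issues.append(f"{name}={value} outside [{p.lower}, {p.upper}] {p.unit}")
    return issues


if __name__ == "__main__":
    spec = [ParameterSpec("arrival_rate", "1/h", 0.0, 10.0),
            ParameterSpec("servers", "count", 1, 50)]
    # A hypothetical implementation that drifted from its specification.
    for issue in check_consistency(spec, {"arrival_rate": 12.0, "buffer": 5}):
        print(issue)
```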
8.3. Quantum computing for simulation and the way forward
Quantum computing is another emerging topic. It offers a novel computational paradigm that has the potential to significantly expand the scale, complexity, and fidelity of simulation-based research, creating new opportunities for R&R. Quantum computing utilises principles of quantum mechanics, including superposition and entanglement, to perform certain computations that would be infeasible for classical computers. 93
Unlike classical bits, which represent either 0 or 1, quantum bits (qubits) can exist in a superposition of 0 and 1, so that a register of qubits can represent many states simultaneously, which quantum algorithms can exploit to explore large computational spaces.94–96 This capability makes quantum computing particularly promising for complex simulations in areas such as optimisation, molecular modelling, and stochastic processes, where traditional approaches face scalability limitations. Such increased computational capacity could enable researchers to explore larger parameter spaces and perform more exhaustive sensitivity analyses, which would enhance the robustness and reproducibility of simulation outcomes.
The relationship between quantum computing and simulation is bidirectional. Quantum computing introduces new computational principles that may substantially extend the scope and efficiency of simulation methods, while simulation remains essential for the design, evaluation, and validation of quantum algorithms, architectures, and hardware systems.
However, while quantum computing offers opportunities for advancing simulation capabilities, it also introduces novel challenges to R&R. Quantum-enhanced simulations are inherently sensitive to hardware variability, noise, and stochastic quantum effects, making exact reproduction of results across different platforms difficult. Moreover, the current lack of standardised quantum software frameworks, coupled with limited access to high-fidelity quantum hardware, further complicates verification and replication efforts. Despite these challenges, by carefully designing hybrid quantum-classical workflows, employing modular and well-documented quantum subroutines, and integrating classical verification methods, researchers can mitigate risks and foster replicable and transparent simulation studies in this rapidly evolving computational paradigm.
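On quantum hardware, every result is estimated from a finite number of measurement shots, so a reproducible report needs the shot count and an uncertainty estimate, not only a point value. The sketch below emulates this classically in Python, without any quantum SDK: a single qubit rotated by RY(θ) from |0⟩ yields outcome 0 with probability cos²(θ/2), and the estimate from a finite number of shots varies with the seed. The function name and settings are assumptions; on actual devices, hardware noise adds further variability that no seed can control.

```python
import math
import random


def estimate_p0(theta: float, shots: int, seed: int) -> tuple[float, float]:
    """Classically emulate measuring RY(theta)|0> in the computational basis.

    Returns the estimated probability of outcome 0 and its standard error.
    The exact value is cos(theta / 2) ** 2; the estimate varies with shots and seed.
    """
    p0 = math.cos(theta / 2) ** 2
    rng = random.Random(seed)
    zeros = sum(rng.random() < p0 for _ in range(shots))
    estimate = zeros / shots
    std_err = math.sqrt(estimate * (1 - estimate) / shots)
    return estimate, std_err


if __name__ == "__main__":
    theta = math.pi / 3  # exact P(0) = cos^2(pi / 6) = 0.75
    for seed in (1, 2):
        est, se = estimate_p0(theta, shots=1000, seed=seed)
        print(f"seed={seed}: P(0) = {est:.3f} +/- {se:.3f}")
```

Agreement between runs, or between platforms, is then judged statistically, for example by whether estimates fall within a few standard errors of each other.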
A significant area for future research lies at the intersection of quantum computing and agentic AI.97,98 This emerging research space introduces both conceptual and methodological complexities, particularly regarding the integration of quantum computational elements, such as superposition and entanglement, into simulation-based learning and decision-making frameworks. 99 Beyond purely technical integration, this line of inquiry also raises broader questions about model design, interpretability, and evaluation, as quantum-enhanced components may fundamentally alter the behaviour and outcomes of simulated agents.
Future research may, therefore, focus on the development of dedicated modelling and assessment frameworks to systematically investigate these hybrid systems. In addition, establishing reproducible and replicable protocols for such quantum–agentic systems will be essential to ensure that results can be independently verified, that integration strategies are robust across platforms, and that the benefits of these emerging technologies can be reliably evaluated and adopted in both research and applied contexts.
8.4. Practical implications and real-world applications
Emerging opportunities in R&R are particularly relevant to industry-focused applications of simulation-based research. In sectors such as aerospace, manufacturing, and health care, the ability to reliably reproduce simulation outcomes is vital for design verification, regulatory compliance, and risk assessment. By adopting reproducible workflows, organisations can reduce development costs, streamline testing procedures, and enhance confidence in simulation-driven decision-making. In addition, industry adoption of reproducible practices encourages collaboration with academia, facilitating knowledge transfer and the co-development of robust simulation models. Figure 2 lists some of the potential technologies that could advance the simulation field and, specifically, R&R in simulation.

Emerging opportunities and technologies for reproducibility and replicability in simulation.
Once implementable methods and supportive tools are more widely available, the benefits of R&R in computer simulation are expected to extend across multiple domains. Enhanced R&R will not only facilitate knowledge transfer between methodological and applied researchers but also accelerate robust and reliable innovation in fields such as health care, manufacturing, transportation, energy systems, and environmental modelling. In particular, the integration of digital twin technologies offers opportunities for continuous validation and real-time simulation, bridging the gap between virtual models and real-world systems. There are many pilot projects as well as mature examples, from personalised medicine 100 to space missions such as digital twins of the James Webb Space Telescope 101 and Mars missions, 102 and those used for high-energy physics experiments 103 and Earth systems. 104 Given their expected impact on society, the role of R&R in them is pivotal and self-evident. 105
9. Discussion and reflection
Simulation is a powerful method of inquiry enabled by computing. Unlike field experiments or laboratory investigations, simulation models generate results through computational routines. Thus, their computational R&R is a basis for meaningful analysis, corroboration, and further use of the results. Scientists and funding agencies have invested considerable time and funds in projects that develop new simulation models. While these efforts have been indispensable and fruitful, they often do not explicitly address computational R&R, even though many regard it as critical for scientific simulation.22,28,106
At the same time, simulation models are becoming increasingly complex and more widely used. Recent developments in M&S have also included more functionalities that use AI, particularly ML. The domains of application of simulation have expanded beyond the physical sciences and engineering to include the social and behavioural sciences, among others. Those who use and develop scientific simulation models are well trained in their respective domains, but not necessarily in the software aspects of computational methods. All of this makes simulation R&R more complex to pursue and carry out. As a community of model users and developers, we need to recognise that R&R is extremely challenging and time-consuming in practice, and that the steps needed to tackle it will not be acceptable to everyone.36,38,106 Nonetheless, we need to talk about R&R more often as a community. 53 In particular, when simulation is used for decision support with real-world impact, for example, in safety-critical applications, the reliability of the computational results must be investigated. 17
The recent growing emphasis on R&R has become particularly visible within the simulation community, where researchers are increasingly formalising methods and infrastructures to ensure transparent, verifiable, and shareable computational studies. Over the past decade, simulation-oriented conferences and journals have begun to recognise R&R not merely as an ethical imperative but as a technical and scholarly contribution in its own right. An example is the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS), which has established an approach to reporting the reproducibility of simulation papers. These reports assess whether the original results can be independently reproduced using the authors’ shared artefacts, code, and documentation. For instance, Rossi and Vabdin 107 reported on the reproducibility of the experiments by Piccione and Pellegrini. 108 Similarly, the journal ACM Transactions on Modeling and Computer Simulation (TOMACS) is aligned with ACM’s Task Force on Data, Software, and Reproducibility and supports artefact review and badging policies. 109 These are signs of a cultural shift towards valuing R&R as a measurable research output. R&R should not be viewed solely as an ancillary best practice but as an integral component of scientific progress, enabling simulation studies to be independently verified, extended, and trusted across diverse application domains.
The OS initiatives taking place in many parts of the world have brought about positive changes, making science more transparent and accessible. However, for those who wish to engage in the R&R of simulation models, openness alone is insufficient. There is still a significant gap to overcome towards R&R and the potential reuse of simulation models. With the R&R research movement, many authors have also advocated for more structural and cultural change in institutions and research communities, for example, good institutional practice, appropriate incentive and evaluation systems, funding policies, journal guidelines and standards, training programmes, among others.36,78,106,110
Besides these, another crucial question, which is not yet sufficiently addressed, is how to make R&R more attractive and operational for both model builders (i.e. original researchers) and model reproducers or replicators (i.e. independent researchers). They face related but distinct challenges and therefore require different types of skills and support. For both, how can the “extra” time and effort be made more worthwhile and feasible for researchers who primarily focus on original research and already have high workloads? There are no simple solutions. We discuss four reflections that could fit well with efforts in simulation modelling.
The first is to tie R&R in M&S more closely to model reuse. Model reusability is challenging in itself, and R&R is a promising first step towards model reuse. This can serve as an incentive for both original and independent researchers, since reusing a computational model is a recognition of the original work and also eases the reusers’ own research. It would be useful for researchers to have infrastructures and resources, such as registers of reproducible or to-be-reproduced simulation models, to promote domain-specific corroboration and potential model reuse. Currently, there is no lack of online code and model repositories and versioning systems, but how the R&R status of the simulation models indexed therein is recorded remains highly unclear, and these repositories generally lack peer evaluation. In this regard, more publication venues could consider publishing high-quality simulation R&R studies, as well as requesting and supporting reviews and evaluations of simulation artefacts.
The second is to develop processes, methods, and supporting elements, such as benchmarks and tooling, that integrate well with existing simulation model development workflows and practices. While R&R is a socioeconomic problem, 32 it also has many methodological issues to overcome. Existing works on R&R in different disciplines often have a particular focus. For example, the ACM Conference on Reproducibility and Replicability (ACM REP, inaugurated in 2023) places a strong emphasis on computational issues (https://acm-rep.github.io). ReScience C, 111 an open-access journal, is dedicated to the publication of high-quality replications implemented with FOSS (https://rescience.github.io). While these initiatives and venues are beneficial to the M&S community, a simulation model is a special piece of software that focuses on imitating dynamic systems that change over time. General good practices often stem from software engineering; we argue that they are helpful but often not sufficient for R&R simulation. For simulation, experimental workflows and steps, trajectories of state changes, results of different experiments, and so on should be captured and managed in an incremental, M&S-process-based research cycle. 30 In addition, being able to identify the type and level of complexity of simulation experiments would significantly help in estimating the time and effort needed for R&R studies. Such methods and tools are generally not available but are urgently needed in the field. Research topics in this regard include experiment (and scenario) management, documentation, and complexity evaluation. The M&S community could organise regular R&R-specific sessions and workshops in conjunction with existing initiatives or at M&S-focused venues to engage more researchers and foster discussion and collaboration within the community.
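As a minimal illustration of capturing experiment metadata and state trajectories together, the Python sketch below runs a toy stochastic logistic-growth model under two scenarios and stores, for each run, the scenario name, parameters, seed, and the full state trajectory in a JSON-serialisable record. The model, record fields, and scenario names are assumptions chosen for brevity; dedicated experiment-management tools would add versioning, storage, and querying on top of such records.

```python
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class ExperimentRecord:
    """One scenario run: its inputs plus the full trajectory of state changes."""
    scenario: str
    parameters: dict
    seed: int
    trajectory: list = field(default_factory=list)


def run_scenario(scenario: str, growth: float, capacity: float, noise: float,
                 steps: int, seed: int, initial: float = 10.0) -> ExperimentRecord:
    """Toy stochastic logistic-growth model that records the state after every step."""
    rng = random.Random(seed)
    record = ExperimentRecord(
        scenario=scenario,
        parameters={"growth": growth, "capacity": capacity, "noise": noise,
                    "steps": steps, "initial": initial},
        seed=seed,
    )
    x = initial
    record.trajectory.append(x)
    for _ in range(steps):
        x += growth * x * (1 - x / capacity) + rng.gauss(0.0, noise)
        x = max(x, 0.0)  # the state (a population size) cannot become negative
        record.trajectory.append(x)
    return record


if __name__ == "__main__":
    # The same seed is reused across scenarios (common random numbers), so that
    # differences between the runs stem from the parameters, not from the noise.
    runs = [
        run_scenario("baseline", growth=0.3, capacity=100.0, noise=1.0, steps=50, seed=1),
        run_scenario("high-growth", growth=0.6, capacity=100.0, noise=1.0, steps=50, seed=1),
    ]
    archive = json.dumps([asdict(r) for r in runs], indent=2)
    print(archive[:300])  # in practice, archive this alongside the model outputs
```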
Moreover, because model conceptualisation entails simplification and abstraction based on many assumptions, given certain goals of a simulation study, modellers should be supported in capturing such conceptual conditions in a clearer and more methodical way. 48 There are efforts and recommendations in this direction, for example, metadata structures, model description languages, and reporting guidelines.3,19,30,56,57,82 We also need a better understanding of researchers’ and practitioners’ needs and workflows, which are often highly contextual, to design (or avoid over-designing) methods and tools that can facilitate the wider adoption of more formal descriptions of conceptual models and assumptions, thereby enhancing model R&R.
Last but not least, R&R simulation entails a continual process and a cultural change towards more sustainable simulation research. Hence, it shall not be treated as an afterthought of a scientific study or as an end in itself.2,4,30 It calls for a way of working and thinking about how we develop and maintain our scientific simulations. Some might think that we need more openness, transparency, better methods, and good practices to make simulation more R&R. While these elements are indeed essential, we argue that to effectively implement, adapt, and expand existing practices for more R&R simulation, we first need to build incentives and engage more researchers. How to effectively stimulate and sustain community engagement for R&R simulation is still largely unclear. Nonetheless, a good starting point is to explicitly consider and activate the important role of research institutions and graduate programmes, for example, in recognition and rewards, adapting research assessment criteria, and integrating R&R and OS elements in education and training.43,110 Advancing R&R in computer simulation demands not only guidelines, methods, and tools, but also structural, social, and (research) cultural investigations of simulation practices.
10. Conclusion
Reproducibility and replicability in simulation face significant challenges. Yet, emerging opportunities, such as community-driven practices, AI, and quantum computing, offer promising paths for advancement. By ensuring that simulation results can be independently reproduced and replicated, researchers can mitigate uncertainties in the method and enhance the credibility of simulation-based findings. General principles and guidelines, such as transparency, version control, containerised environments, and adherence to community standards, remain essential for supporting these efforts. Simulation-specific methods and tooling, as well as ongoing community engagement, are also indispensable for improving R&R in simulation.
In addition, reviews of simulation studies that include more detailed R&R evaluation and assessment would be insightful: they would enable a clear mapping of current R&R evaluation and assessment practices and help identify the effectiveness of evaluation and assessment criteria and methods. Such efforts would also support the development of more robust methods for producing R&R simulation-based research. The works of Luijken et al. 28 and Heather et al. 70 presented such investigations of R&R. There remains a clear need to broaden the scope of such reviews and to extend them to other domains. In this regard, the development of more automated R&R evaluation and assessment methods and tools can help the simulation community carry out such studies and achieve more scalable results.
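As a small step towards such automation, the sketch below compares numerical results reported in an original study with those obtained in a reproduction and classifies each as exact, within a relative tolerance, differing, or missing. It is a minimal Python example under stated assumptions: the metric names and values are hypothetical, and an appropriate tolerance depends on the model’s stochasticity and on whether numerical or distributional equivalence is the target.

```python
import math


def assess(reported: dict[str, float], reproduced: dict[str, float],
           rel_tol: float = 1e-2) -> dict[str, str]:
    """Classify each reported quantity by how closely the reproduction matches it."""
    verdicts = {}
    for name, original in reported.items():
        if name not in reproduced:
            verdicts[name] = "missing"
        elif reproduced[name] == original:
            verdicts[name] = "exact"
        elif math.isclose(reproduced[name], original, rel_tol=rel_tol):
            verdicts[name] = "within tolerance"
        else:
            verdicts[name] = "differs"
    return verdicts


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    reported = {"mean_wait": 8.12, "utilisation": 0.90, "max_queue": 41.0}
    reproduced = {"mean_wait": 8.15, "utilisation": 0.90}
    for metric, verdict in assess(reported, reproduced).items():
        print(f"{metric}: {verdict}")
```

Aggregated over many studies, verdicts of this kind could feed the mapping of R&R assessment practices described above, provided the tolerance used is reported alongside them.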
The convergence of conceptual understanding, methodological innovation, supportive tools, and standardised practices aimed at enhancing R&R holds great promise for simulation-based research. By addressing R&R in a systematic and scalable manner, the simulation community can strengthen scientific rigour, foster collaboration and peer learning, and generate more credible insights into increasingly complex systems.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
