1. Introduction
Over the past two decades, many changes have taken place all over the world. In Europe, the refugee crisis in particular induced considerable demographic changes. During the pandemic, epidemiological questions and their impact on society played a dominant role. A little later, the war in Ukraine again affected population change, in Germany with high regional heterogeneity. In addition, the reduced availability of energy forced a strong rethinking of energy use. More recently, climate change and its impact on society have been discussed increasingly, with very dry periods on the one hand and flooding on the other. All these societal developments require research on their impact on the well-being of society.
Microsimulations provide a powerful tool for studying the impact of socio-economic and demographic developments, in connection with policies, on society using microdata on units such as persons, households, or companies. The aim is to evaluate in depth policy changes and their impact on the different units under a variety of scenarios. A typical example is the study of the effects of tax systems on the equality of societal units. Nowadays, there are numerous application areas for microsimulations, such as socio-economic studies, urban planning, transportation, epidemiology, and others.
Microsimulations were originally introduced by Orcutt (1957) to overcome the drawbacks of using macro-models, such as unobserved heterogeneities or non-linear interactions. He aimed at using microunits such as individuals, households, or companies directly rather than their aggregates. The technique is designed to model complex real-life events while simulating actions or the impact of policy changes on the individual units where events occur (see Harding et al. 2010). Li and O'Donoghue (2013) describe microsimulation as a tool to generate synthetic micro-unit based data, which can then be used to answer many "what-if" questions that otherwise cannot be answered. This immediately points to two different views on microsimulations: on the one hand, the investigation of policy changes in terms of performing the microsimulations per se, and on the other hand, data generation in terms of improving and enhancing existing datasets. The latter stems from the fact that survey data generally suffer from the lack of some variables of interest and of detailed geographical coverage, as well as from other sources of error. However, other data sources such as administrative data or even big data, if available, may have other drawbacks that must be considered when evaluating the quality of microsimulations, such as containing even fewer variables than survey data. Hence, the quality of the database used for microsimulations, the so-called base population, is one of the main assets of good microsimulation studies. Nevertheless, recent research has increasingly focused on the integration of modern data, such as remote sensing data, mobile data, or sensor data, in order to enhance the synthetic population.
Microsimulations per se can be differentiated into static and dynamic modeling. In contrast to static models, in dynamic models the base population changes over time. There are many methodological differences in how a dynamic simulation can be implemented. For a thorough overview of the different types of microsimulation methods, we refer to Merz (1991), Spielauer (2010), Li and O'Donoghue (2013), Hannappel and Troitzsch (2015), O'Donoghue and Dekkers (2018), Burgard et al. (2020), Münnich et al. (2021), or Schmaus (2023), and the references therein.
Reviewing the development of microsimulations, in the past the computational requirements and the availability of data set the frame for applications. Many applications were driven by research institutions or national statistical institutes due to the availability of microdata there. The rise in computing power and the improved accessibility of microdata have contributed to the global spread and further development of microsimulations.
The following two sections describe modern dynamic microsimulations and, especially, the importance of providing adequate large datasets to foster open data and reproducible research.
2. Regional Dynamic Microsimulations
Dynamic microsimulations consider the modeling of changes in a population over time. Certainly, this requires the availability of a cross-sectional dataset at the necessary level of detail. In most models, changes take place at discrete time intervals, usually annually (so-called discrete-time dynamic models). The idea is to model changes in the state space using an adequate statistical autoregressive model that predicts the new values of all variables given the past. In a large dataset, however, it is unlikely that one model can be estimated that covers all changes of variables at once. Hence, the entire distribution is often split multiplicatively into recursive conditional blocks that represent content-wise coherent modules. Figure 1 provides an overview of the basic modules of the MikroSim model, one of the largest models, based on the German population of over 82 million people in the starting year 2011 (see Münnich et al. 2021, or Weymeirsch et al. 2024).

Figure 1. Sequence of the MikroSim modules (see Weymeirsch et al. 2024, 4).
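Formally, the modular structure sketched in Figure 1 corresponds to a recursive factorization of the transition distribution; the following notation is a generic illustration of this idea and is not taken from the MikroSim documentation:

\[
P\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) \;=\; \prod_{m=1}^{M} P\left(\mathbf{x}_t^{(m)} \,\middle|\, \mathbf{x}_t^{(1)}, \ldots, \mathbf{x}_t^{(m-1)}, \mathbf{x}_{t-1}\right),
\]

where \(\mathbf{x}_t^{(m)}\) denotes the variables of module \(m\) (e.g., demography, education, labor market) in year \(t\). Each conditional block can then be estimated and simulated separately, conditioning on the modules simulated before it.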
In general, demographic variables are simulated first since the available information is often more accurate than for other variables. Further, the sequence of the modules has to be reflected in the conditioning of the prediction models on previous events or other available variables. MikroSim per se is a time-discrete microsimulation model, and other models in most cases use a similar structure depending on the content of the microsimulation. Note that a different approach is taken in continuous-time models: instead of simulating state changes in discrete steps from time to time, the time until events occur is simulated. A comprehensive description of continuous models can be found, for example, in Zinn (2011).
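To illustrate the continuous-time idea, the following is a minimal Python sketch that draws waiting times for competing events under constant (exponential) hazards; the event names and hazard values are purely illustrative assumptions, not estimates from any model.

```python
import random

def next_event(hazards, horizon):
    """Draw waiting times for competing events with constant hazards and
    return the earliest event if it occurs within the simulation horizon."""
    draws = {event: random.expovariate(rate) for event, rate in hazards.items() if rate > 0}
    event, wait = min(draws.items(), key=lambda item: item[1])
    return (event, wait) if wait <= horizon else (None, horizon)

# Illustrative annual hazards for one simulated person
hazards = {"birth_of_child": 0.08, "job_change": 0.15, "residential_move": 0.10}
print(next_event(hazards, horizon=10.0))
```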
Since policies nowadays consider regional aspects such as districts or communities, special attention has to be given not only to the granularity of the dataset in use, but also to the dynamic processes of simulating state changes. Small sample sizes and disclosure control may lead to restrictions, for example, to larger administrative regions (in Germany, often districts and above). However, urban planning requires modeling at very small area levels, ideally using geo-codes. For this reason, the last decade has seen an enormous increase in the importance of so-called small area or spatial microsimulation. This includes various methods that focus on the synthetic creation of geographically disaggregated (small-area) data. Recent overviews of regional dynamic microsimulation models can be found in Rahman et al. (2010), Tanton (2014), Rahman and Harding (2016), Lovelace and Dumont (2017), and Burgard et al. (2019).
3. The Role of Statistics, Open Data, and Reproducible Research
3.1. Statistical Methods and Their Implications on Microsimulations
The future acceptance of microsimulations will largely depend on two key factors: the quality of the underlying microdata and the ability to accurately reflect real-world dynamics within the simulated dataset. Additionally, uncertainties of the entire data generation and simulation process have to be considered to better understand the outcomes of the microsimulations.
Since hardly any register, administrative, or census data are available that adequately cover all relevant information, available datasets have to be extended to form an appropriate frame of the population under investigation. In cross-section, survey data generally have to be considered. The same applies to the implementation of dynamics: information must be available at the individual level and over time, which requires panel data that also stem from statistical surveys. Working with survey data, however, requires particular care in the analysis. As a result, almost all modern survey statistical and statistical prediction methods have to be considered. This includes the treatment of missing values, the consideration of complex survey designs in the analysis of statistical models and their uncertainty, as well as small area statistics to integrate auxiliary information in cases where regional granularity is lacking (cf. e.g., Lohr 2022; Meinfelder 2014; Pfeffermann and Sverchkov 2009; Rao and Molina 2015; Särndal et al. 2003).
Both in the generation of individual variables and in the simulation of transitions, statistical modeling methods are typically employed. Alignment methods ensure that the multivariate dependency structure at the micro level is preserved while also matching known macro-level structures. They may be understood as a means of connecting macro-level benchmarks, such as regional aggregates, with micro-level models or probabilities, and of preventing fully stochastic predictions from quickly running out of bounds and yielding implausible populations. One classical example is the alignment of population developments in microsimulations toward official population projections to achieve coherence between the model types. An overview of different alignment techniques can be found in Li and O'Donoghue (2014), Stephensen (2016), Burgard et al. (2021), Schmaus (2023), and Weymeirsch et al. (2024), and the references therein. Dieckmann (2025) provides an overview of harmonizing economic general equilibrium and microsimulation models, which may play an important role for policy support.
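As an illustration of the alignment idea, the following is a minimal Python sketch of one simple variant, alignment by sorting, in which event indicators are chosen so that the number of simulated events matches an external benchmark; the probabilities and the benchmark count are purely illustrative, and the predicted probabilities are assumed to come from a previously estimated micro-level model.

```python
import numpy as np

def align_by_sorting(probabilities, target_count, rng):
    """Select exactly `target_count` units to experience the event while
    preserving the ordering implied by the model probabilities."""
    u = rng.uniform(size=len(probabilities))
    score = u - probabilities              # smaller score -> event more likely
    selected = np.argsort(score)[:target_count]
    events = np.zeros(len(probabilities), dtype=bool)
    events[selected] = True
    return events

rng = np.random.default_rng(42)
p = rng.uniform(0.0, 0.3, size=1000)       # illustrative predicted probabilities
events = align_by_sorting(p, target_count=150, rng=rng)
print(events.sum())                        # exactly 150 aligned events
```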
When considering geo-referenced data, disclosure control often prevents researchers from directly using geo-coded data. Hence, best approximations using margins of statistical information at some regional level have to be taken into account. One major difficulty in geo-referencing is the assignment of households, which often can be generated relatively easily at aggregated levels, to houses or dwellings while considering all available statistical information at any higher degree of aggregation. Houses, in general, can be taken from OpenStreetMap and other sources, depending on the country of interest and the information available. Friedrich et al. (2024) provide mathematical optimization techniques to assign households to the available living space in houses while considering statistical information on different aggregation levels.
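To give a flavor of such an assignment step, the following is a minimal Python sketch that casts the household-to-dwelling allocation as a linear assignment problem; the data, the cost structure, and the restriction to household size and dwelling capacity are illustrative simplifications, whereas Friedrich et al. (2024) handle far richer constraints across aggregation levels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative household sizes and dwelling capacities (numbers of persons)
household_size = np.array([1, 2, 4, 3, 5])
dwelling_capacity = np.array([2, 2, 6, 4, 4])

# Cost of assigning household i to dwelling j: penalize over-occupancy strongly
# and under-occupancy mildly (purely illustrative cost structure).
diff = household_size[:, None] - dwelling_capacity[None, :]
cost = np.where(diff > 0, 10.0 * diff, -0.5 * diff)

rows, cols = linear_sum_assignment(cost)
for h, d in zip(rows, cols):
    print(f"household {h} (size {household_size[h]}) -> dwelling {d} "
          f"(capacity {dwelling_capacity[d]})")
```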
A key challenge in the development of dynamic microsimulations is the measurement of uncertainty. Due to the high complexity of the models, there are numerous heterogeneous sources of uncertainty. For this reason, dynamic microsimulations often refrain entirely from quantifying uncertainty. However, there are approaches that allow for the generation of confidence intervals even for complex models. One straightforward but important approach is to repeatedly run the microsimulation using stochastic predictions in the above models. This at least allows measuring uncertainty via classical Monte-Carlo techniques.
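The following is a minimal Python sketch of this repeated-run idea, assuming a hypothetical `run_microsimulation(seed)` function that returns a single target statistic (here, an illustrative regional poverty rate); percentile intervals over the replications then summarize the Monte-Carlo variability.

```python
import numpy as np

def run_microsimulation(seed):
    """Placeholder for a full microsimulation run returning one target statistic.
    A stochastic stand-in is used here so that the sketch is executable."""
    rng = np.random.default_rng(seed)
    return 0.17 + rng.normal(scale=0.01)   # illustrative poverty rate

replications = np.array([run_microsimulation(seed) for seed in range(200)])
estimate = replications.mean()
lower, upper = np.percentile(replications, [2.5, 97.5])
print(f"estimate: {estimate:.3f}, 95% Monte-Carlo interval: [{lower:.3f}, {upper:.3f}]")
```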
In addition, variance-based sensitivity analyses offer a way to directly compare uncertainties stemming from various sources, such as model uncertainty, Monte-Carlo uncertainty, sampling effects of data sources, alternative estimation methods, and others, as well as different scenarios for the microsimulations themselves. Through variance decomposition, the uncertainty of the target value can be attributed to the different factors. This method allows for a direct comparison of the aforementioned uncertainties (Saltelli et al. 2008). Specific application examples can be found in Sharif et al. (2012), Burgard and Schmaus (2019), Burgard et al. (2019), Schmaus (2023), and Dumont et al. (2025). Although uncertainty measurement for microsimulations is still under development, it has to become best practice to provide information on the uncertainty of the simulation outcome with respect to the parameters of all factors in the microsimulation.
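To sketch the variance-decomposition idea, a first-order sensitivity index for one discrete factor can be approximated by relating the variance of the group means over the factor's levels to the total variance of the target value; the factor, the outcome, and all numbers in the following Python sketch are purely illustrative.

```python
import numpy as np

def first_order_index(outcome, factor):
    """Approximate the first-order index Var(E[Y | X]) / Var(Y) for a
    discrete factor X by grouping the outcomes Y by factor level."""
    outcome = np.asarray(outcome, dtype=float)
    levels = np.unique(factor)
    group_means = np.array([outcome[factor == lv].mean() for lv in levels])
    group_sizes = np.array([(factor == lv).sum() for lv in levels])
    between_var = np.average((group_means - outcome.mean()) ** 2, weights=group_sizes)
    return between_var / outcome.var()

rng = np.random.default_rng(1)
scenario = rng.integers(0, 3, size=5000)        # e.g., three migration scenarios
mc_noise = rng.normal(scale=0.5, size=5000)     # Monte-Carlo variability
outcome = 0.2 * scenario + mc_noise             # illustrative target statistic
print(first_order_index(outcome, scenario))     # variance share explained by the scenario
```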
3.2. Open Data and Reproducible Research
Open data and reproducible research play an increasingly important role in modern science. Activities of the European Union aim at fostering an environment that provides data for comparative research. One attempt was the former research infrastructure Data Without Boundaries (DWB: https://dwbproject.org), which, however, hardly succeeded as a unified data platform for European research. Although many countries are trying to find good solutions, access is generally hampered by disclosure control, especially with respect to reproducibility.
Alternative approaches deal with synthetic data generation (cf. Drechsler et al. 2008; Drechsler and Haensch 2023, and the references therein). The idea is to mimic a given dataset, which can then be provided to researchers. This automatically leads to a discussion of the trade-off between disclosure risk and utility. In terms of microsimulations, however, the need for larger datasets and more variables requires advanced data generation techniques to create digital twins of large pseudo-populations. The term digital twin usually refers to a digital reproduction of a given set of objects, a system, or processes. In this case, a society is generated by statistical methods with a given set of information on units such as households and persons. This population may be enriched by geo-coded information such as houses, workplaces, and others.
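To give a flavor of such statistical population generation, the following is a minimal Python sketch that draws a small synthetic population from assumed regional counts, an age-group distribution per region, and a conditional employment model; all numbers and categories are purely illustrative, and real digital-twin generation combines far more sources and constraints.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Illustrative regional population counts and age-group shares per region
region_counts = {"A": 500, "B": 300}
age_shares = {"A": [0.20, 0.55, 0.25],   # young / working age / senior
              "B": [0.15, 0.50, 0.35]}
employment_prob = {"young": 0.3, "working": 0.8, "senior": 0.1}

records = []
for region, n in region_counts.items():
    ages = rng.choice(["young", "working", "senior"], size=n, p=age_shares[region])
    # Employment simulated conditionally on the age group
    employed = rng.random(n) < np.array([employment_prob[a] for a in ages])
    records.append(pd.DataFrame({"region": region, "age_group": ages, "employed": employed}))

synthetic_population = pd.concat(records, ignore_index=True)
print(synthetic_population.groupby(["region", "age_group"]).size())
```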
Considering the above ideas of a regional dynamic microsimulation, two types of disclosure risks can be identified: the generation of predictive models using input data, and the simulated outcome data in terms of a digital twin. Brenzel et al. (2024) discuss opportunities to provide such digital twins to the research community while considering disclosure risks during the production of the statistical models. The result is essential for the research community since it provides a framework of highly usable data for microsimulations on the one hand and disclosure control for the original (official) data sources on the other. Further, it should be noted that the stochastic generation of digital twins, that is, producing several datasets similar to resampling, can be seen as an additional source of uncertainty while increasing the computational effort, which then has to be considered within uncertainty measurement. MikroSim (see https://mikrosim.uni-trier.de) aims at providing a microsimulation data center for Germany. The above-mentioned procedures could be expanded to provide a comparative research data lab across countries, as was foreseen in the DWB project.
4. Summary and Outlook
The present article provides a short overview of recent developments and challenges in microsimulation modeling. Of course, the microsimulation itself has to be separated from the generation of digital twins. The latter provides unique opportunities for the future, such as interdisciplinary research covering many fields of interest as well as the integration of several countries.
Nevertheless, there are still many challenges that require further attention. The evaluation of large digital twins covering several topical areas is still new. However, the COVID-19 crisis has shown that, for example, separating epidemiological and economic research is hardly convincing. Further, the harmonization of micro and macro modeling, or alignment using intermediate time developments, calls for further research.
Statistical properties and evaluation of the combination of (statistical) prediction methods, which can be seen as a set of highly sophisticated combined data integration methods, are still growing topics. As is common in statistics, uncertainty measurement has to accompany microsimulations to better understand the what-if analyses of interest.
Acknowledgements
I would like to thank the entire MikroSim team for their hard work over the past six years, and especially my team in Trier for all the discussions on directions and specific research. Special thanks go to Professor Li-Chun Zhang for the invitation to contribute to this special issue. Congratulations to the Journal of Official Statistics on its 40th anniversary! Thanks to all editors during this time for making the journal what it is now, and good luck for the future!
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The preceding presentation of microsimulation modeling is based on the hard work of the microsimulation research unit FOR 2559 MikroSim, funded by the German Research Foundation (DFG). Certainly, this effort is carried by a large team.
Received: April 22, 2025
Accepted: May 18, 2025
