Abstract
Digital phantoms are virtual representations of the human body used in medical research to test equipment, train medical professionals and develop or validate algorithms. These models can be created from ‘real-world’ clinical data or from ‘synthetic data’. Phantoms derived from clinical data often serve as ‘ground truth’ reference values anchored in empirical observations. However, there is growing demand for synthetic digital phantoms and datasets that do not originate from real patients, raising critical questions about how reliable knowledge is produced from data detached from reality. This article aims to investigate these issues through a document analysis of peer-reviewed publications on the development and use of digital phantoms in medical physics. We examine how researchers construct ‘ground truth’ and the challenges they encounter when advancing truth claims through technical work. By attending to the bodies fabricated in phantom creation and to the data made to represent human form, we show how synthetic data – detached from real human subjects – are valued for enabling researchers to sidestep the complexities or ‘messiness’ of real-world patients and clinical data. Moreover, we show how synthetic phantoms and data are framed as tools that enhance control and flexibility, functioning as ‘known truths’: workable approximations that enable the construction of what are claimed to be more representative datasets and models. This article contributes to Science and Technology Studies and critical data studies by examining the nature and implications of digital representations and synthetic data in the development of machine-learning models in medicine, and the truth claims they support.
Introduction
‘Digital phantoms’ are computerised models or simulations of the human body, or specific organs, widely used in medical physics and related areas such as radiation protection, radiation dosimetry, and radiotherapy (Harrison et al., 2020). They are typically built from two main data sources. One is real (empirical) data, which come from medical imaging (such as computed tomography (CT) or magnetic resonance imaging (MRI)) and capture the geometry and tissue composition of actual bodies or organs. The other is synthetic data, which computationally model tissue and material properties, such as mass density, elemental composition, and ionisation potential, that shape radiation transport and energy deposition.
Here, ‘real’ refers to empirically grounded data produced by real-world systems (like imaging devices) and intended to reflect the properties of actual individuals or events. By contrast, synthetic data are generated through mathematical or statistical techniques and are not directly tied to any real individuals or events. Beyond their technical role, digital phantoms are sometimes described as ‘not real’. As Campagnolo et al. (2016: 123) put it, ‘phantoms can be either a computer simulation of a digital model mimicking a real object on which one can test processing algorithms, or specially designed material objects mimicking specific characteristics of a real object’. In this sense, digital phantoms function as surrogate models, replicating human tissue with well-characterised materials (Campagnolo et al., 2016: 124). They become especially valuable when collecting or using real-world data is considered impractical or ethically sensitive. For example, Grob et al. (2019) used a digital phantom to optimise and improve the signal-to-noise ratio for a new CT imaging technique without exposing human subjects to additional ionising radiation, relying instead on a simulated dose.
Medical researchers are increasingly turning to digital phantoms and synthetic data as alternatives to working with real human subjects, physical phantoms, or live clinical settings. Unlike many clinical researchers who can access patient data, those in fields such as medical physics often face constraints – privacy, governance, cost, and logistics – which make synthetic approaches attractive. An article in Nature, ‘Synthetic data could be better than real data…’ (Savage, 2023), captures this shift, noting that barriers to real-world data access are pushing researchers towards synthetic equivalents. However, that enthusiasm comes with a caveat: synthetic data are only useful if researchers strike the right balance between accuracy and ‘fakery’ (Savage, 2023). This tension underpins our interest in how synthetic data and digital phantoms are represented and understood in the medical literature.
Data derived from CT or MRI scans capture the physical geometry and composition of actual human bodies and are grounded in empirical observation. For example, the National Library of Medicine's Visible Human Project created detailed anatomical datasets widely used to test medical imaging algorithms. Although these virtual representations are technically detached from the physical body, they are anchored in the real world and support valid knowledge claims. They also enable the construction of ‘ground truths’, reference standards derived from actual human anatomy, which are used in the development and evaluation of algorithms, models and simulations to improve accuracy and reliability.
By contrast, synthetic data are generated algorithmically, commonly by using generative adversarial networks (GANs) or diffusion models, and do not correspond directly to a real patient or event. GANs, for example, ‘produce synthetic outputs that are as proximate as possible to the training data without being an identical mapping’ (Jacobsen, 2023: 2). As Dilemgani (2020) notes, synthetic data are artificially created rather than recorded from the real world. Andrews (2021) similarly defines them as data ‘that computer simulations or algorithms generate as an alternative to real-world data’. These datasets aim to approximate real data without replacing them (Jacobsen, 2023). They are often promoted as useful where radiation exposure, consent, or availability pose barriers: researchers can optimise imaging pipelines and generate what they call ‘accurate’ and ‘realistic’ representations without additional patient risk (Zaidi and Tsui, 2009: 1938). Advocates therefore argue that synthetic data can address ethical concerns and data scarcity while improving overall quality (Nikolenko, 2021: 11–13).
However, working with data that lack a direct connection to real individuals or events raises important epistemological questions about truth and validity. Jacobsen (2023) highlights the challenges of making credible knowledge claims when the data lack an empirical anchor. Unlike resources such as the Visible Human Project, synthetic datasets do not provide a straightforward link between representation and reality, prompting debate over how realism and truth are constructed when research leans heavily on synthetic inputs.
This article addresses these concerns by examining how medical researchers describe and justify the use of digital phantoms and synthetic data. While social scientists have begun to study synthetic data (de Vries, 2020; Jacobsen, 2023; Johnson and Hajisharif, 2024; Steinhoff, 2022), there is little focused analysis of their roles in medicine – especially their use as ground truths or their relation to digital phantoms. This gap leaves underexplored how medical researchers validate their technologies, repair disconnections from material reality, and navigate truth claims tied to digital representations of the body.
Our aim is not to evaluate the technical correctness of digital phantoms or synthetic data. Rather, we analyse how researchers frame these tools as representative of reality, particularly when comparing synthetic outputs to real-world benchmarks. We explore how these digital artefacts are positioned as sources of ground truth and how they contribute to medical knowledge production, including the growing role of AI techniques in shaping these practices.
The next section provides a background on digital phantoms, synthetic data, ground truths, and the epistemological underpinnings of scientific knowledge-making from the perspectives of Science and Technology Studies (STS) and critical data studies. Section three outlines our document analysis methodology. Section four presents four thematic findings. We conclude by discussing how, despite their detachment from real subjects, synthetic data are increasingly framed as enhancing control, flexibility, and efficiency – allowing researchers to sidestep the ‘messiness’ of clinical data while still producing credible, actionable insights. We describe this in terms of the pursuit of a ‘known truth’: an approximation of reality considered sufficiently reliable to support the development of representative datasets and models.
Ground truths
The concept of ground truth is central to how digital phantoms are built and validated. Campagnolo et al. (2016) argue that their value lies in connection to a ‘known ground truth’, which lets researchers assess methods under test, for example checking image quality or simulating physiological properties of human organs or tissue. In computer science, a ground truth refers to a dataset representing the ‘true’ values of a phenomenon (Kang, 2023) and serving as reference points in machine learning, functioning as the training data from which algorithms derive their models of the world (Amoore, 2020) and which they are evaluated against.
Jaton (2021: 294) defines ground truth as establishing a relationship between inputs (images, text, audio) and labelled outputs. Crucially, ground truths are constructed, not found: they emerge from choices about how to define and represent a problem, what Jaton calls problematisation. This perspective invites scrutiny of how machine learning problems are framed and how their associated ground truths are assembled.
A growing body of social science literature examines the social, ethical, and epistemic dimensions of ground truths and the labour of data labelling. One stream looks at digital image processing (Jaton, 2017, 2021); another at algorithmic work with social media data (Amoore, 2020). More recently, attention has turned to medical AI, including the creation of ground truths in health-related contexts (Henriksen and Bechmann, 2020; Högberg, 2025; Winter and Carusi, 2022) and biometric sound analysis (Kang, 2023). For instance, Kang (2023) proposes tracing ground truth development with emphasis on whether a given problem is learnable and whether there is a well-defined benchmark against which accuracy can be measured.
Both Jaton (2017) and Kang (2023) stress that, in practice, ground truths are often ‘flattened’: complex phenomena are simplified into workable mathematical references on which to train algorithms. More social science research is needed to unpack what this flattening entails: how ground truths are put together, how algorithms learn from them, and which human judgements and compromises shape that process.
However, much of the existing literature still assumes ground truths come from empirical data. Less attention has been paid to synthetic sources, such as digital phantoms and synthetic data, as ground truth. Synthetic data, that is, data that are ‘artificially created’ rather than recorded from real events (Andrews, 2021), can also function as ground truth in development and testing, marking a shift away from expert-labelled, real-world datasets (Amoore, 2020; Jaton, 2017; Winter and Carusi, 2022). On the surface, synthetic and real data can look similar. Both support model training and validation, hypothesis generation, and algorithm testing. Yet the epistemic stakes differ, because synthetic ground truths lack a direct empirical anchor, an issue that remains underexplored and deserves closer attention.
Synthetic data in the social science literature
Until recently, discussions of synthetic data largely stayed within technoscientific communities (medical physics, computer science, engineering, robotics, and finance). Social science engagement has accelerated only in the last few years (e.g. Jacobsen, 2023; Steinhoff, 2022). For example, Steinhoff (2022) examines the political economy of synthetic data, asking how it reframes data-driven capitalism by shifting attention from surveillance of real subjects to synthesising data. This redirection raises questions about the ontology of data and the epistemic consequences of training models on simulated datasets rather than empirical inputs, including whether synthetic environments can faithfully emulate real-world conditions for truth claims.
Building on this, Jacobsen (2023) highlights how synthetic data are cast as solutions to two persistent challenges in machine learning: limited variability and data-related risks. They are praised for injecting heterogeneity into training sets, covering typical cases as well as outliers and ‘edge cases’ (Jacobsen, 2023: 4–5) to improve robustness in deployment. Yet, Lee et al. (2025) show how outliers and edge cases are often removed during the generation of synthetic datasets, enacting an ontological normativity that normalises certain worlds while marginalising others.
Synthetic data are also frequently described as ‘beyond risk’ because they do not derive from identifiable individuals. The claim runs: no real data, no real risk. Jacobsen (2023) cautions against this view, arguing that it obscures the power relations and institutional dynamics shaping data practices. By treating training data as the only source of harm, such narratives neglect broader ethical, social, and organisational factors that configure algorithmic systems.
Scientific facts, objective truths, and real-world data
Our interest in digital phantoms, synthetic data, and algorithmic ground truths sits within a broader STS tradition that examines how scientific facts and technologies are socially constructed and stabilised. Classic studies show how technologies are shaped by human, social and political forces (Pinch and Bijker, 1984) and how, in practice, science is made through sociotechnical entanglements and epistemic cultures in which tools can act as credibility devices (Knorr-Cetina, 1999; Latour, 1987). STS also documents how social factors define what counts as scientific truth (Shapin, 1994) and traces the history of scientific ideals such as objectivity and trained judgement (Daston and Galison, 2007). Related work explores how medical normality and pathology are constructed as statistical entities (Canguilhem, 1989), how multiple ontologies of disease are coordinated in practice (Mol, 2002), and how inclusion and exclusion politics shape medical knowledge (Epstein, 2007). In short, scientific knowledge and technological artefacts are produced within human practices, social norms, institutional values, and material constraints.
While we sometimes contrast ‘real’ and ‘not real’ data, STS scholars challenge the very notion of ‘raw’ data. As Gitelman (2013) argues, there is no such thing as raw data; all data are mediated by instruments, decisions and interpretations. This is especially clear in medical imaging, where the outputs are not passive recordings but are actively constructed through technical settings, aesthetic choices, and clinical conventions (Beaulieu, 2002; Casini, 2021; Joyce, 2008). Increasingly, imaging pipelines feed directly into computational systems, prioritising machine legibility over visual interpretation.
Over the past two decades, STS has developed rich research programmes on algorithms and data practices (e.g. Vertesi and Ribes, 2019). This work provides detailed accounts of how models are produced in context and how decisions about inclusion, exclusion, validity, and uncertainty are negotiated. Recent studies focus on AI and ML in science and medicine, showing how knowledge emerges through collaboration, contestation, and iteration (Amoore, 2020; Jaton, 2021; Winter and Carusi, 2022). Jaton (2021: 12) introduces ‘algorithmic constitution’ to emphasise that algorithms do not simply represent an independent reality; they help bring it into being. Drawing on Actor-Network Theory, he argues the world is continually enacted through associations among human and non-human actors. In this view, ground truth is not a universal given reality, but rather a relational outcome of ongoing work among people, tools, and representations to stabilise what counts as valid or accurate knowledge.
Analytical approach
Building on the methodological stance proposed by Amoore et al. (2023), this study analyses academic literature on digital phantoms and synthetic medical data. Amoore et al. advocate for a mode of analysis that focuses on reading how technological artefacts are described and narrated, paying special attention to passages that reveal instability, ambiguity, or multiple meanings: Such passages are selected from a work not strictly because they are crucial to uncovering a definitive originary meaning, but because they contain moments when the author opens up an unresolved problem and signals the multiplicity and instability of meaning. (Amoore et al., 2023)
We conducted a document analysis of peer-reviewed, English-language articles. Through a Web of Science search for the terms ‘digital phantom’, ‘digital phantoms’, and ‘medicine’, we initially identified 466 peer-reviewed articles (no publication date limit). From these, we selected a set of 40 articles based on abstracts and brief reviews, prioritising work that explicitly discussed the development or application of digital phantoms together with synthetic data. We also included additional literature relevant to the themes of ground truths and realism in the context of phantom objects.
However, our goal is not to provide a comprehensive review of the field. Instead, we surface and interrogate specific lines of reasoning around digital phantoms, synthetic data and ground truths. A key limitation of this approach is that document analysis cannot access what happens ‘behind the scenes’ of research. These scholarly articles represent the polished outputs of scientific labour rather than the complex realities and practices of research-in-the-making. Even so, our analytical strategy offers valuable insight into how values are articulated around digital phantoms and synthetic data at a time when they are increasingly used for developing and testing medical AI. As such, this study lays the groundwork for further study of their epistemic and sociotechnical dimensions.
We analysed the material qualitatively using thematic analysis (Ryan and Bernard, 2003). Themes were identified through recurring statements about motivations, challenges, notions of realism, and comparisons between synthetic and clinical data. We engaged in joint interpretative discussions and through an inductive, iterative process developed four overarching themes: (1) Promises of phantom objects and synthetic medical data; (2) Creating Phantoms and Data; (3) Phantom Populations; and (4) A Better Truth. The next section presents our findings organised around these themes.
Findings
This section outlines four interrelated themes identified from our document analysis. First, we critically examine the promissory narratives about digital phantoms and synthetic medical data. Second, we discuss how these entities are understood in terms of varying degrees of manipulation and generation in creating phantoms and data. Third, we consider how researchers conceptualise and construct ‘phantom populations’ and how ideals of data and bodies are propagated. Finally, we address the blurred boundary between ‘phantom truths’ and ‘real truths’, arguing that plausible realism sometimes claims more authority than authenticity.
The promises of digital phantoms and synthetic data in medicine
Across the literature, digital phantoms are framed as increasingly essential alongside AI's appetite for large, controllable training datasets. As Drobnjak et al. (2021) note in NeuroImage, the growing digitisation of research and the normalisation of AI are sharpening demand for physical, digital and numerical phantoms and simulations: In the future that is becoming more and more digital, data sets ever so larger and AI, with its need for highly controllable large ground truth training data sets, becoming a norm, the need and importance of physical and numerical phantoms and simulations will only grow. (Drobnjak et al., 2021: 17)
Research with real participants introduces various ethical and logistical limits, such as radiation exposure, discomfort, unforeseen health risks, and privacy concerns. Human subjects must also be recruited, scheduled, kept still during procedures, and accounted for in terms of varying physiological characteristics that must be compatible with specific medical equipment.
Historically, physical, solid anthropomorphic phantoms, objects that mimic human tissues and lesions, helped mitigate some of these challenges, but they bring their own drawbacks: production costs, maintenance, storage requirements, and degradation over time. Digital phantoms avoid these issues: Compared to ex vivo measurements for obtaining the “ground truth” on tissue properties, digital phantoms do not suffer from the issue of tissue shrinkage, deformation, or any physical changes that may be caused by invasive or post-mortem procedures. (Wu et al., 2021)
Recruiting participants and ensuring ethical compliance are often described as ‘burdensome’, and many researchers argue that digital phantoms and simulations help ease these demands (e.g. Lowther et al., 2018). However, even proponents acknowledge that there are limitations. Lowther et al. (2018), for instance, describe digital phantoms as offering ‘virtual validation’ and adequate ground truth, while noting that tissue appearance and motion realism could be improved. Nevertheless, in cases such as modelling respiratory motion, digital phantoms are often judged superior to physical phantoms, especially when clinical trials with real human subjects are deemed ‘impractical and costly’ (e.g. Amin et al., 2019).
Digital phantoms are also often used to build synthetic populations for virtual clinical trials, proposed as alternatives where real-world trials would be unethical (e.g. involving radiation dosimetry). Beyond ethical concerns, some researchers argue that digital phantoms address technical constraints associated with real-world data acquisition: …the application of retrospective sorting methods on the CoMBAT phantom provided a validation approach and a reproducible strategy which are typically not possible on patient data, due to the absence of proper real-time 4D MRI and variability in patient breathing. (Paganelli et al., 2018)
Much as physical phantoms have long been used in training medical staff (Johnson, 2004), digital phantoms now feature in training both humans, such as radiologists and pathologists, and machines. They increasingly underpin the training of algorithms, aligning with broader ambitions surrounding AI, ML, and deep learning. Moreover, digital phantoms can serve as the basis for GANs to create synthetic datasets representing human bodies and organs. This promises a way around reliance on so-called ‘raw data’, even though such data are also constructed and shaped (Gitelman, 2013). Importantly, there are multiple ways to model digital phantoms, and they are engaged in a wide spectrum of data generation and algorithm-training activities.
Making phantoms and data – degrees of generation and manipulation
Research on digital phantoms spans production processes, technical complexity, and fabrication techniques. Here we highlight how digital phantoms and synthetic data are generated, and how the line blurs between what counts as ‘real’ data and what is digitally produced.
With advances in computational technologies, digital phantoms have grown markedly more complex – and are perceived as increasingly anatomically realistic. A notable early example is the Shepp-Logan 2D brain phantom model, released in 1974 (Figure 1). Built from 10 ellipses of different sizes and signal intensities (shades of grey), it was designed to mimic the geometry and x-ray attenuation properties of the human head (Gach et al., 2008).

Figure 1. The Shepp-Logan 2D brain phantom model representing the human brain.
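To make concrete what it means to build a phantom from a handful of ellipses, the following is a minimal sketch of the general technique: each ellipse adds (or subtracts) a grey-level intensity wherever it covers a pixel. The three ellipses used here are made-up illustrative values, not the ten published Shepp-Logan parameters, and the function name `render_phantom` is our own.

```python
import math

# Each ellipse: (intensity, semi-axis a, semi-axis b, centre x, centre y, rotation in degrees).
# These three ellipses are illustrative assumptions, NOT the published Shepp-Logan table.
ILLUSTRATIVE_ELLIPSES = [
    (1.0, 0.70, 0.90, 0.0, 0.0, 0.0),    # outer "skull"
    (-0.8, 0.65, 0.85, 0.0, 0.0, 0.0),   # interior, lowers the intensity to about 0.2
    (0.3, 0.15, 0.25, 0.25, 0.0, 20.0),  # a small, brighter internal "structure"
]


def render_phantom(ellipses, size=64):
    """Rasterise additive ellipses onto a size x size grid spanning [-1, 1] x [-1, 1]."""
    image = [[0.0] * size for _ in range(size)]
    for row in range(size):
        # Pixel-centre coordinates; y increases upwards.
        y = 1.0 - (2.0 * row + 1.0) / size
        for col in range(size):
            x = -1.0 + (2.0 * col + 1.0) / size
            for intensity, a, b, x0, y0, angle_deg in ellipses:
                theta = math.radians(angle_deg)
                # Rotate the point into the ellipse's own coordinate frame.
                dx, dy = x - x0, y - y0
                u = dx * math.cos(theta) + dy * math.sin(theta)
                v = -dx * math.sin(theta) + dy * math.cos(theta)
                if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
                    image[row][col] += intensity
    return image


phantom = render_phantom(ILLUSTRATIVE_ELLIPSES)
```

Because every pixel value is fully determined by the ellipse table, the ‘true’ image is known exactly in advance, which is what allows such a phantom to serve as a reference for testing reconstruction algorithms.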
When constructing a digital phantom, several factors must be taken into account. Kainz et al. (2019) list key considerations including ‘anatomy, tissue properties, computational efficiency, and geometrical compatibility with simulation codes, e.g., Monte Carlo (MC) or analytical’. Generation typically begins by specifying the properties of the relevant tissues and surfaces where interactions occur. Two main modelling methods are commonly used. Constructive Solid Geometry creates solids from quadratic equations or voxels. Medical image data can be converted into voxel-based geometries, ‘providing a direct way of realistically describing the human anatomy’ (Kainz et al., 2019). Automation now helps map image values into voxel tissue properties. A limitation is the stair-step artefact from cubic voxels, with anatomical fidelity depending on voxel size, which is problematic for modelling very thin or small tissues (Kainz et al., 2019).
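The voxel-based route and its resolution limit can be illustrated with a toy sketch. It assumes hypothetical intensity thresholds for three tissue classes and mimics larger voxels by block-averaging a 2D slice; neither the thresholds nor the function names come from Kainz et al. (2019), and real pipelines calibrate such mappings against the scanner and use far richer tissue models.

```python
# Hypothetical intensity thresholds (loosely CT-like); illustrative assumptions only.
TISSUE_THRESHOLDS = [
    (-500.0, "air"),
    (200.0, "soft_tissue"),
    (float("inf"), "bone"),
]


def label_voxel(value):
    """Return the first tissue class whose upper bound exceeds the value."""
    for upper_bound, tissue in TISSUE_THRESHOLDS:
        if value < upper_bound:
            return tissue
    return TISSUE_THRESHOLDS[-1][1]  # not reached while the last bound is infinite


def coarsen(image, factor):
    """Average non-overlapping factor x factor blocks to mimic larger voxels."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ) / (factor * factor)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def voxelise(image):
    """Map every intensity in a 2D slice to a tissue label."""
    return [[label_voxel(value) for value in row] for row in image]


# Toy 8 x 8 slice: soft tissue (40) with a one-pixel-thick "bone" wall (400) in column 3.
fine = [[400.0 if c == 3 else 40.0 for c in range(8)] for _ in range(8)]
fine_labels = voxelise(fine)
coarse_labels = voxelise(coarsen(fine, 4))
```

At the fine resolution the thin wall is labelled as bone; once the slice is averaged into 4 x 4 blocks, the wall's signal is diluted below the threshold and the structure disappears from the labelled model. This is a simplified analogue of why anatomical fidelity depends on voxel size for very thin or small tissues.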
The second approach, boundary representation, uses Nonuniform Rational B-Splines (NURBS) or mesh phantoms. It extracts the surface contours of each organ to create smooth, anatomically realistic shapes that can be assembled to represent full or partial human body anatomies: ‘In essence, the contours convert the voxels into NURBS that are smooth and anatomically realistic’ (Kainz et al., 2019).
To further boost realism, morphing and posing methods are used to adjust volumes and shapes of reference phantoms to reflect individual patients or to generate sets of anatomically diverse datasets. This can involve geometric scaling based on statistical properties, physics-based methods (e.g. biomechanical tissue deformation models) or image registration techniques that map properties to reference images (e.g. CT scans). Some barriers remain: for example, using NURBS phantoms directly in Monte Carlo simulations is often too complex, so they must be converted back into voxel models, with a resulting loss of detail in thin structures (Kainz et al., 2019).
Modelling motion of organs and fluids is also crucial. The 4D XCAT phantom, derived from CT data using NURBS, includes a beating heart and respiratory motion – features essential for algorithm development in medical imaging and targeted treatment (e.g. Huh et al., 2022). For MRI-specific phantoms, Drobnjak et al. (2021) compare physical anatomical phantoms with digital computational ones and describe the latter as a linked set of components: (1) A structural model defining the simulated tissue, e.g., the cell shapes and types or the fiber configuration, (2) a diffusion model describing the water diffusion in the structural model that determines the signal attenuation in the diffusion-weighted signal, and (3) some sort of algorithm that enables the simulation of MRI signals and/or images on the basis of the other two components. Depending on who you talk to, probably the signals or images that are simulated using these components are also denoted ‘phantom’. (Drobnjak et al., 2021: 8)
Simulation could circumvent the need for human labeling by producing realistic datasets, along with ground-truth labels, for training machine learning tools on. In the case of QC [quality control], a simulator that was capable of producing datasets containing artefacts, such as motion, could be used to produce a training set. (Drobnjak et al., 2021: 17)
The generated labels represent an accurate ground truth, can be rapidly built, and grant additional flexibility since the anatomical models providing the ground truth can be automatically adjusted as required. By eliminating or reducing labelling requirements, the proposed pipeline enables greatly accelerated deep learning algorithm development in cardiac imaging. (Gilbert et al., 2021)

Illustration of GANs used on anatomical models to create datasets. “Using anatomical models as high-quality ground truth annotations, we propose a pipeline to generate large synthetic datasets for training convolutional neural networks.”
Crucially, the detachment of synthetic data from real-world sources is a matter of degree rather than a hard boundary (Steinhoff, 2022), ranging from basic data augmentation techniques to fully synthetic datasets created in simulated environments (e.g. simulated traffic events to support training of algorithms for autonomous vehicles). As the examples above show, phantom objects and digital simulations often sit between real-world human data and algorithmically generated content. Many phantoms are based, either partially or wholly, on medical imaging data from actual patients or research participants. This further complicates the notion of data as ever being raw (Gitelman, 2013) and raises questions about thresholds: when are data considered generated or fabricated, and why are some forms of imputation or simulation acceptable (for certain types of studies) while others are not?
Phantom populations
‘Old school’ digital phantoms have been criticised for offering a narrow view of human variation. Newer approaches are promoted as equalising tools, enabling the creation of more representative models of human anatomy, including groups typically underrepresented in ‘real world’ clinical datasets.
Digital phantoms and simulations are often built from data on specific individuals and have contributed to standardising certain body and organ types. For example, brain phantoms have frequently been constructed using data from just one or a few individuals. Although this constraint is acknowledged, it is often not seen as a barrier so long as the anatomy is considered ‘typical’ (Olson et al., 2018). Yet, the same authors note that such phantoms may not generalise to children, older adults, or people with neurological disorders, whose characteristics may have ‘values outside’ the phantom's range (Olson et al., 2018). By leaning on notions of the ‘typical’, these digital artefacts can uphold and reinforce normative assumptions about bodies and organs.
A similar pattern appears in torso models. The 4D NCAT phantom was created using NURBS surfaces from data on a male cadaver in the Visible Human Project, and a female version was produced by simply adding ‘breast surfaces to the base male torso’ (Segars et al., 2010: 4903). That early version covered only the torso region and lacked high-resolution anatomical detail. To address this, researchers developed the 4D extended cardiac-torso (XCAT) phantom, which includes ‘highly detailed whole-body male and female anatomies and improved models for the cardiac and respiratory motions based on state-of-the-art high-resolution imaging data’ (Segars et al., 2010: 4903). Using a mapping algorithm, the XCAT developers then created a library of 58 anatomically varied phantoms (35 male, 23 female) (Segars et al., 2013).
Beyond the XCAT library, the ‘IT’IS Virtual Family’ provides a set of anatomical models developed from ‘MRI data of healthy male and female adults and children of various ages, an obese male, an elderly male, a pregnant woman, and newborn’ (Kainz et al., 2019). These models are posable and morphable for further customisation. Other examples include 12 voxel phantoms from the German Research Center for Environmental Health (based on CT data from living patients), and a model of an eight-week-old infant derived from post-mortem data. The University of Florida has released a ‘family of models’, and Hanyang University in Korea has produced a ‘high-definition reference Korean man’ (built over seven years from cryosection data) along with a ‘high-definition reference Korean woman’ (Kainz et al., 2019). Still, many researchers argue that existing phantoms remain limited by their origin in a small number of individuals and call for wider anatomical variation: …current existing digital phantoms such like Zubal phantom and XCAT phantoms are usually generated from a single person's anatomy and lack anatomical variations present on a population level. Thus, there is an urgent need to develop a large population of digital phantoms that model anatomy variations seen in clinic. (Shao et al., 2022)
With personalised medicine on the rise and given the importance of individual-specific features for applications like radiation dosimetry, there is a growing interest in tailoring phantoms to reflect patient-specific characteristics. Techniques include morphing (e.g. mapping CT images to existing phantoms to simulate personalised anatomy; Kainz et al., 2019) and adjustable parameters, as demonstrated in a brain phantom customised by modifying variables such as stroke type, areas of damaged tissue, contrast agent protocols and CT scan settings (Divel et al., 2016). In nuclear medicine imaging, machine learning models trained on small real-world datasets with unknown ground truths risk overfitting (Shao et al., 2022). To counter this, Shao et al. (2022) use such datasets to train GANs, generating larger and more diverse datasets with the aim of improving robustness.
At the same time, projects that diversify phantom populations can still rest on gendered, racial and ableist normative ideas about bodies. For instance, who is represented by the Korean man, the University of Florida's family of reference objects or models of the ‘neurotypical’ brain? These digital objects enact such normativities, echoing longstanding negotiations around classifications and standardisations in medicine (e.g. Bowker and Star, 1999; Ichikawa et al., 2025).
In these studies, representation is treated not just as a technical task but also as a matter of realism. The push for more varied phantom libraries is ultimately a pursuit of a ‘better truth’. On one hand, current phantoms are viewed as insufficiently diverse; on the other, varied phantom libraries are used as ground truth and training data to enhance representation in algorithm development and validation. This amounts to digitally induced diversity aimed at reducing bias. In medicine, this may include induced pathology, generating sufficient cases of rare or adverse conditions to train algorithms. By scaling up edge cases, we see how phantom objects and synthetic data are positioned as tools for inclusion (Jacobsen, 2023). In the medical context, this must be read within wider debates about who is represented in medical research, what data are missing and how to tailor medicine to individual characteristics and conditions (Epstein, 2007). These phantom populations are created through different degrees of fabrication, augmentation, and reference to real bodies. Still, a central question remains: how closely do these artificial models reflect real populations, and how well do they translate into clinical practice?
A better truth: realism versus authenticity
In this section, we examine how the researchers in our material reason about what digital objects, such as synthetic data and digital phantoms, can reveal about real-world phenomena, and how these tools create tensions between clinical realities and digital assumptions. A headline from IEEE Spectrum provocatively asks, ‘Are you still using real data to train your AI?’, arguing that synthetic data could make AI systems both more effective and more ethical (Strickland, 2022).
While real-world clinical data have long been considered the gold standard in medicine (Timmermans and Berg, 2003), they are increasingly viewed as limited: messy, scarce, subjective, and potentially biased. A recurring challenge is the absence of a ‘known truth’, for which synthetic data and digital phantoms are presented as viable, even advantageous, solutions. In this context, a ‘known’ truth is often framed as preferable to real data: Our study has a few limitations. First, we investigated the accuracy of the registration algorithms with synthetic digital phantom images instead of real patient images. However, this gave us the major advantage that we have a known ground truth for each voxel, which is impossible with real patient data. Therefore, this is the only method possible to investigate, objectively, the performance of each registration algorithm. (Grob et al., 2019)
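To make concrete what a ‘known ground truth for each voxel’ can mean in practice, the following is a minimal sketch under our own assumptions, not the method used by Grob et al. (2019). It builds a toy one-dimensional ‘phantom’, deforms it with a shift chosen by the experimenter, and scores a stand-in registration routine against that shift. All function names are hypothetical, and the exhaustive search merely substitutes for a real registration algorithm.

```python
import random


def make_phantom(n, seed=0):
    """Build a toy 1D 'phantom': n synthetic voxel intensities (illustrative only)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def apply_shift(image, shift):
    """Deform the image with a circular shift chosen by us: the known ground truth."""
    n = len(image)
    return [image[(i - shift) % n] for i in range(n)]


def estimate_shift(fixed, moving):
    """Hypothetical stand-in for a registration algorithm.

    Tries every circular shift and returns the one that minimises the
    summed squared intensity difference between the two images.
    """
    n = len(fixed)

    def cost(s):
        return sum((moving[i] - fixed[(i - s) % n]) ** 2 for i in range(n))

    return min(range(n), key=cost)


def per_voxel_error(true_shift, estimated_shift, n):
    """Displacement error at every voxel (uniform here, since the shift is global)."""
    diff = abs(true_shift - estimated_shift) % n
    error = min(diff, n - diff)  # circular distance between the two shifts
    return [error] * n


phantom = make_phantom(16)
true_shift = 3  # fixed by the experimenter, hence known for every voxel
moving = apply_shift(phantom, true_shift)
estimated = estimate_shift(phantom, moving)
errors = per_voxel_error(true_shift, estimated, len(phantom))
print(estimated, max(errors))
```

The point of the sketch is that the error can be computed at all only because the deformation was authored rather than observed; with patient images, `true_shift` would have no counterpart.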
Some researchers also acknowledge limits, especially around generalisability. In a pragmatic vein, Aubert-Broche et al. (2006) describe digital phantoms as ‘anatomically realistic’, yet derived from a ‘restricted set of data: normal young adults, male and female’, noting that their simulations work ‘on the assumption that the phantoms are the ground truth and define truth in the simulations’. Avoiding the messiness of real-world data can also introduce oversimplifications that undermine evaluation. Liu et al. (2020) caution that overly ‘simplistic’ images may affect performance measures and realism. This concern relates to the challenge of applying models trained on synthetic data to real-world domains. One proposed remedy is ‘domain randomisation’, which aims to bridge the so-called ‘reality gap’ between artificial and real data (Tremblay et al., 2018).
The entanglement of synthetic data, real-world subjects, and artificial objects complicates the notion of a strict disconnection. The ideal of realism in scientific data stems from the goal of truth-to-nature representations (Daston and Galison, 2007). It also reflects how the technical solutions operate within persistent assumptions about the healthy ‘normal’ body versus the pathological body as clearly separable and statistically distinct categories, an idea long problematised (e.g. Canguilhem, 1989). Scientific ‘known truths’ are often highly controlled, idealised depictions of plausible realities. By contrast, clinical datasets are seen as authentic – but flawed – shaped by human error, subjectivity, socioeconomic disparities, discriminatory practices, and the limits of imaging technologies (Joyce, 2008). Thus, what is most realistic is not necessarily the most authentic in its relationship to human subjects.
At the same time, synthetic and augmented data allow for the introduction of controlled messiness, such as simulating image quality variability across vendors, contrast levels, or scan resolutions, to make algorithms more robust in deployment. Some researchers go further, suggesting that generated artefacts should become the gold standard, surpassing human-labelled datasets by offering a ‘known truth’ free of human bias. Indeed, Drobnjak et al. (2021) make a strong case for digital phantoms as the ‘perfect ground truth’ in diffusion MRI: The most important distinctive feature of digital phantoms is that they are the only way to obtain dMRI data with a real ground truth. Even well-defined physical phantoms cannot provide such a perfect ground truth since a direct correspondence between measured signal and component of the phantom is not given and the different aspects that define the phantom itself can only be controlled to a certain extent, e.g., due to mechanical limitations or the statistical nature of the diffusion process. Digital phantoms on the other hand are fully controllable and each aspect of the resulting MR signal can be explained by the phantom's components. (Drobnjak et al., 2021)
Concluding discussion
The knowledge-making processes surrounding medical digital phantoms and synthetic medical data are tightly interwoven. These digital artefacts train both human professionals and AI systems, anchor data production pipelines, and are seen as crucial for scaling small datasets or filling gaps in data, whether in terms of human variation, organ types, medical conditions, imaging technologies, or image qualities. In this study, we have examined how researchers construct ground truths and the challenges they encounter when advancing truth claims through technical work. Our document analysis identified four themes that shed light on how digital phantoms and synthetic data are used, imagined and motivated as ground truths for medical physics. First, we address the promises of phantom objects and synthetic medical data as enablers of a controlled truth for training and evaluating algorithms and AI models. Second, we attend to the different practices and modes through which phantoms and data are created, showing the degrees of generation and manipulation. Third, we show how digital phantoms and synthetic data are imagined as able to create diversity in data, for example through the variability of phantom populations. Lastly, we identify how digital phantoms and synthetic data are claimed to offer a better truth than real-world clinical data.
Broadly, digital phantoms and synthetic data are presented as ethically and practically superior to real-world data, because they are detached from identifiable human subjects and thus avoid many privacy, consent, or harm concerns. They are framed as providing what researchers call ‘known truths’, which enable enhanced control and reliability. Synthetic data have recently gained attention as a revolutionary technological development, yet their conceptual novelty merits scrutiny. Established statistical practices such as data imputation – where missing values are substituted with plausible estimates – also generate values that are not directly observed. Imputed values are, after all, not ‘real’ but statistically inferred. The reason the two are judged differently may lie in scope and technique: data imputation fills gaps, whereas synthetic data can replace entire datasets. Similarly, digital phantoms vary in how closely they maintain ties to real-world references. What counts as valid fabrication or simulation is historically and socially shaped. Notions of which modes of data construction are considered scientifically valid and unproblematic are current iterations of how scientific credibility is negotiated (Daston and Galison, 2007), and they influence what becomes accepted practice in different knowledge cultures (Knorr-Cetina, 1999). Are we witnessing a shift in which kinds of data-making and use are seen as legitimate and trusted?
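The difference in scope between imputation and synthesis can be illustrated with a minimal sketch. It rests on our own simplifying assumptions: the measurements are made-up values, imputation is reduced to mean substitution, and ‘synthesis’ is reduced to sampling from a normal distribution fitted to the observed values. It is not drawn from any of the studies analysed, and the function names are hypothetical.

```python
import random
import statistics


def impute_mean(values):
    """Fill gaps only: each missing value (None) becomes the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def synthesise(values, n, seed=0):
    """Replace the dataset: draw n new values from a normal distribution
    fitted to the observed values (a deliberately crude generative model)."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up measurements with two gaps, for illustration only.
measurements = [70.0, None, 82.0, 76.0, None]

imputed = impute_mean(measurements)        # observed values are retained
synthetic = synthesise(measurements, n=5)  # no observed value is carried over as such

print(imputed)
print(synthetic)
```

In the imputed list the three observed values survive unchanged and only the gaps are inferred, whereas every entry in the synthetic list is generated. Both contain values that were never observed, which is the continuity the paragraph above points to.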
Our material shows increasingly blurred boundaries between artificial and empirical data. In contrast to Henriksen and Bechmann (2020), who find that ground truths typically reproduce existing clinical norms, we also see cases where digital phantoms are described as the only, or at least most feasible, route to a ‘real ground truth’. Here, digital phantoms and synthetic data are seen as contributing to a particular type of ground truth, one that is redefined as explicitly known, explainable, and controllable rather than rooted in clinical experience. This redefinition reflects frustration with clinical data's representation bias, subjectivity, and constraints of documentation or patient trajectories.
A growing discourse is emerging around the idea of bypassing real-world clinical data altogether, through simulations, phantoms, synthetic, or augmented data, to develop more ‘ethical’ or ‘better’ AI. In this discourse, we may be witnessing the rise of an epistemic culture where real-world origins are no longer the most valued characteristic of data. Instead, value is increasingly ascribed to artefacts that best simulate plausible realities in a controlled way. As Engelmann (2022) notes, with the rise of big data, sources originating outside traditional clinical domains are now accepted as valid grounds for medical truth claims.
Following Kang (2023) on tracing ground truth creation, and Jaton (2021) on how we get the algorithms of our ground truths, we suggest we are also getting the ground truths of our algorithms, or more precisely, of our generative models. This invites us to reconsider the idea of ‘raw data’ as something naturally occurring, when in fact, as Gitelman (2013) argued, all data are to varying degrees made. While the ground truths used for training and evaluating AI models are always assembled and constructed through multiple choices, technical equipment, infrastructures, conditions and medical epistemologies (Högberg, 2025; Jaton, 2021), the use of simulations and synthetic data as ground truths entails yet another separation from clinical and empirical medical reality. They are made to be adaptable and controllable, rather than to pursue the direct representational links sought with traditional empirical data. The digital artefacts of this analysis are not only representational; they are performative, shaping reality through the algorithms, models, and practices they enable. Their sociomaterial dimensions matter for how they are stabilised as epistemic truths, much like aesthetic and material choices shape medical imaging (Casini, 2021). As Lee et al. (2025) argue, these AI-generated data and digital phantoms enact particular worlds and make certain realities possible.
In this study, we observe a continuum from real to artificial, from clinical bodies to synthetic representations used in AI development and validation, affecting ontologies of body, health and disease (Mol, 2002). If data can be generated or tuned to better reflect a target population or condition, what counts as the best representation of reality, and what role should authenticity play in judging scientific or clinical validity? As Steinhoff (2022: 9–10) reminds us, models trained on synthetic data must ultimately prove themselves in real-world settings. This highlights the need for thorough real-world validation of algorithms and models and critical attention to how medical ground truths for AI are constructed and how models are evaluated.
In conclusion, we identify a set of themes that sheds light on current interlinked dynamics among digital medical phantoms, AI technologies, synthetic medical data and ground truths for medical AI. Our analysis contributes to emerging work in STS, critical data studies and medical sociology. The digital artefacts of this study complicate the boundaries between what is considered real and what is constructed, adding to longstanding critical discussions about how science is made. We have shown how phantoms and synthetic data are articulated and made into scientific truths. Their appeal lies in adaptability and controllability, qualities that let researchers define and refine ground truths to fit modelling needs – speaking directly to the power of synthetic data (Jacobsen, 2023). As these artefacts rise in prominence, they sharpen a core tension between the ‘messiness’ of real-world clinical data and the desire for a ‘perfect ground truth’ – a truth that is not discovered, but manufactured.
Footnotes
Acknowledgements
The authors would like to thank the editors and the anonymous reviewers for thorough and constructive feedback.
Ethics approval and informed consent statements
Not applicable.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The empirical material consists of published research articles that are either already available open access or for which sharing is restricted by copyright.
