Abstract
Data interoperability poses unique ethical challenges across a range of academic, industrial, and governmental implementations of data systems. Central to data interoperability is the design of systems and protocols for exchanging or integrating data from different initial source domains. Data interoperability is often regarded as necessary for carrying out tasks between different organizations and suborganizations as well as for ensuring secondary use of data for research purposes. However, interoperability poses a number of ethical problems whose contours can prove especially challenging in comparison to how ethical harms take hold at other moments of the data life cycle (such as algorithmic processing or results dissemination). Taking biomedical data interoperability as a focal domain, this article provides an overview of data interoperability, maps the central ethical harms that may challenge interoperability projects, and proposes a response to these problems through an approach rooted in philosophical pragmatism. Pragmatist responses to both individual and structural harms of interoperability are presented through three companion strategies: shared standards, manual data curation, and meticulous data documentation.
Introduction
The unprecedented expansion of data collection has introduced a host of advantages in industrial, governmental, and research contexts. In healthcare settings the expanding use of electronic health records and other forms of electronic documentation has increased the rate and quality of information exchange between healthcare providers compared to older physical formats (Kuiler and McNeely, 2018: 164). However, the advantages of digital data formats are not unproblematic. Efficiencies in the storage of digital data increase the possibility of future misuse (Sula, 2016: 20), and algorithmic techniques for sorting and classifying data can lead to the introduction of bias (Fazelpour and Danks, 2021). A third issue concerns problems flowing from the disparate data structures operationalized in separate storage systems or platforms. In settings where separate data systems need to exchange information or functions with each other, differences in data formatting pose problems related to the quality of data exchange, and these quality issues in turn generate novel possibilities for data misuse and bias.
This third issue is that of data interoperability. How data are digitally organized and curated can generate significant ethical harms both at the individual and the structural level.
Interoperability is a particularly pressing concern within healthcare and biomedical domains where both patient care and medical research privilege data longevity and reuse while also posing proximate ethical challenges for the individual and wider populations. For clinical practice, interoperability is a central concern for health information exchange, especially as it is implemented through electronic medical records (EMRs) and electronic health records (EHRs) (Berryman et al., 2013: 85). EMRs and EHRs were specifically designed with interoperability in mind to ensure easy exchange of patient data between different healthcare providers (Berryman et al., 2013: 86). In Europe, standards set by Health Level 7 (HL7) have attempted to create common guidelines for interoperability, but in the United States, the slow and scattered uptake of EHRs in the first decade of the 21st century required additional legislation in the form of the HITECH Act of 2009, which attempted to financially incentivize the adoption of interoperable EHRs among healthcare systems (Berryman et al., 2013). Still, with EHR systems scattered across hospital networks, there remains a lack of standard coding for diseases and other medical terminology. The code “MS” may mean “mitral stenosis,” “multiple sclerosis,” “morphine sulfate,” or “magnesium sulfate” depending upon the system and the particular coder, leading to issues in interpretability, data loss, and inaccuracy (Chute, 2005: 170; Hoffman and Podgurski, 2013: 57). Interoperability harms can also occur where disparate data systems store data in formats of differing resolution, for instance where data need to be merged between a coarse-grained taxonomy and a finer-grained taxonomy. When fine-grained data are passed into coarser-grained databases (or vice versa), each data point needs to be resolved into broader (or narrower) categories.
When operated at scale across populations, these seemingly minor technical issues can quickly add up to inaccurate estimations about individuals within larger groups. This can lead in turn to substantial inequalities.
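The coding ambiguity described above can be made concrete with a small sketch. All of the system names and mappings below are invented for illustration and are not drawn from any real EHR vocabulary; the point is only that an abbreviation like "MS" is meaningless apart from the local codebook of the system that recorded it.

```python
# Hypothetical local codebooks: the same abbreviation resolves differently
# depending on the source system's conventions (all mappings invented).
LOCAL_CODEBOOKS = {
    "cardiology_emr": {"MS": "mitral stenosis"},
    "neurology_emr": {"MS": "multiple sclerosis"},
    "pharmacy_system": {"MS": "morphine sulfate"},
}

def resolve(code: str, source_system: str) -> str:
    """Resolve an abbreviation against the codebook of its source system.

    Without the source-system context, "MS" is irrecoverably ambiguous:
    merging records from these systems into one database silently conflates
    a diagnosis with a medication order.
    """
    codebook = LOCAL_CODEBOOKS.get(source_system, {})
    return codebook.get(code, f"UNRESOLVED({code})")

print(resolve("MS", "cardiology_emr"))   # mitral stenosis
print(resolve("MS", "pharmacy_system"))  # morphine sulfate
print(resolve("MS", "unknown_system"))   # UNRESOLVED(MS): context lost
```

Once records from these systems are pooled without their codebooks, no downstream processing can recover which meaning was intended, which is precisely how small technical discrepancies accumulate into population-scale inaccuracies.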
The scale of the COVID-19 pandemic highlighted such harms relative to the secondary use of data for epidemiological purposes. While initial concerns about secondary data use were associated with “-omics” research and patient registries, especially for uses of genomic data (Nicholson and Perego, 2020; Wang and He, 2021), COVID-19 also brought into view the extensive barriers to the exchange of information between clinical settings and public health agencies (Subbian et al., 2021). These challenges prompted a new wave of research into interoperability emphasizing the importance of maintaining interoperable systems for emergency health initiatives related to the tracking and management of novel diseases (Naudé and Vinuesa, 2021; Pelizza, 2020; Piller, 2020). This research not only extends the discussion of interoperability to new areas, but also problematizes a common reliance on individual-oriented frameworks that tend to emphasize rights to privacy and autonomy at the cost of neglecting both the broader menu of social goods that can be generated from secondary uses of health information and the negative effects flowing from insufficiently interoperable data systems.
Taking a broader view of ethical concerns as involving both individual- and structure-oriented harms, this article discusses data interoperability as a distinctive sociotechnical operation within which many of the most well-documented problems of data ethics manifest differently than they do in settings that do not implement interoperability. There is now a rich literature on data ethics (Mittelstadt and Floridi, 2016). Central to this scholarship are concerns about information privacy, data discrimination, and digital divides of unequal access to quality information and computer technology. Much of the literature on these issues focuses on the technical processes through which ethical harms might manifest, the most prominent of these being algorithmic data analytics. 1
We argue that data interoperability ethics requires reflection that moves beyond the most familiar sociotechnical moments in the data life cycle, most prominently algorithmic processing, to the barriers imposed by the prior implementation of what we describe as the formats of data.
The concept of formatting, developed by one of us in prior work, refers to the manifold processes by which data are defined as such and information is formed as such. 2 Formats are that which makes data and information possible. Consider a cell in a spreadsheet file. A cell can store a data point only against background conditions of formatting: these include technical conditions of data typing (specifying whether the contents of the cell are to be stored as a numerical integer, a text string, or a calendrical date) and conceptual or semantic conditions defining permissible values for a variable (e.g. a defined range of permitted options for a list-selected variable whose column header is “gender”). When not subjected to thorough investigation, the ensembles of formats wielded in data interoperability projects can lead to the generation, entrenchment, and escalation of unexpected harms at both the individual and structural levels.
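The dependence of a data point on its background format can be sketched in a few lines. The column names, types, and permitted options below are hypothetical; the sketch only illustrates that a value is storable at all only relative to a declared type and a defined range of options.

```python
from datetime import date

# Hypothetical column formats: a technical condition (data type) and, where
# relevant, a semantic condition (permitted values) for each variable.
COLUMN_FORMATS = {
    "age": {"type": int},
    "admission_date": {"type": date},
    "gender": {"type": str, "permitted": {"female", "male", "nonbinary", "other"}},
}

def store(column: str, value):
    """Accept a value only if it satisfies the column's format; raise otherwise."""
    fmt = COLUMN_FORMATS[column]
    if not isinstance(value, fmt["type"]):
        raise TypeError(f"{column}: expected {fmt['type'].__name__}")
    permitted = fmt.get("permitted")
    if permitted is not None and value not in permitted:
        raise ValueError(f"{column}: {value!r} not among permitted options")
    return value

store("age", 42)             # accepted: satisfies the type condition
store("gender", "female")    # accepted: within the permitted options
# store("gender", "F")       # would raise: the format, not the meaning, decides
```

Two systems whose `COLUMN_FORMATS` differ, even slightly, cannot exchange these values without a negotiation over formats of exactly the kind discussed in what follows.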
The first and more diagnostic contribution of the article consists of a conceptualization and mapping of these unexpected ethical harms with a particular focus on making the underlying practices of data formatting more explicit. Said otherwise, we show how the harms inhering in data interoperability often result from an unreflective inattentiveness to how data are formatted. An important implication of this argument is that even some of the most familiar harms of data systems prove uniquely challenging when occasioned by interoperability in contrast to the ways these harms manifest in other, more frequently analyzed, moments of the data life cycle.
As a response to the specific ethical problems raised by data interoperability, the second contribution of the article is more positive, or curative, in its aims. We propose a trio of strategies, all of which are rooted in a form of sociotechnical analysis recommended by philosophical pragmatism. Our pragmatist approach addresses both structural and individual harms by focusing on how data practitioners and institutional stakeholders can strategically implement more reflective data practices in the face of the inherent unpredictability of future occasions of data use.
The article proceeds as follows. In the first section (“An overview of data interoperability”), we provide a general overview of data interoperability as a problem in data science and informatics, paying careful attention to its role in ensuring successful data integration and data exchange. In the second section (“Mapping the ethical landscape of data interoperability”), we map particular harms associated with the formats of data interoperability in relation to two leading frameworks for data ethics (both of which are also prominent in the biomedical ethics literature): those concerning individual rights (and associated notions of privacy, autonomy, and dignity) and those focusing on social structures (such as issues of justice, fairness, and equality). In the third section (“A pragmatist approach to achieving data interoperability”), we introduce our pragmatist approach to problems of data interoperability which addresses harms from both frameworks by attending to the reflective practices that data practitioners and institutional stakeholders can implement to render data most adaptable to new situations.
Before proceeding to these three elements of our analysis, it will be valuable to consider the generalizability of the analyses that follow in light of our central focus on biomedical data. The general ethical challenges mapped in this article are of increasing salience as data interoperability is becoming ever more important for a broad range of institutions and industries across education, business, governmental, and various healthcare and research-related domains. Consider education data technology. There is now a widespread use of data systems in education for such purposes as data exchange between administrative data systems, classroom management tools, learning analytics software, and learning assessment platforms both within and between individual educational institutions (Daniel, 2017; Santos et al., 2016; Wong et al., 2023). We believe that the ethical concerns raised by interoperability in biomedical data are indeed generalizable to other domains where interoperability has become increasingly essential. 3 That said, every domain in which data interoperability poses challenges will face those challenges in relatively unique domain-specific ways. Though the mapping and strategies we provide for healthcare data interoperability are ones we take to be broadly applicable to other domains, applicability cannot itself be assumed to be self-guiding. Generalization, especially with respect to ethical challenges, requires nuanced interpretation and reparticularization in every different domain of application.
An overview of data interoperability
If we conceive of data interoperability as the ability to exchange information and functions between different data systems or platforms, an initial question arises as to how interoperability should be conceptualized in relation to similar concepts in data science. Following Pagano and colleagues (2013) we distinguish between three tasks that can be carried out on data and the
Another definitional issue arises when we consider the complexity of the sociotechnical systems in which problems of data interoperability arise. In order to clarify the processes and practices surrounding data interoperability, this idea can be organized according to three levels of abstraction: (1) technical interoperability, (2) semantic interoperability, and (3) organizational interoperability (Hellberg and Grönlund, 2013; Pagano et al., 2013; Shrivastava et al., 2021). 5
At the lowest level of abstraction, technical interoperability concerns the basic capacity of systems to exchange data at all, encompassing the compatibility of hardware, software, network protocols, and file formats.
The formats constituting conditions of possibility for data are sites at which relations between technical, semantic, and organizational constraints on interoperability are negotiated. Once operational, formats express and enact the outcomes of those negotiations. Formats as the operational outcomes of these negotiations can thus be highly variable across different data systems. Format variability is sometimes due to technical constraints (e.g. incompatible technological infrastructure), other times due to semantic constraints (e.g. contrastive coding systems), and yet other times due to organizational constraints (e.g. local institutional requirements mandating the use of a particular data schema). Such variability in formats, as well as the many possible causes of this variability, is one way of understanding the complexity involved in any effort at data interoperability.
Problems in semantic interoperability
One of the most pressing issues raised in contexts of data interoperability involves a mismatching of the purposes and contexts of data use. This is most clearly exemplified in cases of semantic interoperability. For example, in an early paper on the topic, Heiler notes that semantic interoperability is fundamentally grounded in “semantic agreements” between a requester and provider of information concerning the meaning of a particular term or category (Heiler, 1995: 271). These agreements can be difficult to establish when old data are set to new purposes. For instance, Heiler documents a project by the U.S. Department of Defense employing a database of military personnel addresses to establish where new veterans' hospitals should be located (Heiler, 1995: 271). Although initially the secondary use of these data appeared relevant for these purposes, it was later discovered that the addresses corresponded to active military assignments, including temporary assignments, and therefore, the data were useless for capturing where veterans and their families lived after their respective assignments. In this case, a single data field of “address” contained implicit semantic information which needed to be made explicit within the metadata in order to prevent future misunderstandings. However, Heiler further explains that it can be difficult to make semantic agreements explicit given that the information of interest will always be context-dependent, requiring documenters to hypothesize various future applications (Heiler, 1995: 272).
There are, however, methods which can be used to predict the likelihood of a semantic disagreement. For instance, Brazhnik and Jones (2007) have developed a set of concepts which can be helpful for determining the long-term reliability of categories for secondary data use. Concerning data elements (DEs), they distinguish between focal DEs, which are central to the original purposes of collection and so tend to be recorded reliably, and peripheral DEs, which are incidental to those purposes and so tend to be recorded with less consistency.
Additional instances of human decision-making may also impart unreliability in relation to peripheral DEs. For example, a DE like “flu” may be coded in healthcare records to account for both vaccination and diagnosis due to the difficulty of memorizing discrete codes for each (Brazhnik and Jones, 2007: 258). Pine has discussed such decisions as a “qualculative” aspect of interoperability which “sees judgment and calculation as inherently related—calculation is not straightforward and mechanical, it involves situated qualitative judgments that are inherently quite effortful” (Pine, 2019: 539). In Pine's human-centered account of semantic interoperability, data recording practices which may be viewed as unreliable or error-prone by researchers are actually part of complex, pragmatic social negotiations which are difficult to correct through technical means alone. For example, if discrepancies occur between the primary doctor's description of treatment and a hospital's discharge summary for a respective patient, the coder in charge of the patient's chart will likely choose to code in conformity with the discharge summary instead of backtracking to determine the reliability of each respective account (Pine, 2019: 542). In Pine's analysis, such a decision is not a purposive failure to record accurately, but a decision to record as accurately as possible given the economic and temporal constraints of the coder's workflow.
Finally, semantic issues can result from implicit standards and formatting, specifically between units of measurement for a recorded variable or the order of terms within dates (Brazhnik and Jones, 2007: 262). For example, a weight recorded as “120” or a date recorded as “10-09” has a different meaning in the United States than in Germany. Such barriers to interoperability can arise even in instances where focal DEs have been reliably recorded. Although these cases may pose issues for automated data integration, contextual information within the data often provides clues as to the correct order of terms, and details about the cultural, historical, and geographic conditions of collection can help to standardize measurement variables. Yet this contextual information will not always provide a satisfactory interpretation of ambiguous DEs. In many instances, missing data cannot be interpolated or imputed, and these data may contribute to data loss and bias (Bradwell et al., 2022: 1173). The result is that even relatively trivial exclusions may pose challenges to data integration and exchange, especially when one considers the time-consuming nature of manual data cleansing for large-scale datasets.
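The ambiguity of implicit formats can be demonstrated with a short sketch. The parsing logic below is illustrative only; it simply shows that the recorded string “10-09,” absent explicit metadata, supports two incompatible readings under the US (month-first) and German (day-first) conventions, and that a bare “120” is likewise undecidable between pounds and kilograms.

```python
from datetime import datetime

raw = "10-09"  # recorded without an explicit date format in the metadata

# The same string parsed under two conventions yields two different facts.
us_reading = datetime.strptime(raw, "%m-%d").replace(year=2020)      # October 9
german_reading = datetime.strptime(raw, "%d-%m").replace(year=2020)  # September 10
assert us_reading != german_reading  # one string, two incompatible dates

# Units pose the same problem: "120" is a plausible adult weight in pounds
# and in kilograms, so the number alone cannot disambiguate.
weight_if_pounds_in_kg = 120 * 0.453592  # roughly 54.4 kg
weight_if_kilograms = 120.0
print(us_reading.date(), german_reading.date())
print(weight_if_pounds_in_kg, "vs", weight_if_kilograms)
```

Nothing internal to the record can decide between the readings; only documentation of the conditions of collection (here, the country and convention of entry) restores an unambiguous meaning.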
Mapping the ethical landscape of data interoperability: Individual-oriented versus structure-oriented frameworks
With a mapping of data interoperability in view, we now turn to the ethical features of the landscape we have mapped. This section develops a further layer for analyzing data interoperability. The ethical problems posed by data interoperability can be introduced according to principlist ethical frameworks proposed for general use within both data ethics and biomedical ethics. Mittelstadt and Floridi (2016) outline the literature concerning the ethics of big data in biomedicine as centering upon five areas: (1) informed consent, (2) privacy (including anonymization), (3) ownership, (4) epistemology and objectivity, and (5) big data divides. A more concise mapping by Ganiat and Olusola (2015) adapts Beauchamp and Childress’ (2001) influential four principles of bioethics for use in scenarios which specifically address data interoperability: (1) autonomy, (2) beneficence, (3) nonmaleficence, and (4) justice.
Both of these frameworks can be considered in light of a central tension for ethical reflection in any form. This is the tension between individual-oriented frameworks which look to protect individual rights like privacy and informed consent, and structure-oriented frameworks which seek justice, fairness, or equality (or all three) in data-driven initiatives. We propose a mapping of the ethical landscape of data interoperability according to this tension.
We begin with frameworks oriented around individuals and the basic rights owed to them—remaining agnostic for the sake of presentation about the justificatory frameworks within which such rights can be derived. 8 We then turn to frameworks which are more structural in their focus on how social structures differentially impact persons and populations—for the purposes of a mapping we also here remain agnostic about how structurally focused values are justified and even how social structure is conceptualized. 9
Individual rights: Privacy and informed consent
In relation to individual rights, an increase in data interoperability may be associated with increased risks to privacy and greater loss of individual control over personal data. One major privacy concern is that of deanonymization due to recombination of previously separated DEs. For example, Sula argues that increasing interoperability threatens to deanonymize individuals since it renders old data more readily available to novel and unforeseen methods of data analysis and extraction (Sula, 2016: 19). One current strategy for deanonymization is through “data triangulation” methods where an anonymized dataset may be algorithmically combined with outside information to produce the necessary variables for inferring a data subject's identity (World Health Organization, 2021: 41). While deanonymization is often not purposeful, it is not difficult to foresee how accidental cases may occur through increasing data exchange and the implementation of machine learning (or artificial intelligence) approaches, rendering it easier over time to reidentify patients, clients, or research subjects.
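The mechanics of data triangulation can be illustrated with a deliberately simple sketch. All of the records, names, and field choices below are invented; the sketch only shows how joining an anonymized dataset with an outside source on shared quasi-identifiers (here, zip code, birth year, and gender) suffices for reidentification whenever a demographic combination is unique.

```python
# Invented anonymized health records: no names, but quasi-identifiers remain.
anonymized_health = [
    {"zip": "02139", "birth_year": 1954, "gender": "F", "diagnosis": "cystic fibrosis"},
]

# Invented public dataset (e.g. a voter-registration-style extract).
public_registry = [
    {"name": "J. Doe", "zip": "02139", "birth_year": 1954, "gender": "F"},
    {"name": "A. Roe", "zip": "02139", "birth_year": 1987, "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

def triangulate(health_rows, registry_rows):
    """Link rows whose quasi-identifiers match; a unique match reidentifies."""
    links = []
    for h in health_rows:
        key = tuple(h[q] for q in QUASI_IDENTIFIERS)
        matches = [r for r in registry_rows
                   if tuple(r[q] for q in QUASI_IDENTIFIERS) == key]
        if len(matches) == 1:  # uniqueness of demographics does the work
            links.append((matches[0]["name"], h["diagnosis"]))
    return links

print(triangulate(anonymized_health, public_registry))
# A unique demographic combination links a name to a diagnosis.
```

Neither dataset violates anonymity on its own; the harm emerges only from their interoperability, which is what makes these risks so difficult to anticipate at the moment of collection.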
This issue is perhaps most pronounced in healthcare-related -omics research, where anonymous patient records may be combined with external data to reidentify patients (sometimes in violation of the HIPAA Privacy Rule). In some cases, data researchers have learned that certain variables are so rare as to invalidate any form of anonymization. Layman documents how diseases including cystic fibrosis, Friedreich ataxia, hereditary hemorrhagic telangiectasia, Huntington disease, phenylketonuria, Refsum disease, sickle cell anemia, and tuberous sclerosis were infrequent enough at particular hospital locations that combining genomic data with discharge records sufficed for deanonymizing patients with these diseases in 32.9% to 100.0% of cases (Layman, 2008: 156). Other methods such as using surname elements within genomics records in combination with basic demographic data in other records can also reidentify subjects or even link genetic data from one individual to their relatives (Gymrek et al., 2013).
This issue was already documented in the bioinformatics literature 20 years ago by Malin and Sweeney, who present the following case: John Smith is admitted to a local hospital, where he is diagnosed, via a DNA diagnostic test, with a DNA-influenced disease, such as cystic fibrosis. The hospital stores the clinical and DNA information in John's electronic medical record. For treatment, John visits several other hospitals, where his electronic medical record is also collected and stored. For research purposes, the hospitals forward certain DNA databases, including John's DNA, onto a research group. The DNA records are tagged with the submitting institution and with pseudonyms for their submitted sequences. By state law, the hospital sends a copy of the identified discharge record, including name, gender, zip code, visit date, diagnoses, and procedures, onto a state-controlled database. The discharge database is made publicly available in a deidentified format and can be reidentified to publicly available records, such as voter registration databases. This final step of linking is based on the uniqueness of demographics, which has been validated in previous data privacy research, as well as in demography, public health, and epidemiology communities. (Malin and Sweeney, 2004: 181)
This is an account, all too common, of a trail of identifying data which can easily be used to break anonymization procedures. Despite anonymization techniques, interoperable data formats embed technical, semantic, and organizational constraints whose implications can help enable reidentification.
These cases demonstrate the risks associated with the development of interoperable healthcare systems, specifically at the level of organizational interoperability as outlined above. Notably, the increased potential for unifying and consolidating localized forms of data may come at the cost of generating datasets which openly identify their subjects or which contain all of the separate elements required for deanonymization. However, it remains debatable whether interoperable systems increase privacy risks overall. Within healthcare settings, consolidated digital information stored within EHRs might be more readily accessible to bad actors and would contain more information than any individual system (Layman, 2008). On the other hand, it may be easier to secure one single system for EHRs than to rely upon a patchwork of separate data systems scattered across a range of healthcare providers (O’Reilly-Shah et al., 2020: 342).
Key problems to consider here concern how access to consolidated data systems should be governed and how data subjects can trust that their data will not be misused by professionals with credentialed access. Here, informed consent is a major issue, since interoperability may greatly expand the secondary uses of any particular data point beyond what is currently conceivable or link data in unexpected ways (see Hand, 2018; Mittelstadt and Floridi, 2016). In healthcare settings, a major question is whether interoperable EHRs will allow providers not associated with a patient to nonetheless have full access to their health records. In relation to this problem, patient authorization has been proposed as a means to secure data autonomy under conditions of expanded EHR interoperability (Ganiat and Olusola, 2015: 14). However, a call for authorization procedures may be too strict in relation to secondary data use outside of standard healthcare practice, especially when health data has undergone deidentification for research purposes.
Social structure: Justice, fairness, and equality
The potential overestimation of the risks of rights violations of individuals may not only stifle research projects which require secondary data use; it may itself be a site of ethical or normative concern, especially in relation to structural matters of justice, fairness, and equality. In outlining their application of principlism to healthcare interoperability, Ganiat and Olusola define justice as the principle upheld when “interoperating electronic healthcare systems are used to provide equal and prompt healthcare care to everyone as well as also ensuring data availability, accuracy, and security” (Ganiat and Olusola, 2015: 15). They additionally note the essential role interoperability plays in reducing a “digital divide” between healthcare systems, a problem which Mittelstadt and Floridi (2016) further extend into the concept of a “Big Data divide.” This divide involves unequal distributions of benefits and burdens flowing from big data analyses. All of these concerns point to the need for critically questioning who is being represented in data, how they are being represented, how fairly their data representations are being subjected to treatment, and what purposes lay behind each of these operations (Crawford et al., 2014).
Data interoperability fundamentally attempts to reduce what has been termed lossiness, where “data collection and/or analysis may involve aggregation, case construction, or standardization in such a way that certain aspects of the phenomena of interest are lost” (Busch, 2014: 1732). In privacy-oriented accounts, this lossiness is analyzed as an unfortunate but ethically neutral state of affairs. For instance, Sula has argued that with greater longevity of data life, “the potentials for data loss, theft and unintended consequences are high—but entirely mitigated when no personally identifiable information is collected in the first place” (Sula, 2016: 20). By contrast, in relation to structural frameworks addressing data divides, data loss is not just a technical problem derivative of privacy concerns.
By taking seriously an emphasis on structural questions of ethics, we can begin to understand how data loss is intricately connected to discrepancies in data recording, biases in formatting, and structural barriers to technical interoperability which result in unequal distributions of data loss among marginalized communities. These issues were on full display during the COVID-19 pandemic, where data interoperability issues caused by hyperfragmented datasets and lack of reporting standards prevented secondary data use in disease tracking (Backhaus, 2020). Describing these problems, Naudé and Vinuesa employed the term
Due to these and other problems, several scholars have identified the United States' failures during the COVID-19 pandemic as a wakeup call for current interoperability limitations in healthcare (Greene et al., 2021; Naudé and Vinuesa, 2021). In a way that highlights what we referred to above as the central tension between individual-oriented and structure-oriented frameworks, Piller (2020) frames the U.S. COVID-19 response as ultimately being overly cautious about reidentification harms within data shared between public authorities and epidemiologists. This cautious approach is antithetical to ethical aims when standard privacy-oriented accounts are supplemented with frameworks which seek to ensure a more just, fair, or equal distribution of the benefits of increased social welfare. Though privacy may be further compromised within an interoperable healthcare data system, it is also necessary to emphasize how broader concerns for social welfare are easily neglected in favor of individual rights-based approaches to the detriment of vulnerable populations. Therefore, while we should not downplay the harms that can be directed against individuals, an overly restrictive individual-centered framework may inadvertently generate social-structural harms. In other words, the tensions that manifest between individual-oriented and structure-oriented frameworks are often difficult to resolve. Above all, we should not pretend they are resolved by focusing all ethical analysis on one side or the other of the ledger.
A pragmatist approach to achieving data interoperability
There are several strategies for minimizing the harms of data interoperability. These include technical approaches leveraging big data analytics, ontological approaches which emphasize standardized vocabularies for coding relevant information, and policy-oriented approaches which require state and/or market stakeholders to contribute to better training of coders and more standardized collection of data.
On the technical front, big data analytics and machine learning have been suggested as tools which could be used to parse through information and derive correlations between DEs in a manner that exceeds human capacities. However, scholarship in this area points to issues such as opacity in deep learning models and introduction of additional bias in the training and unequal deployment of algorithms (Gianfrancesco et al., 2018). Alternatively, a longstanding proposed solution to data interoperability has been the establishment of standardized codes, vocabularies, or ontologies within defined academic, clinical, or research domains (Dixon et al., 2014; Kuiler and McNeely, 2018). Here, even those who endorse such approaches recognize the practical limitations which impede the universal adoption of ontological standards. For one, in healthcare settings, economic costs are likely to fall upon healthcare providers who will need additional time to adequately train personnel (Dixon et al., 2014). The coding and cleaning of data in healthcare settings could also be performed by the public health officials who gather data for secondary use. However, such models presume adequate public funding and do not address interoperability between clinical care providers themselves (Dixon et al., 2014). Additionally, such a solution is unlikely to resolve issues which primarily occur at the time of initial data entry, such as missing DEs, misidentification and mislabeling, or DEs too broad for secondary use (e.g. a racial or gender category labeled as “other”).
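The standardized-vocabulary approach, and its limits, can be sketched briefly. The site names and codes below are invented for illustration (real deployments would target reference vocabularies such as SNOMED CT or LOINC); the sketch shows translation from local codes to a shared standard at the point of exchange, and why such translation cannot repair ambiguity introduced at initial data entry.

```python
# Invented site-to-standard mappings. "hospital_b" illustrates the "flu"
# problem discussed above: one local code conflates diagnosis and vaccination.
SITE_TO_STANDARD = {
    "hospital_a": {"flu_dx": "STD:influenza-diagnosis",
                   "flu_vax": "STD:influenza-vaccination"},
    "hospital_b": {"flu": "STD:influenza-diagnosis"},
}

def translate(site: str, local_code: str) -> str:
    """Translate a site's local code into the shared standard vocabulary."""
    mapping = SITE_TO_STANDARD[site]
    if local_code not in mapping:
        raise KeyError(f"{site} has no mapping for {local_code!r}")
    return mapping[local_code]

print(translate("hospital_a", "flu_vax"))  # STD:influenza-vaccination
print(translate("hospital_b", "flu"))      # STD:influenza-diagnosis
# The second translation is well-formed but potentially wrong: hospital_b's
# single "flu" code lost the diagnosis/vaccination distinction at entry,
# and no downstream vocabulary can recover it.
```

This is the sense in which ontological standards address semantic interoperability between systems while leaving untouched the qualculative judgments made at the moment of recording.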
In acknowledging the limitations of both market-driven and publicly funded proposals, a third "strategic, cooperative approach" introduced by Dixon, Vreeman, and Grannis advocates for a set of shared practices which distribute the costs of interoperability across all relevant stakeholders:

[P]ublic health would collaboratively develop a strategic plan with data sharing partners whereby all stakeholders that generate and report clinical data would partner to improve semantic interoperability. The onus of translation would not fall disproportionately to any one group, making it equitable. Instead each stakeholder group would invest time and resources into the process of translation to enable full semantic interoperability across the myriad health IT systems and scenarios for public health reporting. So while implementation might be somewhat more complex in this scenario, it is likely to be more acceptable to all stakeholders and incur the lowest cost. (Dixon et al., 2014: 6)
The strategic-cooperative approach offers a compromise between various parties who collaborate to enact meaningful change to current standards of interoperability at the technical, semantic, and organizational levels.
We recognize a pragmatist thread implicit in the strategic-cooperative approach, namely that interoperability is embedded within complex sociotechnical environments where the ideal theoretical conditions presumed by some (but not all) forms of individual-centered and structure-centered ethical principles may be difficult to implement in practice. This pragmatic impulse extends to other previously cited domains within the scholarship, including the limited and contextually defined scope of semantic negotiation and the "qualculative" elements at play when data entry occurs on-the-ground (Pine, 2019).
Given the time constraints and the limited economic resources of stakeholders, one could ask how an ethical solution to interoperability concerns could be achievable within our current system at all. In response to such skepticism, and in an effort to repel the cynicism that skepticism always invites, we conclude by analyzing how ethical data interoperability could be justifiably based upon a pragmatist version of the strategic-cooperative approach. To this end, we outline three ways in which philosophical pragmatism can speak to the situation-dependent and fallibilistic procedures involved within the selection and matching of DEs as well as the complex sociotechnical conditions in which effective data collection and use must occur.
Three pragmatist strategies: Data standards, manual curation, and data documentation
Originally formulated by Peirce (1878) as a maxim through which theoretical disputes could be settled by analyzing the consequences they engender, philosophical pragmatism is informed by a range of philosophers including Peirce, James (1907), and Dewey (1938) as well as more recent analytic pragmatisms developed by Rorty (1991), Brandom (2008), and Anderson (2020). In all of its versions, pragmatism focuses on the socially-embedded and practice-centered nature of epistemological and ethical problems. A central feature of pragmatism's concern with social practice, as described by Rorty (1991), is the search for toeholds rather than skyhooks. Pragmatism seeks to ground theoretical commitments in our contingent social practices rather than searching for immutable foundations for our epistemological and ethical projects.
In relation to the ethics of interoperability, a pragmatist approach cautions against seeking overarching solutions for all contexts of data use. It instead aims to consider what social practices could be put in place to render current data systems and data formats into forms that ameliorate present problems and mitigate actual harms. The pragmatist's aim is not to predict all future data uses. Rather, the pragmatist seeks to address how, in our data practices, we can and should be cognizant of fundamental epistemic limitations and responsive to current sociotechnical conditions. Pragmatism avoids hypothetical prediction in favor of concrete curation.
One entry point into the pragmatist approach is through the work of Leonelli (2016), whose study of data-centric biology employs Dewey's pragmatist theory of inquiry. In tracing data through different sites of use, Leonelli discards the familiar term "context" to describe different data problematics in favor of Dewey's term "situation" (see Dewey, 1938). For Leonelli, philosophical concepts of "context" tend to ignore the role played by nontheoretical considerations in scientific inquiry in a way that obscures the messiness of our data practices; by contrast, a "situation" refers to the total field of inquiry in terms that are inclusive of material, institutional, and social elements in addition to the conceptual or theoretical terms that are the focus of classical philosophy of science. According to Leonelli, a situational attentiveness also acknowledges the presentation and curation of data within and between domains in addition to the constantly shifting aims of researchers and social systems in any individual situation. Leonelli specifically links these features to interoperability concerns, noting that the ability to manipulate and present data within data systems is an important feature for continuing the "life" of the data (Leonelli, 2016: 183–184).
Leonelli's situational pragmatism connects to additional pragmatist concerns based on Wittgenstein's considerations on rule-following, in particular, his idea that no rule can contain a rule for its own application (Wittgenstein, 1953: §201). In Dewey's terms, since a problem results from a recognition of disordered elements within a situation and is resolved (if at all) by particular actions conducted within this situation, it is impossible to derive on the basis of one problematic situation the correct procedures one must follow in all possible situations (Dewey, 1938: 107–110). Rather, pragmatism emphasizes the value of processes of inquiry in contrast to the finalized products of prior inquiry, and as such resonates with Edwards's focus on “metadata processes” as the spontaneous and informal communication that occurs among data practitioners in situations where the products of metadata are otherwise imprecise (Edwards et al., 2011: 684). In looking at our human ability to converse with one another, offer novel inferences, and consider alternative possibilities, these processes demarcate human practices as “simultaneously focused and flexible (unlike that of computer programs, whose performance typically degrades precipitously or fails altogether in the presence of unanticipated contingencies)” (Edwards et al., 2011: 685).
In highlighting the situated flexibility of human action, pragmatism offers an important framing for addressing the ethical challenges of data interoperability in light of the real epistemic and ethical limitations on interoperability as implemented in actual situations. More specifically, pragmatism provides a framework for practicable strategies at the level of semantic interoperability that can help achieve more ethical data interoperability by way of improving data and metadata quality. We present three such strategies: data standardization, manual data curation, and data documentation. We envision these strategies as most likely to minimize or mitigate unexpected ethical harms of data interoperability when implemented in coordination with one another.
First, it is crucial to acknowledge the importance of data standards, while also recognizing that standards carry the situational limitations and potential for harm discussed above.
That data standards are both situationally limited and potentially harmful does not, however, mean that we should abandon them altogether. Data standards are both technically and socially (i.e. sociotechnically) necessary for domain-specific and cross-domain data exchanges and integrations. It may be objected that artificial intelligence applications render standard taxonomies otiose because of their ability to process massive quantities of data often referred to as “unstructured” (Jercich, 2022). But this is a misguided position, at least in sociotechnical situations implementing data interoperability. All data are structured to some degree by some amount of formatting—if they were not, they would not be machine-readable (nor human-readable). The question is always which formats are in place such that data can be well-formed, not whether there should be any formatting at all for data. Some degree of formatting is necessary for any collection, storage, or processing of data. A standardized taxonomy, then, can be defined as just a data format that applies across two or more datasets. Some standards, of course, serve as domain-wide specifications because of social or institutional rules. But even with domain-wide implementation, at a technical level a standard just is that which enables interoperability between two or more differently formatted datasets. Standards are thus, in a way, necessary for responsible data interoperability. But because of the limitations noted above, they are also typically insufficient. Standards thus need to be implemented in a manner that respects their limitations. This raises a crucial question: what can be added to or implemented alongside standardization efforts in order to render data interoperability more ethical? This brings us to our next two pragmatist strategies. These strategies are focused on improvements in data quality at the level of primary data and of metadata.
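The claim that a standard is, at a technical level, just a data format spanning two or more datasets can be made concrete with a minimal sketch. All field names, code values, and mappings below are hypothetical, invented solely to illustrate how two differently formatted datasets become interoperable through a shared taxonomy:

```python
# Hypothetical illustration: two clinics record the same data element
# (smoking status) under different local formats.
clinic_a_record = {"patient_id": "A-001", "smoker": "Y"}
clinic_b_record = {"pid": "B-042", "tobacco_use": "current every day"}

# The shared taxonomy is itself just a format that spans both datasets:
# a made-up three-value code list standing in for a standard vocabulary.
SHARED_CODES = {"CURRENT", "FORMER", "NEVER", "UNKNOWN"}

# Each source maintains its own mapping into the shared taxonomy.
A_MAP = {"Y": "CURRENT", "N": "NEVER"}
B_MAP = {"current every day": "CURRENT", "former": "FORMER", "never": "NEVER"}

def to_shared(value, mapping):
    """Translate a local value into the shared code, flagging gaps
    explicitly rather than silently guessing; an unmappable value is
    exactly the kind of case that calls for manual curation."""
    return mapping.get(value, "UNKNOWN")

print(to_shared(clinic_a_record["smoker"], A_MAP))       # CURRENT
print(to_shared(clinic_b_record["tobacco_use"], B_MAP))  # CURRENT
print(to_shared("ex-smoker", B_MAP))                     # UNKNOWN
```

Note that the standard here does nothing to repair data entered incorrectly at the source; it only aligns formats, which is why the strategies below are needed alongside it.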
A second pragmatist strategy we advocate is that of manual data curation.
Data curation practices that design in manual data correction and cleaning can help minimize or mitigate such unethical consequences. In some ways, this is an analytical point. If we know that a data system has generated ethical harms on the basis of inaccurate data, then this implies that someone somewhere along the line has located the inaccurate data that formed the basis for the harm. When this does occur, of course, it is typically the result of a human practitioner manually inspecting and correcting the data rather than of any automated process.
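One way to design manual correction into a curation pipeline is to have automated checks flag suspect records rather than "fix" them. The sketch below is hypothetical; the field names and validity rules are invented for illustration only:

```python
# Hypothetical sketch: automated checks route suspect records to a
# human review queue instead of silently altering or discarding them.

def flag_for_review(records):
    """Separate records into clean ones and a manual-review queue."""
    clean, review_queue = [], []
    for rec in records:
        problems = []
        if not rec.get("diagnosis_code"):
            problems.append("missing diagnosis code")
        if rec.get("age") is not None and not (0 <= rec["age"] <= 120):
            problems.append("implausible age")
        if problems:
            # A human curator decides what to do; the pipeline does not guess.
            review_queue.append((rec, problems))
        else:
            clean.append(rec)
    return clean, review_queue

records = [
    {"id": 1, "age": 34, "diagnosis_code": "J45"},
    {"id": 2, "age": 230, "diagnosis_code": "J45"},  # likely an entry error
    {"id": 3, "age": 51, "diagnosis_code": None},
]
clean, queue = flag_for_review(records)
print(len(clean), len(queue))  # 1 2
```

The design choice matters: routing ambiguous cases to a person preserves the "focused and flexible" human judgment that Edwards et al. (2011) contrast with brittle automated processing.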
A third pragmatist strategy for implementing practices more likely to produce higher-quality data and therefore less likely to generate ethical harms involves implementing meticulous data documentation, exemplified by the "datasheets for datasets" model (Gebru et al., 2021).
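As a rough illustration of what machine-readable documentation might look like, the sketch below invents a minimal documentation record. The fields and their contents are our own hypothetical examples, loosely in the spirit of a datasheet but not reproducing the Gebru et al. template:

```python
# Hypothetical, minimal documentation record for a dataset. The fields
# are illustrative stand-ins for the kinds of questions documentation
# should answer: provenance, collection procedure, and known gaps.
dataset_doc = {
    "name": "clinic_visits_2021",
    "collected_by": "example clinic intake staff",
    "collection_procedure": "entered manually at point of care",
    "known_limitations": [
        "race/ethnicity field includes a broad 'other' category",
        "smoking status backfilled from free-text notes before 2020",
    ],
    "intended_uses": ["care coordination"],
    "cautioned_uses": ["population-level epidemiology without review"],
}

def check_doc(doc):
    """Refuse to treat a dataset as documented unless core fields exist."""
    required = {"name", "collection_procedure", "known_limitations"}
    return sorted(required - doc.keys())

print(check_doc(dataset_doc))  # []
```

Even a record this small makes explicit to secondary users what would otherwise be guesswork: how the data were produced and where their known weaknesses lie.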
Looking back at the COVID-19 pandemic, we can see how these strategies might have mitigated semantic issues in the recording of pandemic fatalities. For instance, Backhaus (2020) describes how several terms were used interchangeably in the reporting of deaths. "Case fatality rate" is supposed to refer to the number of deaths per number of reported infections; "infection fatality rate" is supposed to supplement the number of reported infections with estimated unreported infections in its calculation; and "mortality rate" is supposed to refer to deaths divided by total population. Yet these standard taxonomic categories were not always strictly followed in reporting (Backhaus, 2020: 162–163). Moreover, even where taxonomies were clear, Backhaus shows that different metrics were used to determine whether a patient died from COVID-19 or from a comorbid condition (Backhaus, 2020: 164). In such cases, meticulous documentation could make explicit the procedures used to delineate deaths from COVID-19 versus those caused by a serious comorbidity, and manual curation could help produce high-quality data to track epidemiological spread among more localized population groups.
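The taxonomic distinctions Backhaus describes can be stated as simple formulas. The sketch below uses invented counts purely for illustration; none of these numbers are actual epidemiological data:

```python
# Illustrative only: invented counts, not real epidemiological data.
deaths = 50
reported_infections = 1_000
estimated_unreported = 4_000   # hypothetical estimate
total_population = 100_000

# Case fatality rate: deaths per reported infection.
cfr = deaths / reported_infections                           # 0.05
# Infection fatality rate: deaths per (reported + estimated unreported).
ifr = deaths / (reported_infections + estimated_unreported)  # 0.01
# Mortality rate: deaths per member of the total population.
mortality_rate = deaths / total_population                   # 0.0005

print(cfr, ifr, mortality_rate)
```

Even with identical inputs, the three metrics differ by a factor of up to one hundred here, which is why reporting that conflates them degrades data quality at the semantic level.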
The three strategies we have presented are not perfect solutions for data interoperability harms. The authors of the "datasheets for datasets" model, for instance, explicitly note that they cannot account for dataset creators’ limited capacity to imagine alternative uses nor can they provide an effective financial model incentivizing their approach (Gebru et al., 2021: 92). However, in relation to the first worry, our pragmatist approach responds by noting that fundamental epistemic limitations will arise for any and all proposals. Thus, by accepting that no proposal can derive perfect rules for future applications, it can be affirmed that manual coding and meticulous documenting are some of the most effective proposed solutions available. This is because they ensure higher quality data for peripheral DEs and eliminate the guesswork of data practitioners in secondary use situations while also acknowledging the epistemic limitations of data practitioners in original or primary data situations. Additionally, the second worry can be deflated when we look to previously proposed solutions at the organizational level of interoperability. The strategic-cooperative approach, for instance, could provide sufficient funding and incentives by distributing accountability across market and governmental stakeholders in a manner which, in turn, effectively distributes the benefits of interoperable data systems.
A fully pragmatist approach to data interoperability should make use of both semantic-level and organizational-level proposals while tracking the epistemic and ethical limitations necessarily imposed upon social actors at both levels. As noted earlier, at the semantic level of interoperability, ontological standards within specified domains in combination with manual data curation and meticulous documentation practices can help meliorate data harms by making initial semantic agreements explicit to practitioners in primary, secondary, and tertiary use situations. At the organizational level of interoperability, data practices need to be situated within their socioeconomic conditions by accounting for the economic and social limitations that market, civic, and governmental stakeholders face.
Across the multiple levels of interoperability, a strategic-cooperative approach informed by pragmatism and experimentally committed to implementing and balancing multiple strategies for ethical data interoperability offers a way of realizing the benefits of data technologies while remaining attentive to the enormous potential for data harms. These harms are invited by a tendency to seek solutions to real problems by implementing data without sufficient reflection on what those data include and what they exclude. Such tendencies are now augmented by implementations of artificial intelligence that seek to automate away reflective consideration. What we need to confront the ethical harms of data-driven solutions is not less reflection and more automation but more reflective intelligence. It is precisely this kind of intelligent reflection upon our social practices which pragmatism seeks to cultivate.
Conclusion
The specific ethical challenges involved in data interoperability remain concerningly undertheorized in existing data ethics scholarship. The way that data are and must be constituted by formats gives rise to unique ethical (as well as epistemic) challenges wherever data are exchanged or integrated. These challenges flow from the variability of disparate formats. Additionally, these challenges are technically, semantically, and organizationally irreducible to other much-discussed issues in data ethics scholarship concerning the potentially harmful effects of algorithmic processing. In the context of interoperability, it is almost always the formats of data that lead to degraded accuracy, concomitant inequality, and other epistemic and ethical problems. Pragmatist strategies for navigating data interoperability can help mitigate these individual-level and structural-level problems. Yet pragmatist strategies are no surefire solution. The approach we advocate acknowledges the limitations faced by data practitioners by emphasizing the reflective intelligence of agents in virtue of which they can be capable of adapting to novel situations. Although our focus has primarily been on healthcare and biomedical data, the pragmatist approach to structural issues of interoperability ethics we have proposed is one that can be reflectively generalized to other domains in light of the generality of the pragmatist strategies we have outlined. A pragmatist approach cannot guarantee ethical data interoperability, but it can provide valuable reflexive support for imperfect human actors adapting themselves to new data-saturated situations. Pragmatism provides no algorithmic guarantees, and yet it nonetheless points the way toward improved data practices.
Acknowledgements
The authors thank several colleagues for their input and feedback on this project: Steven D. Bedrick (of Oregon Health Sciences University), Carlos Montemayor (of the Department of Philosophy at San Francisco State University), and Thomas A. Thornhill IV (of Yale School of Public Health). The authors also thank the editors and two anonymous reviewers for extensive comments.
Funding
Caplan's and Koopman's contributions were funded in part by a University of Oregon Data Science Initiative Seed Funding Convening Award. Koopman's contributions were additionally supported by an Individual Research Fellowship from the United States National Endowment for the Humanities (NEH). Funding for open-access publishing was provided by the Oregon Humanities Center at the University of Oregon and the University of Oregon Libraries Open Access Article Processing Charge Award Fund.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
