Abstract
This paper investigates the technopolitics of identification across interoperable systems, with a particular focus on the identification of third-country nationals. In doing so, it reveals the data work required to establish and strengthen interoperability across systems. In the ongoing debate concerning interoperability's potential to bridge siloed organisations and their systems, the article focuses on iterative identification processes that make use of governmental interoperable infrastructures. Central to this investigation are the data matching practices and technologies employed at various stages of bureaucratic identification processes, including residency and naturalisation, extending beyond the initial identification of individuals at the border. The article introduces the concept of ‘re-identification' to characterise iterative processes that third-country nationals must undergo at different points in space and time. The analysis draws on data collected at the government immigration agency in the Netherlands and at the agency's data matching software supplier. We analyse how data matching technologies address ‘data frictions’ usually associated with interoperability. We argue that their utilisation in curbing data friction may entail new organisational costs. The findings show that data matching necessitates additional data work for effective re-identification, including formulating search queries, interpreting ambiguous results, and reconciling records across organisations and time.
Keywords
This article is part of a special theme on the Technopolitics of Interoperability. To see a full list of all articles in this special theme, please click here: https://journals.sagepub.com/page/bds/techno-politicsofinteroperability
Introduction
Across institutional settings, interoperability has emerged as a promising solution to bridge siloed organisations and their information systems. Proponents highlight its potential to create more efficient bureaucracies. This vision, for example, drives the European Union (EU)'s commitment to interoperability programmes that advance the free movement of information, goods, capital, services and people. The EU and Member States are working on interoperable solutions that enhance cross-border data sharing and services, aligning data and regulatory standards across territorial, sectoral and organisational boundaries (European Commission, 2017a; European Parliament and Council, 2024). This is especially relevant for identifying individuals, where interoperability can link personal data collected in diverse systems across time and space to create unified digital identities. In this way, the EU sees interoperability as an opportunity to enhance services and mobility for citizens, facilitate their cross-border identification and enable access to services such as banking and insurance across territorial boundaries (European Union, 2024), but also to streamline asylum processes through data sharing for refugee status determination and to strengthen security measures by enabling access to essential information across various agencies and jurisdictions (European Commission, 2016).
Despite such policy enthusiasm and normative thrust, achieving interoperability might prove challenging. Most literature agrees that the obstacles in pursuing interoperability are heterogeneous (Bekkers, 2007). Implementing modern identification practices through interoperable data infrastructures requires addressing technical, organisational, regulatory and semantic challenges. Issues with the quality of personal data may arise from technical bugs, inconsistencies in data recording across organisations, different legal frameworks or differences in standardisation across systems. Indeed, government offices often struggle with managing uncertainty in databases as they contain incomplete, outdated or incorrect data, and occasionally even duplicate entries for the same individuals (Keulen, 2012).
Such issues with data quality are often addressed through data matching. Data matching is a socio-technical practice that links records by identifying similarities and resolving inconsistencies across different datasets. As such, it is an important technique to establish and/or strengthen interoperability. For identifying individuals, data matching features a further characteristic that makes it a valuable site of investigation: it supports different stages of identification procedures, such as residency or naturalisation, by linking records and resolving data quality issues across databases. This feature becomes essential when identification is underpinned by highly complex data infrastructures integrating multiple data flows. This is the case for the identification of third-country nationals on the move to and across Europe. In this context, data quality issues emerge with the need to interconnect and harmonise diverse databases run by institutions at local, national and supranational scales (Pelizza, 2020; Pettrachin and Pelizza, 2024). Such databases entail different time frames and geographical boundaries, well beyond the first encounter with the border. Iterative identification thus characterises the mobility of asylum seekers and migrants within Europe (Perret and Aradau, 2024). To acknowledge the specificity of these practices, we propose the concept of ‘re-identification’, which can better capture the iterative identification processes that depend on data linked from different databases.
This paper asks whether there are any invisible costs associated with data matching in re-identification to enhance interoperability and ultimately bridge data silos. What are the additional efforts to manage, verify and align identity data across systems? Famously, technology studies have introduced the concept of ‘data friction’ to account for the efforts involved in producing ‘global data’ (Edwards, 2010). Similarly, critical data studies have re-actualised the term ‘data work’ to draw attention to often invisible work (Star and Strauss, 1999), including processes such as data cleaning, standardisation and resolving inconsistencies to make data usable (Bjørnstad and Ellingsen, 2019; Muldoon et al., 2024; Pine and Bossen, 2020).
We suggest extending the application of these concepts to the context of (re-)identification of third-country nationals. We argue that data matching for re-identification does not simply minimise friction and pursue univocity but also entails new data work across multiple interoperability dimensions, and thus extra organisational costs. Such data work reflects how efforts towards achieving interoperability are well known to extend beyond purely technical solutions to include syntactic, semantic and organisational dimensions (Kubicek et al., 2011).
The research provides an empirical study of re-identification at the Immigration and Naturalisation Service of the Netherlands (IND). The analysis draws on data gathered through fieldwork – interviews, documents and field notes – at the agency's data matching software supplier and at the IND agency itself. The agency's identity data management relies on specialised data matching software, which is crucial for linking, verifying and interpreting records by identifying similarities and resolving inconsistencies across disparate datasets. Fieldwork was conducted from July 2020 to July 2021 and could rely on access granted thanks to the collaboration between the software supplier and the ‘Processing Citizenship’ research project. 1
The empirical analysis revealed three forms of invisible data work as extra organisational costs entailed by data matching aimed at reducing data frictions in re-identification. First, differing identification standards and policies across organisations can create friction in re-identification, necessitating additional data work to address semantic issues and syntactical discrepancies. Second, the methods for calculating matches and ranking search results can generate friction, requiring data work to determine the types and quantity of information needed for effective search queries. Third, the use of historical data in matching offers improved re-identification but sometimes necessitates extra data work because of unclear match justifications and temporal data inconsistencies.
Throughout this analysis, our goal is to expand on the material and performative aspects of identification by examining the data work involved in re-identification using interoperable systems. The paper's findings can contribute to digital identification studies, technology studies and critical data studies. First, while most investigations in the emergent field of digital identification have predominantly concentrated on one-time identification, often involving biometric data, this paper's findings highlight the often-overlooked iterative processes of re-identification that require interoperable data infrastructures and span across time and space. Second, the paper broadens the concept of ‘data friction’ (Edwards, 2010) to identification across interoperable systems. Finally, by examining routine re-identification interactions embedded within specific socio-technical contexts, the findings show how incorporating data matching tools intended to curb data frictions and facilitate interoperability can introduce extra data work for other actors or entities.
The following section delves into the literature that informs our conceptualisation of data matching as a socio-technical practice to enhance interoperability, re-identification and data friction and work. Following this, an overview of the case and method adopted for examining matching systems and applicant re-identification at the Netherlands’ Immigration and Naturalisation Service (IND) is presented. The empirical sections analyse how data matching technologies address data friction in applicant re-identification and the invisible data work required for successful re-identification. These analyses highlight how technologies aimed at enhancing interoperability may shift the costs of dealing with ambiguous data to other actors or entities. The conclusions further discuss the implications of these findings for literature and policy on data frictions and interoperable identity infrastructures.
Interoperability and data matching in re-identification infrastructures: From challenges to data frictions and data work
Interoperability is commonly described as the ability of two or more organisations to exchange and utilise data. It can be underpinned by different architectures, such as data sharing (i.e., data is shared upon request) or data pooling (i.e., data is shared in a common repository). Most of the time, interoperability entails some degree of integration between systems, so that data exchange is at least in part automated. The potential of interoperability to bridge siloed organisations and enhance information exchange has become a central concern for government information infrastructures, considering the need to facilitate cross-border data sharing (European Commission, 2017a; European Parliament and Council, 2024; OECD, 2011; United Nations, 2024; US Executive Office of the President, 2012). Interoperability programmes are bringing renewed resources to early e-Government attempts at streamlining bureaucracy, achieving efficiency gains and optimising public resources by automating information exchange (Pelizza, 2016). Although interoperability can be narrowly defined as systems’ ability to exchange and use data, its broader definition also includes social, political and organisational dimensions that affect how effectively different systems and organisations can work together to deliver meaningful outcomes (Bekkers, 2007). In this way, proponents argue that effective interoperability improves bureaucratic efficiency by breaking down organisational ‘silos’ and fostering better inter-agency collaboration (European Commission, 2017a; Scholl and Klischewski, 2007).
Similarly, the appeal of interoperability for identification lies in its promise to connect identification data seamlessly across different institutions and jurisdictions, leading to better data quality and improved services. Digital identification programmes are currently being implemented across various countries. One example is Belgium's digital identification app, developed collaboratively by major banks and mobile network operators. The app, originally developed for the private sector, was approved by the government for official identification in public services and is now expanding its reach to other European countries (Mahula et al., 2021). At the other end of the spectrum, Danish state authorities have adopted a government-led approach to making digital identity interoperable, although the banking, insurance and other sectors have had a prominent role in the design (Jensen and Heuser, 2024). At the supranational scale, the Interoperable Europe Act and eIDAS regulations see cross-border identification as a precondition for Europe-wide service provision. The ongoing development of the EU Digital Identity Wallet constitutes a major programme in this direction. Once operational, it will provide EU citizens with digital identification to access public and private services, both nationally and across the EU (European Union, 2024).
The literature indicates that achieving interoperability entails pursuing it across various layers, aspects, facets or dimensions. For instance, Scholl and Klischewski (2007) distinguish between four levels: syntactic, functional, semantic and user task. Charalabidis et al. (2011) identify four facets of interoperability: enterprise, organisational, semantic and technical. Pardo et al. (2012) propose a set of 16 multidimensional capabilities across Policy, Management and Technology for interoperability in e-government. Wimmer et al. (2018) present a model advocating cooperation on four interconnected levels: political, strategic, tactical and operative to govern interoperability. The heterogeneity of the dimensions needed to achieve interoperability reveals the challenges and stakes of designing and implementing interoperable infrastructures.
In the EU government context, the New European Interoperability Framework (EIF) stands as a well-accepted standard framework for achieving interoperability (European Commission, 2017b; Kubicek et al., 2011). The four layers of interoperability according to the EIF are: legal, organisational, semantic and technical.
The challenges interoperability faces across various dimensions are clearly shown in identification infrastructure development. Backhouse and Halperin (2009) have highlighted that achieving interoperable identification systems in Europe faces not only technical issues but also legal, regulatory and cultural differences. For example, variations in privacy law across countries and diverse public and organisational opinions on digital identities can complicate the sharing of citizen data between government systems. This corresponds to the legal dimension of the EIF. In their examination of EU public organisations, Otjacques et al. (2007) have shown how barriers to using singular identifiers (data sets that uniquely identify individuals) can impede the development of interoperable cross-border identification systems. Barriers extend beyond the technical challenges of creating new identifiers to privacy concerns and organisational complications arising from integrating legacy identification systems across countries. The lack of a universally applicable identifier for accurate individual identification across all member states exemplifies the EIF’s organisational and technical dimensions of interoperability. Varying data protection and identification policies across countries can further complicate the exchange of identity data. This can be considered as work on the organisational dimension, which includes aligning business processes, policies and workflows among entities involved in identification. Furthermore, the technical dimension is reflected in the efforts to ensure that technical components and infrastructures are compatible and allow seamless data exchange. Issues such as incompatible communication protocols or data transmission errors can contribute to data misalignment.
On top of that, interoperability in identification across borders requires a further level of coordination among organisations and countries with diverse policies and technologies. Such coordination may involve reconciling differences between countries in data format standards, including variations in the types of data captured and the specific formats used. These can be part of the work on the syntactic and semantic interoperability dimensions, and they hold particular significance for identification processes. Syntactic interoperability pertains to ensuring that the data format is consistently structured for the information exchanged. For instance, it involves establishing standardised data formats that specify the categories of data for the identification of individuals and their corresponding values. This dimension ensures that data can be accurately interpreted and processed by different systems or entities involved in the exchange. Semantic interoperability means ensuring that data elements are understood in the same way by all parties exchanging information. This is achieved through the use of metadata, such as a data vocabulary or dictionary, which establishes a common language and understanding by defining terms, definitions, attributes, and the relationships and taxonomies between them. For instance, in identification, semantic interoperability ensures that terms related to personal attributes, such as name, date of birth and nationality, are consistently interpreted across the different systems or organisations involved in identification processes.
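The semantic work a shared data vocabulary performs can be sketched in a few lines. This is a minimal illustration under our own assumptions: the field names and mappings below are invented for the example and are not drawn from any actual EU or national system.

```python
# Illustrative sketch of semantic interoperability via a shared data
# vocabulary: different source systems label the same personal attribute
# differently, and the vocabulary maps all labels onto one common term.
# All field names here are hypothetical.
VOCABULARY = {
    "surname": "family_name",
    "last_name": "family_name",
    "dob": "date_of_birth",
    "birth_date": "date_of_birth",
    "citizenship": "nationality",
}

def to_common_schema(record: dict) -> dict:
    """Rename each field to the shared vocabulary; keep unknown fields as-is."""
    return {VOCABULARY.get(field, field): value for field, value in record.items()}

# Two systems recording the same attributes under different labels
# produce identical field names after mapping.
print(to_common_schema({"surname": "Yusuf", "dob": "1990-07-04"}))
print(to_common_schema({"last_name": "Yusuf", "birth_date": "1990-07-04"}))
```

After mapping, both records can be compared field by field, which is the precondition for the data matching discussed below.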
It emerges from this discussion that cross-border identification, across its syntactic, semantic, technical and organisational dimensions, is not only prone to the data quality issues that are typical of interoperability at large. Data quality in the exchange and use of identity information can decrease even more sharply across borders, for two reasons. First, as we have seen, exchanging data across diverse national systems can encounter more syntactic, semantic, technical and organisational misalignments than exchanges within national systems. Second, exchanging third-country nationals’ data for security cooperation among states (Bellanova and Glouftsios, 2022; Trauttmansdorff, 2022) involves handling highly diverse and unstandardised personal data. For instance, a report from the European Court of Auditors (2020: 31) points out various data quality issues in a prominent EU information system dedicated to border control. Issues include situations where first names are mistakenly labelled as last names or where birth dates are missing, resulting in a high number of false positives that border guards must manually verify.
In this context, efforts to identify third-country nationals unambiguously across countries, despite inconsistent data formats, types and quality, often require specialised data matching solutions. Data matching addresses the inherent uncertainties found in data (Keulen, 2012). It is expected to curb the decline in data quality that usually emerges while integrating systems designed in different contexts, at different moments and for different purposes and data, by matching corresponding data and resolving duplicate data (Christen, 2012). Data matching to determine if different data records pertain to the same individual relies on a variety of underlying technical methods (Batini and Scannapieco, 2016). All methods generate matches by comparing attributes of data records, such as names, dates of birth or identification numbers, to determine whether two records refer to the same person (Christen, 2012). For example, metrics for string similarity evaluate how similar two sequences of characters are by measuring the operations required to convert one sequence into the other. In this way, the names ‘Alexander’ and ‘Sasha’ may not be considered closely related, given their dissimilar string sequences. Yet a rule-based system might classify this as a match, as ‘Sasha’ is frequently used as a Russian diminutive of ‘Alexander’.
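The contrast between string similarity and rule-based matching in the ‘Alexander’/‘Sasha’ example can be illustrated as follows. This is a toy sketch, not the implementation of any commercial matching engine; the nickname table is hypothetical and would, in practice, be a curated multilingual name resource.

```python
# Illustrative comparison of two matching methods (not any vendor's code).

def levenshtein(a: str, b: str) -> int:
    """Minimum number of edits (insert/delete/substitute) to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def string_similarity(a: str, b: str) -> float:
    """Normalise edit distance into a similarity score between 0 and 1."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))

# Hypothetical nickname pairs, stored lowercase and alphabetically sorted.
NICKNAMES = {("alexander", "sasha"), ("bill", "william")}

def rule_based_match(a: str, b: str) -> bool:
    """Match exact names or pairs listed in the nickname table."""
    pair = tuple(sorted((a.lower(), b.lower())))
    return pair in NICKNAMES or a.lower() == b.lower()

print(string_similarity("Alexander", "Sasha"))  # low: dissimilar strings
print(rule_based_match("Alexander", "Sasha"))   # True: known diminutive
```

The string metric scores ‘Alexander’/‘Sasha’ as a weak match, while the rule-based lookup treats it as a strong one, showing why matching engines typically combine several methods.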
Data matching can be seen as a socio-technical practice that links records and pursues univocity by identifying similarities and resolving inconsistencies across different datasets. As such, it is an important technique to increase data quality and ultimately establish and/or strengthen interoperability. Unlike other one-time techniques to achieve interoperability, data matching is an iterative practice. Data standards, for example, are important during the initial registration of third-country nationals, in the hope that standardisation of identity data will facilitate future identification. However, third-country nationals often go through iterative identification procedures for diverse purposes: for initial identification at the border, and subsequently for accommodation, health care evaluation, relocation and naturalisation, among others. At every step, identification is required, and data matching plays a key role in linking records and resolving data quality issues in complex interoperable data infrastructures. Especially in migration management, interactions between migrants and public authorities encompass multiple moments of identification to establish or verify applicants’ identity in different steps of bureaucratic processes, such as granting asylum, issuing residency permits, naturalisation and so forth. These diverse interactions reveal that identification is not just a problem of faithful representation between a person and the information captured about them. Each interaction may be characterised as enacting a person's identity in specific ways (Lyon, 2009).
To account for the limits of the representationalist paradigm, Pelizza (2021) has proposed to understand identification as a chain of translation that makes equivalent data collected in different places at different times. This approach has the advantage of uncovering the multiple socio-technical mediators that are involved in translation. However, what is missing from such a conceptualisation of identification is how accounting for multiplicity involves not only initial identification but also iterative processes and the labour of verifying and connecting data over time and across various contexts. There is a noticeable lack of research on how practitioners manage uncertainties emerging from ambiguities in personal identity data during iterative processes of identification. To address this gap, we introduce the concept of ‘re-identification’. 3 Re-identification is a more apt term than identification when this is enabled by interoperable infrastructures. It is intended to encompass a spectrum of iterative identification processes where data, whether sourced from within or across organisations and collected across diverse temporal and spatial contexts, are used and linked to determine whether multiple database records correspond to a single real-world individual. Instances of re-identification encompass diverse scenarios, ranging from cross-referencing an individual's passport details to access their visa records, to correlating flight information to identify matches on watchlists, to linking migration and law enforcement databases to unveil potential suspect identities. As a result, re-identification also considers the lived experience of third-country nationals who are required to iteratively identify themselves in their journey not only to but also within Europe.
In this light, a set of questions arises, which are not properly addressed by data matching literature but are inspired by research in technology studies and critical data studies. Does data matching effectively bridge data silos or do invisible costs arise? How are efforts to manage, verify and align identity data across systems distributed? How is the burden of re-identification shifted? To address similar concerns, technology studies have introduced the notion of ‘data friction’. In the words of Edwards (2010), data friction refers to the efforts involved in producing ‘global data’, that is, the global data infrastructure for climate measurements. In his definition, ‘“data friction” refers to the costs in time, energy and attention required simply to collect, check, store, move, receive and access data. Whenever data travel – whether from one place on Earth to another, from one machine (or computer) to another or from one medium (e.g., punch cards) to another (e.g., magnetic tape) – data friction impedes their movement’ (p. 84).
Several authors have adopted and further expanded the notion of data friction. For this article, we identify two complementary moves. On the one hand, Bates (2017) has proposed an enabling understanding of data friction as a phenomenon to foster, rather than overcome. On the other hand, while seeing data frictions as heuristic moments, Pelizza (2016) has proposed a framework to track the costs associated with them. Paraphrasing the syntactic and semantic dimensions of interoperability, Pelizza has suggested that data frictions can become visible during programmes aimed at making systems interoperable, and they would not be visible otherwise. She has addressed data friction as a dynamic interplay between aligning (i.e., ‘syntagmatic’ or syntactic dimension) and replacing (i.e., ‘paradigmatic’ or semantic dimension) infrastructural components that facilitate data exchange, where changes in one dimension impact the other. Even in complex systems designed to mitigate friction, complete removal is often unattainable; instead, the associated costs tend to shift between dimensions. That is, solving friction by replacing semantic components may entail costs on the syntactic dimension, and vice versa. This earlier work suggests that integrating databases through data matching could create a need for new organisational capacity required to address emergent issues that did not exist before, including data synchronisation and the handling of duplicate records. For instance, as identity data transition from a physical passport to a digital database record or move between the systems of different organisations, friction may emerge, causing inaccuracies or loss of information because of syntactic differences (e.g., variations in date formats) and semantic inconsistencies (e.g., differing interpretations of family relationships).
Similar concerns about the invisible costs of interoperability have been raised by critical data studies. The term ‘data work’ has been re-actualised to draw attention to the often invisible human labour required to make data usable and valuable, including processes such as data cleaning, standardisation and resolving inconsistencies (Ratner and Ruppert, 2019). While the reference to feminist Computer-Supported Cooperative Work (CSCW) scholarship prompting attention to invisible work in various settings (Star and Strauss, 1999) is explicit, scholarship has mostly focused on the work necessary to make secondary data available in interoperable health care systems (Bjørnstad and Ellingsen, 2019; McVey et al., 2021; Pine, 2019; Pine and Bossen, 2020). The most recent literature has expanded the framing to the work of curating data for AI purposes (Gray and Suri, 2019; Muldoon et al., 2024; Tubaro et al., 2020), while little attention has thus far been paid to the data work of securitisation and identification, with the notable exception of Harb (2025) in humanitarian contexts. With this article, we aim to extend the understanding of data work to the field of (re-)identification of third-country nationals.
In what follows, we show that data matching for re-identification does not simply minimise friction in pursuing univocity but also entails new data work across multiple interoperability dimensions, and thus extra organisational costs. For example, re-identification may necessitate invisible data work to connect related identity records, manage duplicate entries, correct inconsistencies in syntactical date representation (e.g., MM/DD/YYYY vs. DD/MM/YYYY) or address the semantic uncertainties arising from the use of placeholder dates like 01/01/YYYY when only the year is known. This data work reflects how efforts towards data interoperability reach beyond technical issues to include syntactic, semantic and organisational factors. The next section outlines the empirical case and the research methods used to examine re-identification through a data matching solution and the data work it entails.
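The syntactic and semantic date issues just mentioned can be made concrete with a short sketch. The formats and placeholder convention come from the examples above; the functions are our own illustration, not IND's actual data model.

```python
# Illustrative sketch: the same date string can be valid under both the
# MM/DD/YYYY and DD/MM/YYYY readings (a syntactic friction), and placeholder
# dates like 01/01/YYYY carry semantic uncertainty.
from datetime import datetime

def possible_dates(raw: str) -> set[str]:
    """Return every ISO date a slash-separated string could plausibly denote."""
    readings = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            readings.add(datetime.strptime(raw, fmt).date().isoformat())
        except ValueError:
            pass  # not a valid date under this format
    return readings

def is_placeholder(raw: str) -> bool:
    """Flag 01/01/YYYY, often recorded when only the birth year is known."""
    return raw.startswith("01/01/")

print(possible_dates("04/07/1990"))  # two plausible readings: ambiguous
print(possible_dates("25/07/1990"))  # only one valid reading
print(is_placeholder("01/01/1990")) # True: semantically uncertain
```

An ambiguous or placeholder date cannot be resolved automatically; deciding which reading is correct is precisely the kind of extra data work the empirical sections describe.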
Case and method: Empirical analysis of the interplay between data matching systems and applicant re-identification
The empirical case is based on data collected during fieldwork conducted by the first author, both in person and remotely, from July 2020 to July 2021. The research involved collaboration with two organisations: WCC Group, a Dutch company specialising in the development of ELISE, a data matching software, and the Netherlands’ Immigration and Naturalisation Service (IND), which uses ELISE for its operations.
The IND is a government agency that handles various tasks related to international mobility, including processing foreigners’ residency and nationality applications in the Netherlands. To carry out these tasks, the agency uses the ELISE software to search and match applicants’ identity data within its back-office system. By design, the matching system aims to address the challenges of integrating data from multiple sources. For example, it can identify applicants even if their date of birth is incomplete or the day and month are switched in the database. This identification is enabled by a ranked list of potential matches, each with an associated score that quantifies the likelihood of a correct match between identity records. Accordingly, ELISE facilitates re-identification using diverse matching algorithms to address data frictions in personal data from different locales, scripts and cultural contexts.
The interoperable data infrastructure for applicant re-identification
This section sets the stage for analysing data frictions and data work by outlining practices of re-identification of applicants 2 at IND and showcasing their reliance on an interoperable data infrastructure. IND's practices rely on an information system called INDiGO, designed to manage the identification and registration of applicants who apply for various purposes, including residency or naturalisation. The INDiGO system interfaces with the software supplier's ELISE data matching system to calculate potential matches when searching for applicants. For example, a user may search for a name, date of birth or identification number in INDiGO, and the ELISE system will use various algorithms to detect potential matches in the records, accounting for both semantic and syntactic variations. Using a range of matching algorithms, the system generates a score that reflects the probability of a match between the search query and database records. These results are then used to identify the one that corresponds to the individual applicant in question.
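The score-and-rank logic described here can be sketched as follows. This is an illustrative simplification, not ELISE's actual algorithms: the field weights, similarity measure and records are all invented for the example.

```python
# Illustrative sketch of score-and-rank matching: each candidate record gets
# a weighted per-field similarity score, and results are returned best-first.
from difflib import SequenceMatcher

# Hypothetical field weights; a real engine would tune these per data source.
WEIGHTS = {"name": 0.6, "birth_date": 0.4}

def field_score(query_value: str, record_value: str) -> float:
    """Similarity of two field values, between 0 and 1."""
    return SequenceMatcher(None, query_value.lower(), record_value.lower()).ratio()

def match_score(query: dict, record: dict) -> float:
    """Weighted sum of per-field similarities."""
    return sum(w * field_score(query[f], record[f]) for f, w in WEIGHTS.items())

def rank_candidates(query: dict, records: list[dict]) -> list[tuple[float, dict]]:
    """Return candidate records best-first, paired with their match scores."""
    scored = [(match_score(query, r), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Invented example records, including a spelling variant and a swapped
# day/month in the date of birth.
records = [
    {"name": "Amina Yusuf",  "birth_date": "1990-07-04"},
    {"name": "Amina Yousef", "birth_date": "1990-04-07"},
    {"name": "Jan de Vries", "birth_date": "1985-01-01"},
]
query = {"name": "Amina Yusuf", "birth_date": "1990-07-04"}
for score, rec in rank_candidates(query, records):
    print(f"{score:.2f}  {rec['name']}")
```

The ranked list with scores mirrors what INDiGO users see: the system proposes candidates, but deciding which candidate is the applicant remains human work.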
INDiGO and the IND are just one link in the ‘migration chain’.
The information infrastructure of the IND and the migration chain can be characterised as a form of interoperable data infrastructure in which the unique identification of individuals is crucial for re-identifying foreigners in the Netherlands by diverse chain partners, through multiple practices and at different points in time and space. Specifically, the IND, as one of the chain partners, relies on the BVV systems and v-number for effective identification and re-identification of applicants. Through this re-identification process, the IND verifies whether an applicant has already been registered upon arrival by other chain partners and holds a v-number.
For interoperability to facilitate consistent applicant re-identification among chain partners, it is expected to address frictions occurring along the different dimensions of interoperability. In what follows, we analyse the integration and use of the ELISE data matching system within the IND's systems to show how data frictions can occur along the organisational, semantic, syntactic and technical dimensions. This will allow us to suggest that data matching not only deals with data friction but also introduces extra, unseen data work.
Data collection
The data collection process was facilitated through collaboration with WCC, which granted access to technical documentation and informative meetings, some of which took place on-site at the company's headquarters in Utrecht, the Netherlands. The collected technical documents, including the ELISE system's specifications and its INDiGO implementation, helped structure questions for subsequent interviews with IND staff. WCC facilitated initial contact with the IND, but pandemic constraints required the interviews to take place via online meetings and phone calls. In five interviews, each lasting about an hour, IND personnel shared their experiences with the data matching tools at the IND. Fieldwork observations and six interviews with WCC staff provided additional context on the design and deployment of the data matching software at the IND.
The interview protocol for IND staff began with general inquiries about the interviewees’ roles within the IND, providing context for understanding their organisational position. The subsequent questions centred on three steps deemed central to the searching and matching of applicants’ data: how search queries are formulated, how matches are computed, and how search results are handled. Regarding the first step, questions covered IND personnel's approach to formulating search queries, including the data categories they input and their knowledge of the data elements and match features that yield better search results. For the second step, questions were tailored to uncover the personnel's expectations regarding match results and their understanding of the match engine's functionality. The third step's questions explored how participants process search results, addressing their perception of result quality and the ranking of matches. Lastly, the protocol investigated participants’ use of other, non-IND systems and data to support applicant identification.
Data coding and analysis
After collecting and preparing data from documents and interviews (including transcription), the data was coded and analysed using the computer-assisted qualitative data analysis software ATLAS.ti. Initial data coding began with deductive coding, aligning with the three steps in the search and matching of applicant data, as outlined in the interview protocol (the formulation of search queries, match computation and handling of search results). The data coding utilised predefined codes, such as ‘search query’, ‘search engine’ and ‘search results’. Next, these codes were refined through an inductive approach, recognising patterns across interviews.
The next step of refining and collating codes proceeded by appending a colon ‘:’ to code names to introduce inductive sub-codes. For instance, ‘search query: use of data: amount of data available’ corresponds to a deductive category concerning the types of data employed in crafting search queries (‘search query: use of data’). The inductive element (‘amount of data available’) emerged from the interviews and was used consistently for quotes referencing the availability of data for search queries.
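The colon-delimited convention above can be read as a simple hierarchy. The following sketch is our illustration of that reading, not a feature of ATLAS.ti; it parses such code strings into nested levels:

```python
def build_hierarchy(codes):
    """Nest colon-delimited code strings into a tree of code levels."""
    tree = {}
    for code in codes:
        node = tree
        for part in (p.strip() for p in code.split(":")):
            node = node.setdefault(part, {})
    return tree

codes = [
    "search query: use of data: amount of data available",
    "search query: use of data: data categories",
    "search results: ranking of matches",
]
tree = build_hierarchy(codes)
# tree["search query"]["use of data"] now holds the two inductive sub-codes
```

The outer levels correspond to the deductive categories from the interview protocol, while the leaves hold the inductively derived sub-codes.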
Lastly, patterns, processes and typologies were identified within the codes. Figures 1 and 2 showcase examples of challenges related to search query input and output. Figure 1 outlines challenges during search input, including problems related to typos, data transcription and data input uncertainty. Figure 2 outlines challenges related to processing search results, including excessive or insufficient results and unexpected results. These findings underpin our analysis of data friction in interoperable identification systems, showcasing how data matching mechanisms created to overcome friction can both facilitate and hinder re-identification, thereby increasing the required data work.

Figure 1. This figure illustrates how friction in search query input was identified through analysis of interview data.

Figure 2. This figure illustrates how friction in interpreting search results was identified through analysis of interview data.
Data frictions and the data work of re-identification
As explained in the previous section, the ELISE system focuses on reducing friction that impedes the seamless exchange of identity data. While mitigating some frictions across different interoperability dimensions, ELISE also introduces three new types of data work: across organisational boundaries, in formulating and making sense of queries and matches, and across time.
Re-identification and organisational interoperability
The first type of data work emerges along the organisational interoperability dimension and was observed in relation to IND personnel's practices of consulting and linking data in INDiGO from the BVV system. It reveals a gap between standardised identification practices and the idiosyncrasies of institutional procedures. In particular, differences in organisational practices can necessitate data work to align diverse organisational contexts.
The following interview quote from an IND staff member offers evidence of this first type of re-identification data work. Upon receiving a new application, chain partners such as the IND must ensure that an individual has not been previously registered and does not already possess a v-number. If the individual does, officers should not create a new v-number but link the item in INDiGO with the existing v-number in the BVV. To avoid creating duplicate data, INDiGO requires IND personnel to search and match personal data on the BVV system. When disparities arise between the information stored in the IND's database and the BVV, these discrepancies are noted for future investigation and resolution.
We actually search first on the system called BVV […] We click on a button, and then a search is made for the personal details that then appear. If we have a hit, it means, for example, that either the Royal Netherlands Marechaussee, Foreign Affairs, or the police has ever registered the applicant. Well then, the data only occurs on the system called BVV. And if so, well, we’ll make a link. Then we click on a button, and then there is a connection between the data from the BVV and the data we have received from the municipality. And if that is not the case, for example, you can find the applicant in the BVV and in our IND[iGO] system. That's when you press another [search] button. And when it turns out that the applicant appears in the BVV and the INDiGO system, well, then we check in the INDiGO system whether the names match completely, for example. In the case of small changes in the name data, we also look further into the file. And if we do come to the conclusion that ‘this is the same person’, then we also make the connection, so we register the applicant. We link the data together. Then you only have one applicant file, and nothing is wrong.
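The link-or-create decision described in this quote can be sketched as follows. The function, the matching criterion and the v-number format are our illustrative assumptions, not the actual INDiGO or BVV interface:

```python
def link_or_create(applicant, bvv, next_v):
    """Return (v_number, next_v), reusing an existing v-number where possible."""
    for v_number, registered in bvv.items():
        if registered["name"] == applicant["name"] and registered["dob"] == applicant["dob"]:
            return v_number, next_v          # hit in BVV: link, create no duplicate
    v_number = f"V{next_v:07d}"              # no hit: register with a fresh v-number
    bvv[v_number] = applicant
    return v_number, next_v + 1

bvv = {"V0000001": {"name": "Amina Diallo", "dob": "1994-06-12"}}
v1, nxt = link_or_create({"name": "Amina Diallo", "dob": "1994-06-12"}, bvv, 2)
v2, nxt = link_or_create({"name": "Tomasz Nowak", "dob": "1988-02-03"}, bvv, nxt)
# v1 reuses "V0000001"; v2 is a newly issued "V0000002"
```

In practice, the quote shows that the decisive comparison ("is this the same person?") is made by the officer, not by an exact-equality test as in this sketch.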
Let us further consider a real-life scenario of an individual applying for residency at a Dutch municipality. The municipality employs an automated data exchange mechanism that notifies the IND to update the applicant's data within the IND's systems. Even in this case, interoperability assumes an authoritative data source. Frictions in these automated message exchanges can result from a divergence between the municipality and the IND regarding the type of source certificates used by the two organisations (e.g., a birth certificate or a passport). Here is how the interviewee describes the problem:
In principle, the municipality only registers applicants who submitted such an application [for a residence permit] to the IND. A condition for registering with the municipality is that applicants must identify who they are. So that can be done, for example, with a birth certificate, a copy of a passport, or an identity document, or other documents, so to speak. The municipality does have a different kind of policy on identification than, for example, the IND. They have a different ranking of pieces that they, well, consider important to have. For example, we – the IND – see a copy of a passport sufficient, or an ID card, or even a laissez-passer. The last one is a kind of document issued by the embassy if the applicant does not have a passport or ID card. But for the municipality, […] the most important document to register someone is actually a birth certificate. And then you sometimes have differences because, for example, applicants from, well, for example, from Ukraine. They have, say, a name and then a patronymic. That [patronymic] actually refers to the name of their father. And then the family name. And, well, that patronymic is often included in the registration by the municipality.
But the IND, on the other hand, does not necessarily register based on the birth certificate data; because those data were once given at birth, but of course, they may have changed after many years because it is possible, by the way, that you take your marriage name, for example. So, if the applicant submits a passport with the marriage name, the IND will register the applicant based on the passport data. While the municipality uses the birth certificate data. So you already have a difference. And we may then receive an automatic message [from the municipality], which the system cannot automatically link to an applicant. (IND staff interview, January 29, 2021)
The data work of sense-making of queries and matches within interoperable systems
This section further addresses the practices of searching and matching to show how data work is required by users to understand the technical functioning of data matching to formulate and interpret search queries for re-identification.
IND personnel underscored that the most frequently utilised method for searching and re-identifying applicant data relies on the unique personal identifier, the v-number. This is because traditional identifiers such as name, nationality and date of birth are considered more susceptible to frictions. For example, frictions can arise from misunderstandings and human errors, especially when IND personnel must decipher and transcribe handwritten documents or phone conversations, as interviewees noted:
[Y]ou have to deal with perhaps unclear handwriting. Sometimes there is also an authorised person who fills in that information for them. Then it can, of course, be a human error and just a typo in a date of birth or something like that. (IND staff interview, January 29, 2021)
It is also difficult to distinguish between first and last names with certain names. So you have to make slightly different combinations yourself: what can be a first name? What can be a last name? (IND staff interview, November 10, 2020)
In many cases, you have the same applicant, but it is only based on a phonetic slip. But in a lot of cases, it's also that it's just not the same person. (IND staff interview, January 29, 2021)
[I]f you search a certain way. […] I just don’t know exactly. There is an exact search on it, so if you make a typo there or write the name slightly differently, you will actually get what you expect in terms of results. [But] When you find the applicant afterwards, you think, hmm, why hadn’t it actually found it on that personal data I had tried first? (IND staff interview, November 10, 2020)
IND personnel sometimes had to develop strategies to interpret and manipulate search results to ensure that the (allegedly) correct information appeared at the top of the list, thus highlighting the added interpretative and organisational costs associated with managing re-identification processes. The ELISE data matching system's technical design is optimised for search queries that include as much information as possible. However, there can be mixed sentiments among users regarding the usefulness of providing more query data. While some perceived that additional data enhanced results, for others, this did not guarantee improved outcomes:
If you have more data, you look at what more you can put in it. So you’re actually trying to make it as broad as possible. If you have a date of birth, you have a street name or you have something else to increase the matching percentage, then you actually also look – if there are multiple search results – then you actually look first at the highest matching percentage. […] My experience with searching for personal data is that the more data you enter, the more difficult the result will be. And the worse the result actually gets. So I often build it up. I [input] less data, and if necessary, I add some data if there are too many results. (IND staff interview, November 10, 2020)
[…] what you often see in how they work is that hey, they use it first with one type of data. And if they still get too many results, or they don’t see it, they try with an extra piece of data. Or they try it with another kind of data. So you see, to find a person, they sometimes do five searches in a row. Also, a little, OK they could enter everything at once, but you can see they play with that a little. (IND staff interview, August 5, 2020)
And there is also a kind of self-check in [the search process]. So, I often start by searching by first name, last name[…] But I often try not to [enter] too much [information in the search query] and see if the results [include the applicant]. And on that basis, [I check if] the date of birth also matches the date of birth that I have. So, I don’t always input what I have available as information. But I also partly use it as a checkpoint for the [top] search results that I then get. It also works a bit more efficient for me. It makes no sense to enter much more data. Because you can find the applicant anyway, it is sufficient on the basis of first and last name. And you gain insight [that allows you to] immediately know that you have the correct [applicant]. (IND staff interview, November 10, 2020)
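The incremental strategy the interviewees describe, starting narrow and widening only when the result list is too long, can be sketched as follows. The exact-match filter stands in for the real match engine, and the field names and result threshold are illustrative assumptions:

```python
def search(records, **criteria):
    """Naive exact-field filter standing in for the real match engine."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def incremental_search(records, known, field_order, max_results=2):
    """Add known fields one at a time until the result set is small enough to inspect."""
    criteria, results = {}, list(records)
    for field in field_order:
        if field in known:
            criteria[field] = known[field]
            results = search(records, **criteria)
            if len(results) <= max_results:
                break
    return results, criteria

records = [
    {"first": "Anna", "last": "Ivanova", "dob": "1992-03-01"},
    {"first": "Anna", "last": "Ivanova", "dob": "1975-11-20"},
    {"first": "Anna", "last": "Ivanova", "dob": "1992-03-01", "city": "Utrecht"},
]
known = {"first": "Anna", "last": "Ivanova", "dob": "1992-03-01"}
results, used = incremental_search(records, known, ["last", "first", "dob"])
# name fields alone are not enough (three namesakes); adding the date of birth narrows it
```

Holding back some known data, as the last interviewee describes, also lets it serve as an independent check on whatever the top-ranked result shows.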
The data work of managing legacy data and resolving matches across time
Lastly, further data work is revealed through the matching of data that reuses information stored in ‘historical fields’. These specialised fields in INDiGO accommodate the storage of multiple values for the same data category, allowing various historical information about an individual to be retained. For example, a search can be conducted using an individual's premarital name, yet the results display an applicant with an entirely different postmarital name. Users reported grappling with the logic behind such matches, as only the most recent values from these fields are displayed in the search results. An interviewee captured the complexity of deciphering matches based on historical data in the following manner:
[What] is actually very interesting in [the case of the IND] is that someone does not just have one address but can have several addresses, for example, or even several names. And he may have changed his name, for example. So then the old name is also saved. You actually have a primary field, for example, for name or address. And you have historical fields. And they are all searched with ELISE. […] So we actually have the history of every field. That can contain one value, but it can also contain ten values. […] I think they’re not always aware of that. That if they find someone, it can also be based on an old date of birth, which has been entered incorrectly, or based on an old name. (IND staff interview, August 5, 2020)
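A minimal sketch of the behaviour described here — matching against every stored value of a ‘historical field’ while displaying only the most recent one — assuming a simple newest-first list as the data model (our illustration, not INDiGO's actual schema):

```python
def matches(record, field, value):
    """True if the value matches the current value OR any historical value."""
    return value in record[field]           # record[field]: newest-first list of values

def display_value(record, field):
    """Search results show only the most recent value of the field."""
    return record[field][0]

person = {"name": ["Petrova-de Vries", "Petrova"]}  # postmarital name first, premarital second
# a search on the premarital name matches, but the display shows the new name
```

The gap between what is matched (all values) and what is shown (the newest value) is exactly what produces the puzzling results the interviewee describes.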
These three forms of data work underscore a paradox in using the ELISE data matching approach to address frictions across various interoperability dimensions and facilitate re-identification. The complexity of calculating the results emphasises the subtle interplay between human interactions and the capabilities of data matching tools, both of which can facilitate or hinder re-identification. While interviewees showed understanding of some aspects of the data matching system, such as basic fuzzy search techniques, it became clear that the system incorporates additional features and autonomous functionalities beyond their explicit awareness. While these features can streamline re-identification efforts, users may lack a comprehensive understanding of the underlying mechanisms. Consequently, disparities in comprehension can introduce additional data work to refine, comprehend and evaluate search results for accuracy and reliability. In the next section, we further discuss the broader relevance and implications of these findings, highlighting their contributions to the literature and policy.
Data friction at the dimensions of interoperability and the costs of failed re-identification
This paper examined re-identification processes that make use of the data infrastructure of the Immigration and Naturalisation Service (IND) and the broader Netherlands’ migration chain. Through our analysis, we explored how the organisational, semantic, syntactic and technical dimensions of interoperability intersect with the IND's applicant re-identification processes. The examination unveiled three notable forms of data work necessitated by data frictions that might hinder applicant re-identification. First, extra data work may arise from friction along the organisational interoperability dimension, stemming from discrepancies between standardised identification procedures and variations in institutional practice. Second, friction relating to syntactic and semantic interoperability may necessitate data work to ensure the precision and accuracy of identity data as it transitions across different mediums and is used to formulate search queries. Third, friction relating to the technical mechanisms of computing matches and the challenges of ranking search results can entail data work to achieve effective re-identification.
In all cases, far from being a taken-for-granted aspect of automated identification infrastructures, interoperability for re-identification is often an ongoing accomplishment, sustained by additional and largely invisible data work.
What remains to be seen is whether and for whom invisible data work, such as that evidenced in our findings, has repercussions for making digital identification ‘unfair’, and what alternatives are possible (Masiero, 2024). The re-identification methods described here apply to systems where state agencies manage identity data, including its definition, matching and verification. Other options exist, including systems with decentralised identities that empower individuals to manage their identity data, controlling what information is disclosed, to whom and when. Yet, while decentralisation can enhance individual control over identity data and restrict broad data matching, it also constrains the ability of state agencies to deliver seamless and integrated services. There is therefore a clear need for careful consideration of how interoperability is implemented, balancing the demands of efficiency and control with fairness, transparency and the protection of individual rights. As identification infrastructures become interoperable, the costs of making systems ‘frictionless’ can fall on various actors and entities through hidden data work or reduced agency for individuals.
Acknowledgements
The authors wish to thank the WCC Group and its identity and security team for their collaboration and support during fieldwork. The authors also thank the employees of the Netherlands’ Immigration and Naturalisation Service (IND) for generously sharing their work experiences, which were essential to this research.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was written in the context of the ‘Processing Citizenship’ project (2017–2023), which has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program under grant agreement No 714463.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
