Abstract
One of the key features of the contemporary data economy is the widespread circulation of data and its interoperability. Critical data scholars have analysed data repurposing practices and other factors facilitating the travelling of data. While this approach focused on flows provides great potential, in this article we argue that it tends to overlook questions of attachment and belonging. Drawing upon ethnographic fieldwork within a Danish data-linkage infrastructure, and building upon insights from archival science, we discuss the work of data practitioners enabling the repurposing of pathology samples extracted from patients for the conduct of ‘personal medicine’ – our term to discuss the so-called old-fashioned treatment of patients – towards personalised medicine. This first involves ‘getting to know’ the tissues and unpacking their previous uses and meanings, then detaching them from their original source to extract data from such tissues and making them flow towards a new container where they can be worked on and connected with other data. As data practitioners make these tissues travel, transforming them into research data, they organise the attachments of data to new agendas, persons and places. Crucially, in our case, we observe the prominence of national attachments, whereby managing tissues and data in and out of containers involves tying them to the nation to serve its interests. We thus expose how the building of data linkage infrastructures entails more than the accumulation and curation of data, but also involves crafting meanings, futures and belonging to specific communities and territories.
Introduction
The point is to use the registers of all the Danes and connect them with the pathological biobanks. The biobanks have been collected since the 1970s and by linking them with the population registers, we can do studies of biomarkers and connect them with life histories and trajectory of diseases. (Interview with Ann)
Ann is a Principal Investigator (PI) within Challenge, a data infrastructure set in Denmark, whose aim is to connect the registries that accumulate and develop data on Danish residents with the biobanks of human pathological tissues that have been collected for the clinical purposes of diagnostics. As it does so, Challenge looks to produce biomedical knowledge that can help provide more precise diagnostic and prognosis tools. As such, Challenge can be understood as a personalised medicine endeavour. The broad term of ‘personalised medicine’ has come to signify a shift within healthcare and the medical sciences from a one-size-fits-all approach towards the tailoring of diagnostics and therapeutics to individual characteristics (Collins, 2010). This is made possible by the building of personalised health maps that bring together ‘structured, digital, quantified and computable data’ about individuals’ genetic make-up, lifestyle and clinical experiences (Prainsack, 2017: 4). Crucially, personalised medicine efforts rely on what Sharon and Lucivero (2019: 1), in this journal, characterise as the ‘expansion and decentralisation of the health data ecosystem’ through the mobilising of a range of data sources to capture and accumulate ever more data on individuals (Sadowski, 2019). The case of Challenge is an illustration of such efforts to expand the health data ecosystem. Specifically, it does two things simultaneously: first, it connects diverse data sources, containing socioeconomic, environmental and clinical data; second, it finds new uses for data, such as pathological tissues that were generated for different purposes or as Isin and Ruppert (2019: 220) put it, data are ‘repurposed … for generating knowledge to serve governing objectives sometimes far removed from that for which the data were originally produced.’
Drawing upon this case, this paper interrogates data diversification strategies within personalised medicine, examining efforts to utilise and connect data from a range of sources, whether they are considered new, neglected or almost forgotten. In particular, we interrogate what it takes to make data extracted from human pathology tissues, which are connected to individuals and their flesh, flow towards new uses and futures. Such repurposing practices raise important questions about the sort of relationships they shape between individuals, state and territory. Specifically, the flow of data from tissues stored in clinical archives to the research domain for the purpose of personalised medicine entails three key stages. It first involves getting tissues, which were extracted from individual patients for the conduct of ‘personal medicine’ – our term to refer to the ordinary or so-called old-fashioned treatment of patients – out of the national pathological archive – what we term the ‘personal’. Second, it includes selecting a number of representative samples from the pathological archive to form a collective resource that can stand in for the population – what we term the ‘collective’. And third, this process involves using the newly formed collective resource to produce clinically relevant information that can be used to personalise treatments – what we term the ‘personalised’. In other words, the repurposing and linking of data sources in personalised medicine involves a flow of data from the ‘personal’ (data representing individuals) to the ‘collective’ (common resources representing the population), and then to the ‘personalised’ (representing the population, but for individual benefits). Managing tissues and data, for data practitioners acting on behalf of the state and a national audience, involves constructing relationships between these entities, individuals, institutions and the nation.
In this paper, we examine how these three configurations of the personal, collective and personalised intersect, converge or sometimes come into conflict as human tissues are repurposed from the clinic to the research domain, while we explore how these relationships reflect and shape the belonging of people and data to specific national communities and territories.
Flows, attachments and archives
Critical data studies provide a good foundation to analyse such data practices. Broadly speaking, authors in this line of literature have drawn attention to the networks of people, technologies and institutions needed to make data (Borgman, 2015). They point to different forms of ‘data work’ necessary to produce and utilise data, such as researchers ‘cleaning’ the data (Pinel, 2020) or producing metadata to enable later users to understand the context in which the data were first produced (Edwards et al., 2011). Of particular relevance to our case, authors have taken an interest in questions of flows and repurposing of data. They posit that one of the key features of contemporary Big Data is the widespread circulation of data from one site to another and their interoperability (Leonelli and Tempini, 2020). Focusing on the journeys of data from one realm to another, authors discuss what enables the travelling and re-use of data, with for example Leonelli (2016) pointing to database curators, or ‘packaging experts,’ who organise data to enable their circulation across domains and projects, or Hoeyer et al. (2017) discussing the ‘ethics work’ needed to make biomedical data flow across contexts. The emphasis is placed on the movement of data and the type of work needed to make these movements happen. However, movement, we argue, comes with attachments to places and people. That is, even when data flow towards new fields, they are materially connected to specific places, groups and individuals who helped create them or previously used them. These attachments to communities or institutions are materially enacted by data practitioners and contribute to the shaping of data and their movements. These questions are particularly prominent in our case concerned with human pathological tissues. Tissues are connected to organs, bodies and persons, and as we will show, making use of such tissues as part of a data linkage endeavour entails managing these attachments. 
We use here, and in the rest of the paper, the term ‘attachment’ (instead of, for example, ‘relation’) to specifically capture those relationships between individuals and entities which denote intimacy, moral responsibility as well as an affective dimension (Navne et al., 2018). That is, when one is attached to an object such as a collection of tissues, this person is both affectively invested in the collection and committed to its preservation and/or growth.
To help us expose such questions of belonging and attachments, we build upon and expand on a body of literature within science studies focused on tissues and their use as resources for the production of ‘biovalue’. Waldby (2002: 310) coined the term the ‘tissue economy’ to describe how human tissues 1 become goods that circulate to new contexts, where they are redeployed to new uses and transformed through ‘the biotechnical reformulation of living processes’. Tissue economies are deeply imbricated with ideas and feelings about identity and belonging. Tissue collection signifies relationships between bodies, whereby one body can share its vitality with another through the circulation of tissues (Waldby and Mitchell, 2006). Throughout the literature, scholars shed light on different relationships which are actualised within tissue economies, therefore connecting individuals, and their bodies and flesh, to a community of beings. For example, some discuss the ways in which connections between individuals and the ‘imagined community of the nation-state’ (Anderson, 1991) drive tissue donation, whereby donating one's tissue is understood as an act of civic responsibility between citizens of a nation (Starr, 1998; Titmuss, 1997). Others pay attention to the institutional complex of tissue banks, pharmaceutical and research companies taking part in tissue economies and shaping the flow of tissues (Gottweis et al., 2009). Authors point to the ambiguity of tissues in these settings, whereby tissues are both connected to donors and hold value precisely because they are part of the person while at the same time the tissues, when entering such institutional collections, take on new meanings and forms: they are transformed into resources and become institutional data with loose ties to the individuals from whom they originate (Hoeyer, 2009; Tupasela, 2011). 
Scholars provide here important analytical tools to study the building of tissue collections and the networks enabling the circulation of tissues across individuals and institutions. However, this often comes at the cost of more detailed analyses of the work of professionals supporting such tissue economies – work connecting tissues to an economy of data, and organising the past, present and future belonging of the tissues to individuals and collectives. As such, little is known about how human tissues – which come with their own histories and attachments to people and places – are made open to new temporalities and uses as they travel to new contexts; or again, how links are enabled between these repurposed tissues and other data sources for the tissues to serve the interests of different sets of individuals or collectives.
To answer these questions, we bring the tissue economy literature and critical data studies into dialogue with archival science. The term ‘archive’ refers to both the documents or materials produced by persons, institutions or nations and the place within which such records are kept, such as buildings or web addresses (Zeitlyn, 2012). A diversity of items can make an archive, from national census registry data to fire insurance maps and newspapers, or again pathology samples resulting from clinical procedures. The enduring value of an archive rests on the continued use of materials for a diversity of purposes beyond the immediate one for which they were created. Crucially, archival scholars highlight the central role of archivists. Put simply, the archivist's work consists of collecting, preserving and describing materials for them to be available across institutions and fields (Marquis, 2007). Negotiating this tension between an enclosed archive composed of old materials, which is yet open to new futures, is core to the work of archivists. In practice, this involves organising records to create ‘archival representations’ (Yakel, 2007), by first reconstituting materials’ histories to understand their origins and ties to specific people and places, and then representing these materials in the archives to allow for new ties and meanings to be attached to them. The work of archivists in fact resembles that of database curators, which has been discussed by Big Data scholars (e.g. Leonelli). One key difference, however, rests on the degree of materiality of the objects worked with. The archivist mainly works with material entities, which are subject to time and have to be handled with care to be preserved (e.g. old correspondence and human specimens), while the database curator deals with digital data. 
This, in turn, means that access to material archives is more limited than access to digital archives – the archivist ensures that only a set of people at a time are provided access to the materials stored and under specific conditions. The work of an archivist also consists of deciding which items are to be kept and become archives, and which are to be omitted, or as Bowker (2005) underlines, archives also exclude materials and destroy memories. Crucially, and as Bowker also powerfully argues, archives should not be seen as catalogues of neutral facts, but rather as political and philosophical constructions that create and organise memory, and it is through the work of archivists enacting ‘memory practices’ that such constructions come to life. 2
The three fields of tissue economy literature, critical data studies and archival science, while all concerned with storing and utilising materials, provide different sets of insights useful to our inquiry. Tissue economy scholars stress that organising tissue economies entails dealing with questions of belonging, while critical data scholars, with their focus on ‘data work’ and their attention to the professionals of data economies, help us decipher the work that goes into making data travel. Finally, from archival science, we learn how making use of old materials entails building relationships across past, present and future, but also across individuals and communities, with professionals unpacking the ties of materials to previous users (and owners) to create an archive that can open to new futures and meanings. Bringing these literatures into conversation, we aim to complicate the question of data flows by investigating the work it takes to enable them, analysing the restrictions and conditions that come with flows, and questioning how attachments between entities, individuals and communities are organised as data flow. Drawing upon these insights, we therefore ask: How are pathological tissues originating from patients, who underwent biopsies or surgeries in the clinic, repurposed and made to flow towards new uses in personalised medicine? How are tissues’ attachments and meanings organised in repurposing archived human tissues?
In answering these questions, we first discuss the case studied and methods employed; then turn to our empirical findings, unpacking the processes whereby pathological tissues are transformed into common resources for personalised medicine; and finally, we discuss the broader relevance of our findings for critical data studies.
The case and methods
Our study is set in Denmark, a Scandinavian welfare state, with a national health service. Similar to other North European countries, Denmark has a long history of national registries, the official mandate of which is the collection of personal, financial and health-related data concerning the population (Bauer, 2014; Cool, 2016). These registers are commonly thought to be widely accessible for research, and linkable through a common key, a unique personal identification number assigned to all residents in Denmark (referred to as ‘CPR’) (Nordfalk and Hoeyer, 2020). The national registries benefit from widespread public support, with the general expectation, encoded in law, that information gathered administratively be used to produce knowledge and social value for the benefit of all (Ludvigsson et al., 2015). Many have discussed the Danish and other Nordic national registries as providing ‘unlimited possibilities for epidemiological research’ (Schmidt et al., 2019). These data resources have also been the subject of much political investment, with, in 2016, the Danish government launching its national strategy for personalised medicine, particularly emphasising the potential of national registries. With its data sources and infrastructures, many look up to Denmark as a place where personalised medicine can be realised. Studying how data is made to flow and integrate into such a system is not only an opportunity to learn about the specificities of the Danish context, but more broadly about the challenges and issues of relevance for many other settings, which seek to build up and mobilise population-based data sources for personalised medicine.
These unlimited research possibilities are at the centre of the Challenge platform, the empirical case study of this article. Challenge is a collaboration between Copenhagen University, the Pathology Department of Copenhagen University hospital, and Statistics Denmark, a government institution overseeing the production and maintenance of more than 250 population registries. By bringing these institutions together, Challenge aims to realise the ambition of personalised medicine as it looks to combine insights from different data sources, including population registries and pathological archives. For this reason, Challenge was chosen as the case study for this article.
Our involvement within Challenge has been twofold. Since the start of Challenge in 2018, the second author has been a member of its steering group, engaging in discussions on the project's ethical conduct and societal contributions. In parallel, the first author has been carrying out ethnographic fieldwork in Challenge's ‘Diseases group’. Conducting epidemiological research using human pathological samples, the group is focused on the identification of biomarkers for a number of diseases, such as fatty liver disease or dementia, for which gold-standard diagnostics require a biopsy and the analysis of tissue by a pathologist. Investigators in the Diseases group look to bypass the need for biopsies by analysing existing pathological tissues from patients who have gone through the healthcare system, to identify markers that could diagnose patients or indicate risks of quick disease progression. Ethnographic fieldwork took place over a period of 12 months, between August 2019 and July 2020. It involved observing and participating in the daily life of the team, sitting in meetings, attending seminars and sharing coffee breaks with team members. The ethnographic data we draw upon in this article consists of field notes from participant observation, informal conversations with staff, reflections from meetings, documents collected, as well as in-depth semi-structured interviews with members of the team and Challenge partners carried out by both of us or by the first author (eight in total).
In what follows, we describe three key stages through which tissues from the pathological archives become a research resource, starting with the vision articulated by Challenge members for using pathological tissues for personalised medicine.
Linking data sources to maintain the national data advantage
‘The advantage of Denmark in terms of data is shrinking’. In front of a full auditorium, Emil, a newly appointed professor in Epidemiology delivers his inaugural lecture at the Department of Public Health of the University, specifically discussing his interest in registry data. A ‘shrinking advantage’: this is how Emil describes one of the challenges facing Danish register-based epidemiology. He occupies a dual post across the University of Copenhagen and Statistics Denmark, and as part of this post, Emil is involved in Challenge, leading the project within Statistics Denmark. While he explains that Denmark is a powerful data resource, with its national data registries that produce and accumulate data on individuals, Emil emphasises that ‘other people have data as well’. He implies that Denmark is involved in the global data economy where a race is taking place among a broad range of actors, including states and commercial firms, who have been building extensive and detailed data sources on citizens. Emil is particularly concerned with ‘preserving the data advantage’ by identifying and making use of additional data sources. The aim, more specifically, is not just to collect more data, but rather to find other forms of data and create new ways of integrating them. In his lecture, he asks: ‘Are there any extra data we can mobilise?’
In the Pathological Archives, Challenge found the ‘extra data’ they had been looking for. These are samples that have been taken from patients, as part of biopsies or surgeries, which have been cut, sliced and stained to make thin tissue plates to be analysed for diagnostics. This means that, in the first place, these samples were not created as data to be used for research, but rather were collected for clinical purposes. These samples have been stored since the 1970s as part of a National Pathological Archive. They are, however, still located in and attached to local hospitals, which can continue using them as part of clinical practice. We distinguish two main ways in which Challenge members approach these samples. On the one hand, tissues from the pathological archive are seen as an untapped research resource that, in the words of Emil, are currently ‘sitting in unused repositories’ in the ‘basements of hospitals’. On the other hand, Challenge members do not lack superlatives to discuss the potential research value of these pathology samples. For example, Ann, the PI of the Diseases team, describes the liver samples she uses in her main project as ‘gold.’ In fact, these two ways of discussing the tissue samples go hand in hand. That is, by framing the tissue samples as ‘neglected’ in storage rooms of hospitals, Challenge members make space for new possibilities to use the samples and create value from them. Framing the samples in this manner ontologically detaches them from their original source and purpose, therefore ‘freeing them’ for new endeavours (Svendsen, 2011) and opening them up to new futures and meanings.
For Challenge members, the value of the tissue samples particularly comes to light when they are linked with other data. In an interview, Ann explains how her main project based on liver tissues differs from existing projects: A lot of people have used the Pathological Archive register-wise, but no one before has done this: where you actually go and get that out of the biobank, to research something on the biopsies and then connect it. (…) I will be the first one to get the tissue out and measure stuff on it. (Interview with Ann)
For Ann, Challenge is an opportunity to analyse liver samples for the information they contain and in relation to registry data. Crucially, this involves making the pathological samples flow. This is a dual process, which entails both bringing pathological samples out of their original source (a movement) and converting them into digital data to be used for research (a transformation). As such, these pathology specimens, and the accompanying pathology reports, represent potential data that need to be turned into digital data before they can become interoperable with broader epidemiological data. By making these tissues originating from individuals flow towards Challenge where they can be connected with other data, Challenge members build a data collective from personal materials, which then can be harnessed for personalised medicine. In this data collective, digital data extracted from tissues can be utilised by different people at the same time and analysed endlessly.
Challenge enables these connections across datasets by building a platform where data from different sources can be gathered, or as Emil explains, Challenge constitutes the ‘house’ where these data can ‘reside’. In this house, data are organised, processed, to then be connected and ‘layered’. Rudi, the director of Challenge, explains: ‘We are layering data from organisations and individuals on top of each other to make a pile of it, and to see the extra value’. Extra value is made possible when extra data is made to flow towards the Challenge ‘house’ and linked up with other data sources. Or as Rudi goes on, these data can be ‘combined’ into a ‘soup’. In explaining this soup, he says: It's like cooking. (…) You have a recipe, you have the ingredients, but you are not able to bring it together. It becomes rich when you bring it in one box [sic] and let it bubble. (Interview with Rudi)
Rudi's and Emil's images of data flowing to the ‘house’ to be ‘cooked’ underline that flow and containment go hand in hand. This duality is crucial to understand value-generating processes within data linkage infrastructures, which are commonly discussed within critical data studies and personalised medicine literature. These bodies of work often place the emphasis on flows of data, that is, how different datasets travel outside of their original source so that they can be linked with other data to generate value (e.g. Sánchez and Sarría-Santamera, 2019). We show here that such flows do not exist without containers. Data need repositories, where they can be stored and processed (Leonelli and Ankeny, 2012). In our case, Challenge is the container. It not only hosts and organises incoming flows of data, but also connects and mixes them with different data, therefore enabling data to be preserved at the same time that new relationships and meanings can be fostered from the layering of data. Put differently, extra value is generated in the data archive of Challenge because it enables flows, containment and mixing of data. Worth noting as well is that while Challenge is conceptualised as a bounded space, this does not mean that it is closed off to others in the research community. Instead, the Challenge ‘house’ is envisioned as an infrastructure for data linking that could serve the needs of a variety of users.
While Challenge is understood as the house for these data to reside in, Statistics Denmark is envisioned as ‘the oven, the place where you cook’ (Rudi) these data together. Founded in 1850, Statistics Denmark is a governmental organisation that draws on personal data from Danish citizens to create statistics on Danish society, such as employment statistics, trade balance and demographics. It is a well-known and highly respected institution among Danes, and many actors, ranging from journalists to researchers and administrators in government, utilise its data. Through its work putting together registry data, Statistics Denmark constructs an archival representation of Denmark and its population. Rudi talks about Statistics Denmark as ‘the operating place of society’ where data from individuals are stored, analysed and used ‘for the good of everyone’. Situating the Challenge ‘house’ within Statistics Denmark places Challenge, and its data linkage enterprise, within the particular collective of the national welfare state, its tax-paying citizens and their institutions.
We in fact observe that, to Challenge members, samples do not just belong to hospitals as part of their pathological collections, but are understood as national belongings. Challenge, beyond a mere data curation project, is an opportunity to loosen the ties between tissue samples and local hospital collections by extracting digital data from tissues and analysing them in connection with other data to improve healthcare for all. As it looks to mobilise and transform tissue samples, while placing itself within Statistics Denmark, Challenge aims to move the Pathological Archive into a more public space, where it can serve the national interest.
The idea of mobilising resources for the national collective is in fact common to Danes, and this often came through as Challenge members discussed their work utilising tissues and existing data. Specifically, Challenge members often refer to the Danish welfare state, and they do so in two main ways. First, some point to the scarcity of resources and the spectre of waste within the Danish welfare state. Emil, at the start of his inaugural lecture, presents a slide featuring a number in Danish Kroner. This, he tells us, represents the annual cost of maintaining the national data registries. He argues that research ought to make the best use of the data resources to justify such ‘huge investments’ while enabling ‘the upcycling of data and sharing of resources’. By ‘upcycling data’, Emil implies that research will produce out of the data something of greater value. Second, Challenge members recurrently talk about how they aim to produce research that will benefit the ‘public good’. This is visible in Challenge's main mission, ‘Using personal data to perform careful scientific explorations for the good of everyone’. The notion of the ‘public good’ is mobilised by members of Challenge as a moral imperative to act upon the resources available – that is, one owes it to the people of Denmark to use their data well and produce valuable research. These two ways of referring to the Danish welfare state mutually support each other: the upcycling of resources, such as individuals’ tissues stored in the pathological archives, is done to preserve scarce resources as well as to contribute to the good of the nation. What we find striking is the way Challenge members understand the ‘public’ in ‘public good’ as a national welfare state collective and place themselves within this collective.
Through law, personal data from Danish residents can be recycled, and as such, individuals can participate in the collective work of preserving and contributing to the welfare state, therefore creating ‘extra value.’ We view this discourse intertwining the welfare state and the value to be extracted from data as one of the key foundations upon which Challenge structures its archive. Crucially, it shapes how researchers see tissues and data originating from individuals as being ‘attached’ to the population. These data that Challenge aims to mobilise are framed as belonging to and serving a national population. In turn, and with Daston (2012) who underlines how the mobilisation of archives by scientists creates a sense of belonging to a research community, Challenge researchers’ use of the tissues and data archived can be said to instil or reinforce a sense of belonging to a national community.
In the vision articulated for Challenge, the personal, collective and personalised seem to work in harmony. Tissue samples connected to individuals, their bodies and diseases, and which were used and stored for personal treatments in the clinic, are mobilised as a common resource to be used for personalised medicine for the public good of the Danish welfare state. Individuals and the state, through their data practices, are engaged in a relationship that is both metaphoric (as data stored in the state institution of Statistics Denmark represent the population) and metonymic (as individuals and their data in Statistics Denmark become part of and serve the welfare state). Building on such metaphoric and metonymic relationships between individuals and state, Challenge is imagined as the large-scale archive that can preserve and make use of old and new data originating from individuals, thereby enabling a virtuous circle of ‘extra value’ by making possible new attachments and meanings in the data that contribute to the national collective of Denmark.
Infrastructural groundwork
To make possible this layering of data within Challenge, an infrastructure bringing together different institutions and actors must be built. This means carving out a route between institutions for different data to flow and connect. Contrary to the commonly found narrative depicting data linkage endeavours in Scandinavian countries as smooth processes principally enabled by unique personal identifiers, we show that this is demanding work, which involves building relationships and mutual understandings between actors, developing legal and ethical agreements across institutions, while also ‘getting to know’ the tissues themselves, their properties and how they can be utilised and attached to new endeavours.
Referring to herself as a ‘middleman’ working at the ‘interface’ between institutions, Ann is heavily involved in building the infrastructure for Challenge. At the early stages of her liver project, she meets with the different partners, discussing aims and methods, learning about their approaches and ways of working. For example, she met a few times with Nete, a pathologist working at the University Hospital, to discuss the content of the Pathological Archive and the ‘morphology codes’ used on tissues. These are the labels pathologists use to categorise the tissue plates they analyse for diagnostics. In these meetings, Ann asks about common labelling practices. A table provides an overview of the liver tissues stored in the archives and the codes used to classify them. Nete and Ann go through the different columns, thinking together about the codes and the clinical reasons for having samples analysed in the first place. These codes attached to the samples are an illustration of what Ahmed (2020) terms the ‘traces’ of use. That is, objects preserve memories of their previous ‘lives’ and uses through physical marks or associated files. In those discussions with Nete, Ann learns how to read these traces to understand the samples’ histories. She in fact performs the work of an archivist, gaining an understanding of the context in which tissue plates were created and stored in the archive, how they were catalogued as well as their original meaning. It is thanks to this archival work that Ann comes to understand how the Pathological Archives are built and maintained through pathologists’ day-to-day work. Crucially, Ann understands the previous attachments that come with the tissue samples. These interactions with Nete also suggest that to utilise the Pathological Archives, and thereby move from the personal to personalised, Ann first looks to understand how the tissues and associated data were generated by going back to tissues at the individual level.
Being the ‘middleman’ across these partners also means finding common ground for them to discuss and build shared understandings. Or as Ann explains, she ‘aligns’ partners, ‘integrates’ different lines of work, or ‘translates’ expertise from one person to another. She discusses with each partner their ‘burning research questions’ and looks to build a project that can answer some of these questions simultaneously. For example, at the start of her project, Ann agreed, in concert with Rudi, that liver samples from the Pathological Archives would be used to perform proteomics analysis, which is the large-scale study of proteins and their functions. The decision was in part driven by the presence, in Copenhagen, of the internationally renowned Center for Proteomics Research. When setting up the project, Ann met a number of times with members of the Center to understand their research priorities, the ways in which they work with liver tissue, and their approaches to the study of fatty liver disease. Specifically, Ann considered how proteomics would be used to extract data from tissues, thereby transforming them into digital data to be stored in Challenge. They first discussed the conduct of single-cell proteomics analysis – an ‘innovative’ method for analysing proteomics that had been developed in the Center. But after consideration, Ann felt that this approach did not match the questions she had in mind, and that it did not fit with what she felt the samples could best be used for. Or as she puts it in an interview, ‘I feel like my samples in the Pathological Archives cannot answer the burning questions the proteomics have’. As a result of these discussions, Ann found a compromise with the Center: she would perform ‘bulk proteomics’ analysis on the samples – an alternative and ‘simpler’ approach to proteomics.
This case is particularly revealing of the mediation work needed to make old pathology samples flow towards new uses. As Ann builds personal relationships between the actors involved and gains an understanding of their respective aims and approaches, she also bears in mind what ‘her’ samples from the archives can do, considering the potential new data that could be extracted and the new uses they could be put to. Ann speaks for the samples, standing for and preserving these old materials, while looking to give them new light and enable their flow out of the archives. As she does this, she becomes attached to ‘her’ samples in the sense of a deep responsibility for what they can do and what they should be used for, while the samples, as part of Challenge, become attached to her (Pinel et al., 2020). This relationship tying Ann to the samples leads her to prioritise certain uses over others, therefore enabling tissues to be detached from their original source and form new attachments to new endeavours and futures.
This case also suggests that the infrastructure built by Challenge through Ann's project does not require every institution involved to work towards the exact same aims, but allows for flexible goals as long as they can be coordinated and serve one another. Partners might share an understanding of what Challenge aims to achieve, but they are involved in it for different reasons. This collaborative format was particularly apparent when working with clinicians. Early in her project, Ann set up a partnership with Caroline, a clinician. In meetings, Ann found out that, like many others in the clinical community, Caroline has her ‘own private collection of tissue’, which she uses for her research. As part of her routine clinical practice, Caroline has access to a wide range of patients on which liver biopsies are performed, which enables her to collect ‘extra’ samples for her collection. However, this collection is separate from the Pathological Archives and inaccessible to the wider research community. Clinicians’ private tissue collections were often criticised by Challenge members. Rudi viewed this practice as symbolising the permanence of ‘data kingdoms’ across Denmark and going against Challenge's vision and ideals for broad data sharing. Nonetheless, Ann and Rudi felt that it was important to build and maintain good working relationships with clinicians having direct contact with patients, as a way to ‘keep the door open to samples’. As such, the institutions involved adhere to Challenge, rather than show coherence of goals and approaches (Brosnan and Michael, 2014). We could also think of the institutions in Challenge as ‘being alongside’ (Latimer, 2013). That is, they are partially connected and partially disconnected – they work together but they do not share the same objectives nor materials. 
In fact, having the partners work alongside is crucial because it facilitates data flows: partners work alongside one another to agree to detach data from one institution and purpose and reattach them to new endeavours and actors. As the ‘middleman’, Ann enacted ‘alongsidedness’ on behalf of Challenge. Through personal relationships, she kept the actors involved on board, building collaborations that could help Challenge set up the data infrastructure envisioned, even when this meant compromising, in the short term, their ideals of data sharing or managing tensions between differing understandings of what constitutes ‘good’ data use.
To enable the flow of pathology samples, Challenge also needs to create links across parallel data sources: the Pathological Archives, where the tissue samples, embedded in paraffin, are stored; the digital database, which stores information about patients and their tissues that have been analysed; and registry data organised by individuals’ national identification numbers. Challenge members worked on building bridges across these data sources. As part of his involvement in Challenge, Emil was tasked with creating the register for Pathological Archives, which would correspond to the digital and physical pathological archives, and be linked to existing registry data held in Statistics Denmark. In practice, as Emil explains, this meant ‘negotiation with the data owner’, that is, the pathology departments and health data agency, ‘on how this could be handled, and on top of that there is a legal agreement that describes how the data can be used’. Creating the register within Statistics Denmark also entailed logistical work and planning procedures for accessing data, under what conditions and by whom. Emil describes this further:
I was responsible for setting up the process where we actually handed out information, so the kind of really dangerous part of the project of setting up that part logistically – getting the agreements in place, making sure the agreements were followed and [that] we got feedback on the process from the data handlers, and that the data looked the way it was supposed to when it arrived. (Interview with Emil)
This logistical work undertaken by Emil on behalf of Challenge is another instance of archival work, as it aimed to design processes for data to flow across databases and institutions, while being attentive to the materials being upcycled. This meant considering the best way to handle the archives and allow them out of their original institution while protecting the personal data they contained. For Emil, this represents the ‘dangerous’ part of the project because it means having personal data out in the open as it exits one institution to enter another. Letting the tissues and associated data travel is also dangerous because it means opening them to new attachments. As tissues and data leave the archives, they no longer primarily belong to and serve individual citizens, but can now be handled by a series of actors across institutions as part of research projects, thus opening up to new identities linked with the research endeavour of personalised medicine. The crafting of detailed data handling processes therefore aimed to create safe paths for data from different sources to travel, enabling Challenge members to organise the best ways in which data can connect with and become affiliated to new futures and purposes. Emil's shared employment across institutions is crucial to enable such flows. Beyond the ethical and legal work he enables, his position allows for institutions taking part in Challenge to connect and agree on shared objectives and resources.
These insights make clear that the linking and integration of different data sources require much more than unique personal identifiers. It is through relational labour, logistical work and institutional agreements that a road can be carved out across institutions for data to flow, pile up and combine within Challenge. These different sets of practices, we argue, constitute archival work in that they enable old materials to safely exit their original institution and flow from personal to personalised medicine, therefore gaining new meanings and value along the way. As Challenge members constitute this path by enacting ‘alongsidedness’ and planning data handling processes, they protect and organise how data, detached from their original sources, may best be used and connected to new belongings and purposes.
Refining data
Challenge members see in the tissues stored in the Pathological Archives a ‘neglected’ research resource, which nonetheless has great potential for epidemiological research. Seen as holding ‘extra value’, these samples are configured as liminal: they are in between states – between waste and use, past and future, death (of many of those whose tissues are archived) and life (of future patients who could benefit from personalised medicine). However, for these liminal materials to enter ‘use’, ‘future’, and ‘life’, they must embark on a transition and be ‘refined’. The work of Lars, a research assistant working for Challenge, was instrumental in that transition. Lars was the one who carried out most of the work with pathological tissues and associated data mobilised in research projects. As he put it, his work consisted of ‘retrieving things from archives and refining them for scientists to use’. For example, this included neuropathology reports stored on the campus of Copenhagen University Hospital since the 1970s. As he explained, he ‘rescued’ these pathology reports from the basement of a hospital building that was about to be destroyed, transforming them into digital data. As pathology reports and samples on the verge of destruction are transformed into ‘refined’ research data, they gain a new temporality: they move from being close to an end to being given a new open-ended future in research.
There are a number of ways through which Challenge aims to transform and refine tissue samples into digital data, from scanning them in order to obtain visual representations, to high-throughput analysis using proteomics sequencing. Common to these practices is the work of Lars, enabling tissues to embark on a transition so that they can be attached to new meanings and futures. Tasked with the handling of pathological archives for Challenge, Lars became an interface between worlds. Like Ann, he made himself familiar with the past, present and future of the archives, engaging with pathologists, clinicians and scientists to reconstruct the constellation of actors and use shaping the archives in the past, while also imagining what this constellation would look like in the future. This process informed the selection process of what materials should be kept and transformed to integrate Challenge. Through this archival work, Lars in fact enacts memory practices (Bowker, 2005), selecting materials to be ‘remembered’ and kept ‘alive’ (Ahmed, 2020) while discarding others. This archival work also helped Lars identify the sort of information to be extracted from each material and in which format, to make them usable for both researchers and clinicians. Talking about the transformation process of samples and associated pathology reports into digital images, he explains: ‘It wasn't just the task of scanning them and creating images of all these documents. What we wanted was “minable” text (…) so that the text that they contain would be searchable afterwards’ (Interview with Lars). Central here is the transformation of materials from one format to another – here, from a human tissue stored in paraffin to a scanned image stored digitally. This transformation turns tissues connected to the personal into research data that can serve the personalised. Or as Emil explains:
Most of these data sources have a native format. If we are talking about a skin sample in a pathology register, then the native form is embedded in paraffin in a basement somewhere. And then all of a sudden that needs to touch something digitized on a machine that's just 1s and 0s. (Interview with Emil)
To ‘pile up’ data from different sources within Challenge, the format(s) of the data need to be adjusted for them to ‘speak’ to one another and be linkable. In concert with colleagues from Statistics Denmark where such data would be linked to the registers, Lars identified a number of metadata points that had to be drawn out of the reports.
To make that connection, we assumed that we would need a CPR number or maybe some other information if you wanted to dig further, such as a name and a date of birth that would be able to match you up with a patient or citizen in another registry. Then going further, to get this into a database or registry such as the pathology database, (…) we would prefer [emphasis] to have some other information for it to be really usable by, not only clinicians, but also scientists. And that would be something like diagnosis, and maybe what kind of tissue was taken from the individual patient. (Interview with Lars)
Concerned with making the pathological archives usable and searchable by researchers, Lars ensured that these metadata points would be identified digitally and ‘attached’ to the scans, which would allow these data to be ‘connected’ to the public registers and other state archives to form the envisioned data pile. By selecting and extracting a number of metadata points from the pathological archives, Lars connected the archives to a wide network of potential users and purposes, therefore enabling new attachments. Lars also knits together the existing data sources of the welfare state by crafting connections across state archives and public registries. One could argue that the work of Lars resembles that of database curators (Leonelli, 2016) acting as ‘mediators’ between disciplines and contexts, to gather and make available data from a diversity of sources. In our case, however, what particularly comes through is the new temporality and belonging that are crafted as data are refined. As Lars works with tissues and data, he offers a new beginning to the Pathological Archives, through which they could be used for a variety of research projects. Lars attaches these data to new futures, and in particular, by tying together existing state data sources, he places their future use within the context of the national welfare state. This new temporality gained by the pathological archives is open-ended and cyclical – through detailed metadata, the pathological archives can be used endlessly as part of research projects.
It is thanks to Lars’ relational and technical labour that pathological archives gain new meanings and value. Deeply committed to the materials he works with, Lars facilitates new forms of archival representations. To translate the archives from past to future, from clinical to research use, Lars acts as a mediator between worlds, interpreting the pathological archives in their original contexts, to then transport them out of their enclosed archive and accompany them towards their new institutions, where they are reformatted and connected to other data. By carefully crafting new data attachments, Lars protects the data originating from individuals, while he ensures their open-ended future in research, therefore making possible the flow from personal to personalised. While we see Lars’ work – enacting archival representations essential to Challenge – as a data upcycling and linking enterprise, we also notice that Lars’ position is highly precarious. In fact, during the time of fieldwork, his one-year contract ended, which coincided with the COVID-19 outbreak in Denmark, and the decision was taken not to renew his employment. Lars’ work exploiting liminal materials is heavily material and kept out of sight, located in storage rooms where he goes through neglected archives, and it is done for little pay or job security.
Discussion
Drawing upon ethnographic insights from Challenge, a large-scale data infrastructure set in Denmark, this paper demonstrates that data linkage projects repurposing and connecting data from a range of sources are as much about flows as they are about storage. This observation was made possible by bringing scholarship from critical data studies together with insights from archival science. On the one hand, studies of Big Data in personalised medicine have tended to focus on flows and linkages, discussing the ways varied data sources are made to open up and their data made to travel. On the other hand, studies on archives take another starting point by considering first the place where materials are accumulated in order for them to be used for new endeavours. By bringing archival studies into conversation with critical data studies, we therefore come to see flows in storage and storage in flows.
Our focus on these dual processes of data flows and containers brought to the fore questions of belonging. Through their work carving out routes for tissues and data to flow and connect into a container, Challenge members organise how these can be attached to persons, groups or institutions. This observation contrasts with tissue economy scholarship, where authors (e.g. Titmuss, 1997; Waldby and Mitchell, 2006) emphasise that ideas about identity and belonging come to the fore, particularly at the level of tissue donors and receivers. Here, instead, we show that the building of a data infrastructure involves reflecting on, and constructing, belonging. In our case, we tease out two intertwined meanings and approaches to belonging: belonging as the entities being managed (‘belongings’), as well as belonging as relationships, whereby these entities are attached to something or someone (‘belonging’ to). Challenge members, when managing tissues flowing in and out of containers, can be said to manage entities, and while they do so, they construct relationships between these entities and individuals, collectives or institutions. Crucially, practitioners’ own experiences and practices of belonging(s) become vehicles for orchestrating these connections. This was apparent when Challenge members, through their archival work, got to know the samples and data they worked with and, as such, became attached to them. This connection, in turn, shaped how they made these entities available, thus impacting how these belongings could be attached to new actors. But even more so, we observed that Challenge members seemed deeply attached to, and saw themselves as belonging to, the idea of the Danish welfare state, and this shaped how they approached the data to be mobilised.
For example, when considering how to exploit pathological archives, Challenge members were concerned with the welfare state, with those who helped create the archives through its institutions and as tax-paying citizens, and with the ways in which their work with personal and personalised medicine benefits the interests of the nation. Early in her project, Ann often emphasised that Challenge's vision to repurpose the pathological archives for research might come at the cost of individual value for those from whom the tissue originates. She repeatedly asked, ‘What if we use the last sample?’ This was seen as problematic because clinical staff often need access to stored samples (e.g. to compare tissue plates for a patient showing a cancer recurrence). In this instance, the personal comes into tension with the personalised with the risk of depleting samples by using them up for research. While both aims of personal and personalised medicine seek to benefit the nation, they do so in different ways: what may help the ‘now’ citizen undergoing treatment and whose samples are stored in the archives, might disrupt the care of a future citizen who could benefit from a biomarker developed using the archived samples. Challenge members, acting on behalf of the state and a national audience, looked to balance the needs of these two citizens so that they could benefit, in different respects, what they termed ‘the public good’.
One could argue that Challenge, through the work of its members, is in fact engaged in politics of belonging. Migration scholar Yuval-Davis (2006: 205) defines politics of belonging as ‘the maintenance and reproduction of the boundaries of a community of belonging’ while also highlighting that belonging is affectively charged. As Challenge members engage in archival work, constructing a route for individuals’ tissues and data to serve personalised medicine, they operate based on a particular vision of the national community. The relationship envisioned by Challenge members between citizens and state is that of reciprocity. That is, residents on Danish territory, having access to universal healthcare along with education and social services, contribute tax money as well as their personal data. Individuals are understood to belong to the national community when they enter and use institutions such as schools and hospitals. It is also through their interactions with these institutions, using their personal CPR numbers, that they become registered within data repositories, therefore enabling data to be continually collected. In other words, by interacting with welfare state institutions, Danish residents become part of the national cohort that Challenge, and other data infrastructures, make use of. When Challenge members mobilise this national cohort, organising data attachments to new projects and futures, they also enact citizens’ attachments to the welfare state collective. By getting to know the samples constituting the archives, carefully selecting the materials that are made to flow while discarding others, and crafting connections across institutions, Challenge members in fact scrutinise the inflows of data that are allowed in the ‘container’ serving the Danish welfare state. As such, Challenge members contribute to reproducing a national community of belonging.
This is an affective process, with practitioners building personal connections to ‘their’ samples, as much as it is about legal and ethical concerns. In line with Bissenbakker’s (2019) discussion of the ‘attachment requirement’, a central component of Danish migration law up until 2018, this scrutinising of data and their attachments can be said to ‘orientate’ the data's ‘loyalty’ towards a future for residents in Denmark. This national ‘orientation’ of data and their attachments is reinforced by the overall national dimension of Challenge. Although Challenge researchers are highly acknowledged scholars with international profiles, we observed that they are first and foremost concerned with building a data infrastructure within the borders of Denmark with value to be produced for Danish citizens. They do not mention transnational flows of data, which could complement the Danish data accumulated within Challenge. It feels as though Denmark is enough for Challenge – it has enough data available to serve the Danish welfare state and its citizens.
With these insights, we expose how the building of data linkage infrastructures entails more than the accumulation and curation of data, but also involves articulating and crafting meanings, futures and belonging to specific communities. In other words, we show how studying the contemporary data economy also means studying attachments and belonging. Doing so, we connect with existing scholarship that has pointed to the geopolitics and national attachments invested in genomic data sourcing. In line with scholars who describe how the building of genomic repositories has the potential to define national communities by, for example, creating and standardising specific publics (Hinterberger, 2012), or providing opportunities for rising nations to assert their sovereignty (Reardon, 2017), we underline here the political dimension of data practices. However mundane data linking and repurposing practices might look, we argue that they in fact constitute state-making practices. As Challenge members identify pathological tissues as valuable resources to mobilise and integrate into its ‘house’ together with other data to benefit ‘the public good’, they build specific archival representations of the state, its population and territory. We come to see that data infrastructural processes involve crafting archival representations, which articulate relationships between state, territory and people, thus drawing the boundaries of a community of belonging.
Acknowledgements
Our first thanks go to Challenge members, who enthusiastically shared their experiences and knowledge with us. We also thank Mie Seest Dam and Sara Green for stimulating discussions and providing valuable feedback on earlier versions of this article. Finally, we are grateful to members of the research meeting at the Section for Health Services Research at the University of Copenhagen for providing support and insightful comments on the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Carlsbergfondet (grant number CF17-0016).
