The Long and Winding Road: Archiving and Re-Using Qualitative Data from 12 Research Projects Spanning 16 Years

Abstract

We describe a pilot project designed to assess the feasibility of re-use, an approach which is chronically underused, across 12 diverse qualitative datasets related to Human Immunodeficiency Virus (HIV) in the UK, drawn from research projects undertaken between 1997 and 2013. First, we consider the sweeping biomedical changes and imperatives relating to HIV in this time frame, offering a rationale for data re-use at this point in the epidemic. We then reflexively situate the processes and procedures we devised for this study with reference to relevant methodological literature. Hammersley’s and Leonelli’s contributions have been particularly instructive through this process, and following their lead, we conclude with further considerations for those undertaking qualitative data re-use, reflecting on the extent to which qualitative data re-use as a practice requires attention to both the given and the constructed aspects of data when assembled as evidence.

Keywords

biomedicalisation data re-use HIV methods qualitative secondary data analysis

Introduction

The past 25 years have given rise to considerable discussion among social scientists regarding the re-use of qualitative research datasets to generate new insights during the exploration of new questions (Davidson et al., 2019; Hammersley, 2010; Heaton, 2004; Mason, 2007; Moore, 2007; Slavnic, 2013; Tarrant, 2016). Some of the tantalising possibilities afforded by this way of working are indicated by Walters’ (2009) statement that:

the ability to revisit qualitative data in light of social change may allow the future researcher to attribute those participants with a degree of prescience about future social conditions that the original researcher was in no position to understand. (p. 313)

While such literature reflects the growing attention that is being paid to this emergent methodology, a recent review of actual qualitative data re-use since 1997 verified that only 347 published research papers had used this approach (Bishop and Kuula-Lummi, 2017). Given the proliferation of Open Data imperatives emerging from funders, publishers, ethics committees, and governments (Bishop and Kuula-Lummi, 2017; Corti et al., 2014), the identification of this data re-use ‘practice gap’ in the social sciences provides evidence of a considerable lag in willingness, readiness, and capacity to re-use qualitative data, with comparatively few researchers pursuing this approach. As Slavnic (2013) and others have stressed, there has been scant investment in the skilled preparation, documentation, curation, and archiving of qualitative social science data for re-use. So perhaps it should not be surprising that as a result, we do not yet have a culture of qualitative data re-use, particularly when we compare this scenario to the well-funded international and interdisciplinary infrastructures organised to support the exchange and re-use of biological and biomedical quantitative data (Leonelli, 2016).

In addition, the relevant methods literature tends to be dominated by extensive debates about feasibility and validity, with few critical methodological reflections from actual practitioners. Perhaps this is one of the reasons why Davidson and colleagues (2019) describe many social research peers asking, ‘Why would you want to do that?’ (p. 365). This article therefore adds to a small but emergent body of academic writing that has started to offer answers to that question, and many others. Using social aspects of Human Immunodeficiency Virus (HIV) in the UK as a case study, we outline the motivations driving our collaborative project on the collation and re-use of qualitative datasets. Furthermore, this work offers a response to Hammersley’s (2010) call to consider re-used data as being both given and constructed. Ultimately, we describe how and why a network of social scientists working in the HIV field explored the feasibility of re-using data from numerous datasets spanning 16 years and involving nearly 600 participants. We pay particular attention to the specific demands involved with working across projects and across institutions, and with sensitive data frequently collected from those in marginalised groups.

Why re-use qualitative data on HIV now?

More than 30 years into the epidemic, HIV continues to be scrutinised from a wide array of disciplinary perspectives, across its local and global socio-economic, political, and biomedical contexts. Our knowledge about HIV is inevitably contingent upon, and shaped by, biotechnological development, in addition to the range and variability of geographic locations where HIV is concentrated, the demographic sub-groups most affected in each of these locations, the extent to which those groups are strongly or weakly networked, and the impact of successive biotechnological developments which themselves are impacted by material and social change. In particular, we draw attention to some key moments in the epidemic, such as the introduction of HIV tests (1985); introduction of highly successful antiretroviral (ARV) treatment (1996); use of ARVs in the prevention of transmission to children (2000); and ARVs for adult prevention (2008 onwards) – both through the use of ARVs to render people with HIV un-infectious, as well as their prophylactic use among those who may be exposed to the virus. It is in response to this latest critical juncture in the biotechnological history of HIV that we initiated our data re-use project in 2015, during a time when the success and repurposing of HIV ARVs is promoted as the key to ending AIDS within a generation (UNAIDS, 2014). Those with a critical eye for the social science of medicine have rightly regarded such claims with caution (Kippax and Stephenson, 2016), given the unequal terrain of ARV access and informational bio-citizenship (Fassin, 2007; Rose, 2007). 
It is out of concern for the likelihood of particular groups and classes of people being left behind in this technological race for the ‘AIDS’ finish line that our work seeks to engage directly with a considerable volume of social science evidence regarding people’s embodied experience of biomedical change, and the ways in which this experience has always been socially stratified, complex, uncertain, and slow-paced (Davis, 2010; Keogh and Dodds, 2015; Kippax and Stephenson, 2016; Mykhalovskiy et al., 2004; Nguyen, 2010; Paparini and Rhodes, 2016; Persson et al., 2016; Squire, 2013; Young et al., 2019). We want to find new ways to explore how these experiences have already unfolded alongside the deeply engrained moral attitudes about sex and HIV which have inscribed political, economic, personal, and medical responses to the HIV epidemic. We therefore believe it to be not only reasonable, but essential, to raise strategic questions with the support of qualitative data re-use, in order to critique a set of highly technological global and local HIV policy propositions featuring pharmaceuticals which promise rapid-fire change, while demonstrating insufficient regard both for the social context within which HIV is played out, and the bodies upon which these events continue to trace diverse biosocial histories.

Datasets collected as a part of qualitative HIV studies are often chronically underutilised and rarely re-examined, particularly where the drive for applied research has maintained an explicit focus on the design of treatment, prevention, and care services. This vast evidence base (like so many others) is generally underpinned by the epistemological presumption that new understandings will be exclusively driven by the collection of new data. However, there are other ways to develop knowledge, including those which ask new questions of old data in order to purposively disrupt embedded epistemologies. This article offers an account of this disruptive journey into data re-use undertaken by UK social scientists working on HIV throughout a period of intense biotechnological change. We wanted to see if it was possible to assemble a large set of qualitative UK social science datasets for re-analysis, enabling a deeper exploration of the changing nature of engagements with ARVs over time. We felt this should enable the development and application of research questions that simply would not have been available to the researchers who were working in situ. We surmised that this process would help to develop rich insights without the need to collect even more new data in the here-and-now.

From the outset, we had a shared interest in exploring how biomedicalisation of HIV was, and is, unfolding in the UK context (Clarke et al., 2003; Nguyen, 2010; Squire, 2013), specifically through the repurposing of ARVs for prevention and attentive to the ways in which these global claims and trends might contribute to broader understandings of the pharmaceuticalisation of public health (Bell and Figert, 2015). Therefore, we were also interested to explore these data for any anticipations of biomedicalisation and its effects, with a keen eye on the ‘history of the present’. Our intention has thus been not only to inform, but to help re-situate the HIV social science research agenda going forward, given the enormous strategic and economic attention being directed towards the exclusive use of ARVs to end the epidemic. As Mason (2007) points out, ‘some forms of interpretation are only possible from a distance’ (p. 3.2), and the global policy focus on ending the epidemic by 2030 with biotechnical solutions meant the time was right for us to return to the rich qualitative material collected from the frontlines of the HIV epidemic in the UK over the past two decades.

Project aims and procedures

Our aim was to assess the feasibility of data re-use through the merging, sharing, archiving, and pilot analysis of a considerable volume of qualitative data spanning the past two decades. This work was supported by the Wellcome Trust (grant 110452/Z/15/Z). The first two authors (CD and PK) undertook the majority of the work on the project with support from a research administrator, alongside periodic engagement and reflection from steering group members. As the project proceeded, a range of sizable demands emerged at different stages, including considerable time and effort required to locate and anonymise samples, and also to ensure all bureaucratic processes including the completion of data-sharing forms were undertaken in accordance with different universities’ procedures. Emergent practical issues as well as key themes were considered by the project steering group, and the lead authors kept detailed records of these developments in order to assess feasibility.

Methodological sources of inspiration

As a network of social scientists working on HIV in the UK, our starting point on this project was our shared knowledge of, and immediate access to, a considerable volume of unarchived qualitative data. As novices in qualitative data re-use, we turned to the methodological literature for guidance. It was immediately evident that this literature was filled with debate about whether data re-use is feasible or useful (see Slavnic, 2013 for an overview). Rather than becoming enmeshed in these debates, we found a good ideological fit with the work of Mason (2007), Hammersley (2010), and Tarrant (2016). These researchers speak of qualitative ‘data re-use’ to refer to a wide set of practices that involve a return to or a repurposing of data – sometimes in isolation, but more frequently in conversation across datasets. Their work variously presents qualitative data re-use as a means of both discovering and applying new research questions that have emerged subsequent to the original period of fieldwork and analysis. This approach enables the exploration of elements that were originally ignored, invisible, or not yet in existence, and it can support researchers in devising new strategic directions for their research. While Heaton (2004, 2008) and others refer to this type of work as ‘secondary analysis’, this term can imply that there is a singular type of analytical practice that is attached to qualitative data re-use, which is not at all the case. Instead, as Davidson and colleagues (2019) have recently asserted, the selected format of final in-depth analysis is generally ‘of the type that is familiar to most qualitative researchers’ (p. 363), while others have pointed out that there is a long history of social scientists returning to their own data (Moore, 2007).

We found particular inspiration in Mason’s (2007) championing of qualitative data re-use as a part of what she calls an ‘investigative epistemology’, characterised by creativity, purpose, and energy, while still being underpinned by critical reflectiveness. Furthermore, she points out that when we start to acknowledge that the politics of reflexivity and interpretation is complex by its very nature, then it can be quite useful to consider data both from near and far perspectives – both embedded within and extracted from their original contexts – as a means of undertaking more fruitful investigations (Mason, 2007: 3.3). She reminds us about the value that curiosity and freedom can afford to our research efforts, and that data re-use, while not without its challenges, offers just such an opportunity for the expansion of knowledge. Thus, researchers are prompted to consider the potential benefits of temporal and/or biographical distance from the processes of production, which can help re-confirm the validity of the original researchers’ findings (Haynes and Jones, 2012). We can bring new reflexivity, critical distance, and rigour to qualitative data re-use in ways that can facilitate fresh insights, and which can serve to strengthen or challenge existing arguments.

Our team set out to establish the feasibility of re-using a considerable expanse of data: an endeavour somewhat akin to the scope and ambition of the ESRC Timescapes project (Irwin et al., 2012). As such, it has been essential to expand our traditional ways of working, acting on Moore’s (2007) injunction to breach boundaries. When doing so, Moore (2007) stresses that ‘eschewing our comfort zones, and developing a more creative, and even messy, approach may be the key to opening up the full potential of qualitative data reuse’ (p. 4.8). This is an approach which felt suited to our project, given that we were assessing how to apply entirely new research questions, concerning the steady emergence of novel applications of ARVs for prevention, to data collected before such uses of ARVs were fully conceived, evidenced, and recommended. In embracing Moore’s call for creative and energised approaches to qualitative data re-use, we were hopeful that these artefacts from the past (Zimbra et al., 2010) might help us to better understand our shared present.

It was ultimately Hammersley’s (2010) thoughtful reflection on the nature of data and of its re-use that helped us to sit more comfortably with the complex task we had set ourselves:

. . . these two meanings of ‘data’, as constructed and as given, are both essential: they relate to different but equally important aspects of the material we use as grounds for inference in research . . . we collect data as a resource and then use some of it, in particular ways, as evidence, in order to draw inferences relevant to our research focus; and we discover how to do this in the course of our work. So, in using data, we necessarily reconstitute it as evidence. (p. 4.6, our emphasis)

Hammersley goes on to describe how, in every qualitative social research project, researchers variously codify some materials as data – depending on the extent to which they conform with our research questions. The status of these materials will vary throughout the changing course of the project, alongside the shifts in research questions that frequently occur. Therefore, we organise, compare, select, and re-form these data to generate evidence, and we are ‘reforming something that already exists, not making it up’ (Hammersley, 2010: 4.7). This account chimes closely with the experiences described by researchers in our network, while each of us collected, curated, and used the data originally, and also while considering the feasibility of data re-use.

Parallel reflections have also been made by Leonelli (2016) about the essentially human task of packaging and mobilising data in order to enable it to ‘travel’ on what she refers to as ‘data journeys’ across time, space, and research teams. In her philosophical reflections on these practices as they relate to the construction of online quantitative data-sharing databases in the biological sciences, we can see considerable synchronicity with the thoughts that Hammersley offers up to social scientists regarding data re-use. In entirely distinct research spheres, Hammersley and Leonelli implore us to maintain awareness of the inevitable social situatedness of our practices of data collection, preparation, curation, and re-use (across varied personal, institutional, disciplinary, political, and economic terrains) – and to use these reflections to enrich our explorations. Of particular salience is Leonelli’s discomfort with the metaphoric use of the term ‘data flows’ – as applied in data-centric scientific endeavours. As she states,

Not only do data not ‘flow’ toward discovery, but it is the lack of smoothness and pre-defined direction that makes their travel epistemologically interesting and useful. (Leonelli, 2016: 41)

Similar to Hammersley, then, Leonelli exhorts us to see data not simply as inert objects that hold meaning independently, but instead to also see their capacity to collect and build evidential value through the very act of mobilisation on a data journey which is influenced by every person, system, and contextual situation that has touched those data along the way.

Hammersley’s and Leonelli’s contributions therefore informed our thinking considerably, as they helped to enliven the possibilities of what qualitative data re-use enables. Their work encourages us to anticipate epistemological divergence in the ways that researchers engage with data as we use what is given and also as we create something anew. At times it was easy to feel like we might be trying to engage with an unmanageable volume of ‘given’ material, contributing to a sense of alienation and disconnection. And yet, we also remained attuned to Hammersley’s reassurance that there are always periods within qualitative analysis when there is a feeling of unsteadiness, whether or not we are re-using data. These considerations endowed us with a useful balance, even when we were feeling slightly at sea in a mass of data, with a host of emergent questions.

The practicalities of getting started

At the outset, we assembled eight social scientists who had worked in the UK across the past two decades, leading a wide variety of qualitative projects focussed on the social, policy, and behavioural aspects of HIV in different parts of the country. All were interested in exploring the feasibility of data re-use, so we identified projects between 1997 and 2013 led by members of this collaboration which were most likely to offer insights into the emergence and development of HIV technologies following the introduction of ARVs. Most of the studies utilised semi-structured or narrative interviews with individuals, though some used focus groups, or a blend of interviews and focus groups, and one employed structured questionnaires via telephone.

Each researcher took responsibility for checking the quality and coherence of their materials, in order to assess what format they were held in and to consider how to make changes to file formats if necessary; and also to identify any files that were missing, corrupted, inaccessible, or incomplete. All of this activity was tracked so that the group could maintain a record of the status of potential data materials. Quite quickly, we were able to identify that we might potentially work with data emerging from 741 diverse participants, with just over half this sample being gay, bisexual, or men who have sex with men of a range of ethnicities and HIV statuses, and a fifth of the sample being Black African people with HIV from a range of sexualities (together these groups account for the majority of HIV infections in the UK). The remainder included those working and providing care in the HIV sector, and others with and without diagnosed HIV. Study topic guides, questionnaires, outputs, and basic participant demographics were collated and shared, as well as contextual details regarding study design and data collection. One outcome of this stage of appraisal was the realisation that the original material from one of the candidate datasets had been destroyed, and so would not be available for re-use.

VIGNETTE #1

We needed to share data across four (current) UK institutions of higher education, and in addition, many of the datasets were collected during earlier employment at different universities. This raised a curious set of questions about institutional ownership of data and completion of data-sharing proforma when the data in question had travelled with a researcher. Our discussions about institutional attitudes towards the data that travel with us when we start new jobs were never fully resolved. We recommend that other researchers initiating formalised data-sharing discussions (which are frequently framed as simply being institution-to-institution) take time to consider in detail how institutional attitudes towards such data are likely to impact on ‘sharing’ procedures.

We devised a comprehensive data management plan to support the completion of data-sharing agreements (Vignette #1). The plan articulated our core principles of data governance and curation. Most helpfully, focusing on these matters at an early stage led us on to the creation and testing of a tailor-made anonymisation protocol (discussed in further detail below). We emphasise these somewhat day-to-day details here, because any team that is considering simultaneous data-sharing, archival depositing, and data re-use across multiple projects that span diverse institutions needs to be apprised of the considerable amount of administrative labour that is required to prepare the data for re-use. As our team discovered at all stages of this work, we routinely underestimated the time needed for the more mundane aspects of data assemblage and collaborator communications.

The anonymisation protocol

We were strongly committed to minimising the potential for harm to come to these historical research participants as a result of our archiving and re-use of these data. In reviewing the quality and format of the datasets, the researchers recognised that although some of the data had been considered to already be fully or partially anonymised, approaches to anonymisation had varied dramatically both between and within datasets. For this reason, it was necessary to develop a uniform anonymisation protocol to be applied to all data being considered (see supplementary file).

The development of this protocol served a range of unanticipated purposes. First, each member of the steering group was compelled to reflect on what was meant by the notion of ‘anonymisation’ in concrete terms, and the extent to which such practices were feasible prior to sharing data within the group, and beyond it, via data archiving. It also helped to make some of the challenges of data-sharing immediately apparent, as set out in Vignette #2.

VIGNETTE #2

Following a delay in sharing two datasets from one source, it was during a phone call that a member of the network explained they had been reflecting for a while and had decided not to share their data (and not to work towards deposit with the UK Data Archive (UKDA)) due to concerns about the risks of participants’ identities being unwittingly disclosed. This decision was respected by the remaining network members, and serves as an important learning point, as such endeavours may not be deemed appropriate by all colleagues, for all datasets, or for all research participants. Thus, sometimes a point is reached in a data re-use project when it becomes clear that either some or all of the data will not be made available for this purpose. We still had a sizable sample to work with, consisting of 12 datasets, with 589 individual participants.

Following advice from the UKDA, our approach to anonymisation required flexible and tailored solutions, enabling each study to be considered in its own right. The protocol stipulated that all direct identifiers (people’s names, addresses etc.) would be removed and replaced with standardised text in order to indicate that a specific type of identifier had been removed (e.g. (name of friend)). When it came to indirect identifiers (such as geographic location and/or place of employment), we resolved to remove as little information as possible, because doing so might strip out important contextual details for our own and others’ future analysis. The protocol gives examples where it would be necessary to remove a series of indirect identifiers when it was judged that in combination, they could potentially identify an individual, as well as instances where an entire transcript may not be deposited or shared in the same way as other materials in the set, due to risk of identification from detailed narratives that are embedded across the document.
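The replacement of direct identifiers with standardised text can be illustrated with a minimal sketch. This is not the project's protocol or tooling: the lookup table, the invented names, and the function `replace_direct_identifiers` are all hypothetical, and the sketch assumes that a person familiar with the study has already listed the identifiers in a given transcript. It deliberately leaves indirect identifiers alone, since those decisions depend on contextual judgement rather than string matching.

```python
import re

# Hypothetical lookup of direct identifiers found in one transcript, mapped to
# the type of identifier each represents. All names here are invented examples.
DIRECT_IDENTIFIERS = {
    "Sam Okoro": "name of friend",
    "Dr Patel": "name of clinician",
    "14 Example Road": "address",
}


def replace_direct_identifiers(text, identifiers):
    """Swap each listed identifier for a standardised marker and record the change."""
    changes = []
    # Work through longer strings first so a short identifier cannot split a longer one.
    for original in sorted(identifiers, key=len, reverse=True):
        marker = f"({identifiers[original]})"
        pattern = re.compile(r"\b" + re.escape(original) + r"\b", flags=re.IGNORECASE)
        text, count = pattern.subn(marker, text)
        if count:
            changes.append((original, marker, count))
    return text, changes


sample = "I told Sam Okoro first, and then Dr Patel rang me at 14 Example Road."
anonymised, log = replace_direct_identifiers(sample, DIRECT_IDENTIFIERS)
print(anonymised)
# I told (name of friend) first, and then (name of clinician) rang me at (address).
```

Because the change log in this sketch contains the original identifiers, it would need to be stored separately from, and more securely than, the anonymised transcript.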

We devised procedures that paid close attention to the following three issues: consent, de-identification practices (anonymising), and regulation of access to data. We were encouraged by the UKDA to think of the way these three issues should be maintained in mutual balance and suspension (Steering Group meeting notes, 7 June 2016), so that where one element was low for a particular project, the others could be strengthened. For instance, we were working with a number of historic projects with consenting procedures that did not mention future re-use. As a direct result, for those projects, we decided to considerably heighten our de-identification practices (if needed, removing all possible indirect identifiers), to place restrictions on future access, and at times to remove an entire transcript from the dataset if the overall narrative ran a particularly high risk of identification.

Once the drafted protocol was discussed and agreed by the seven collaborating researchers, we randomly selected three data encounters from each of the 12 datasets, and each researcher applied the principles of anonymisation set out in the protocol to this sample of their own work, while keeping a separate record of changes. This enabled us to test and strengthen the protocol, and it also helped all co-authors to recognise that this flexible and context-driven approach to anonymising was a complex, time-consuming, and skilled process. It also became clear that the process of anonymisation required contextual knowledge and experience gained through proximity to the original research project and/or familiarity with the social contexts and experiences of particular sub-groups involved in these studies (Vignette #3). Such insights supported the complex decisions that those undertaking anonymisation needed to reach about assortments of highly contextualised and interrelated information that could potentially identify an individual.

VIGNETTE #3

While applying and testing the anonymisation protocol, one member of the network experienced an unexpected and sustained emotional response to their randomly selected transcripts. This researcher’s personal connections with the interviewers and the research participants had come flooding back. Some had died of AIDS not long afterwards, while others who had spoken of being very close to the end of their lives had not died, and are still playing key roles in our HIV research and policy infrastructure.

Although it had not been anticipated, the development and use of this anonymising protocol was a hinge point for the work of this pilot feasibility project. Creating, revising, and applying the three-page tool elicited the reflection that was needed to consolidate collaborators’ shared norms and ethos, enabling the project to proceed. It also (again) forced the research team to concede that the task we had taken on was considerably more involved and laborious than we had anticipated.

Analytical procedures

The testing of the anonymisation tool on three randomly selected transcripts from each dataset served a further purpose: it meant that the team now had a randomly selected sample of 36 transcripts/notes from across 12 different datasets that we could start to work with for our pilot analysis. Given the scope and scale of the available data for this feasibility study, this smaller sample that was originally compiled to test our anonymisation protocol now presented itself as the most obvious material on which to perform our Stage 1 analysis to determine whether data re-use which incorporated so many diverse sources was likely to result in meaningful outcomes. The two lead authors read and became familiar with this smaller sample, alongside a set of metadata, including the original project descriptions, outputs, and question guides.
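For readers planning a similar pilot, the selection of three encounters from each dataset can be sketched as a simple stratified random draw. The article does not state how the random selection was carried out, so everything below is an assumption for illustration: the dataset names, file names, encounter counts, the function `draw_pilot_sample`, and the seed value are all hypothetical.

```python
import random

# Hypothetical inventory: 12 datasets, each with an invented number of data encounters.
# Dataset names, file names and counts are placeholders, not the project's real holdings.
inventory = {
    f"dataset_{n:02d}": [f"dataset_{n:02d}_encounter_{i:03d}" for i in range(1, 21 + n)]
    for n in range(1, 13)
}


def draw_pilot_sample(inventory, per_dataset=3, seed=2016):
    """Draw the same number of encounters, without replacement, from every dataset."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw can be repeated
    return {
        name: sorted(rng.sample(encounters, per_dataset))
        for name, encounters in inventory.items()
    }


pilot = draw_pilot_sample(inventory)
total = sum(len(chosen) for chosen in pilot.values())
print(total)  # 12 datasets x 3 encounters = 36
```

Drawing an equal number from each dataset, rather than sampling in proportion to dataset size, keeps every study represented in the pilot regardless of how many participants it originally involved.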

Using NVivo 10, we then coded material that related to HIV ARVs in any way (both prompted and unprompted), making note of any sections where we might have expected such discussion to arise but it did not, given that it is important to listen for silences when re-using data (Irwin et al., 2012). This enabled us to identify, for the first time, which datasets might be most fruitful based on how the topic of ARVs was (or was not) situated within these different data sources. Given that this pilot coding elicited a considerable volume of material to work with in its own right, we decided to remain focussed on this pilot sample of 36 data encounters.

For the Stage 2 level of deeper analytical exploration, we used a framework approach (Ritchie and Lewis, 2003), focusing upon discourses of treatment literacy and engagement with, or rejection of, expert knowledges related to ARVs. This approach enabled us to apply our research questions to varied materials, collected at different points in time, in different places, and with the involvement of diverse participants and researchers (Irwin et al., 2012). We used the themes identified at Stage 2 to generate a more focused inquiry for Stage 3, which will be reported in greater detail in subsequent publications.

Colleagues working on Timescapes have recently described using similar approaches in their re-use of qualitative datasets including around 700 text files, using CAQDAS in order to undertake word and proximity searching based on keywords (Davidson et al., 2019). In our project, no computerised searching of this type was required, although it could have been considered had we selected a larger sub-sample. However, on a range of other analytical choices, there is considerable parity, including a series of processes that Davidson and colleagues (2019) refer to as:

Undertaking an overview survey and constructing a corpus;

Recursive surface ‘thematic’ mapping;

Preliminary analysis;

In-depth interpretive analysis.

We have found their sustained use of archaeological metaphors to describe the work of qualitative data re-use to be especially helpful, and far more useful than the concept of ‘mining’, for communicating analytic approaches to re-using multiple qualitative datasets. This is because the former ‘evoke[s] the idea of moving between breadth and depth while retaining the integrity of a contextualised and detailed qualitative approach’ (Davidson et al., 2019: 369). The notion of ‘data mining’ may have relevance when re-using quantitative data that is potentially quite standardised, with a high degree of coherence between datasets. In contrast, we have found that the archaeological metaphor encourages us to generate practices that are based on locating diverse forms of data in situ, and on utilising this context as a strength for theory-building, rather than as a weakness. At a purely pragmatic level, due to the scale and the scope of the task that we had set ourselves, using ‘test-pits’ as mechanisms to undertake surface mapping was highly beneficial, as it fitted in well around multiple research obligations, allowing time for reflection, discussion with other network members, and subsequent returns for further thematic analysis (Vignette #4).

VIGNETTE #4

One of the studies was completed in 1998, and as researchers, we broadly recalled that period immediately following the roll out of ARVs as one of heady optimism, of ‘Lazarus-type’ recoveries, and emptying hospital wards. Yet what stood out for us as we re-engaged with these data were experiences of the new treatments that were often painful, long-term, uncertain, and dangerous. Some participants spoke of their struggles to stay on the treatments, of side effects, and crippling guilt and anxiety about missing doses. We heard from people who could not believe what they were reading in the press about the transformative moment they were supposed to be at the heart of, because it did not reflect their own embodied treatment experiences. Our team reflected upon the extent to which we, as researchers active throughout this period, had forgotten what these data portray.

Steering group meetings

Central to the drive and focus of this project was the support of its Steering Group comprising all original principal investigators on the projects being considered for re-use, a public health historian, and a policy expert from a national HIV thinktank. The first two Steering Group meetings helped to meet a range of bureaucratic and strategic functions, while also enabling consolidation and agreement of study procedures (including the Data Management Plan and the Anonymisation Protocol). In the second of these meetings, two members of UKDA staff joined us to discuss key principles related to archiving and to answer questions about the practicalities and ethics of making deposits into national data repositories. Therefore, a considerable amount of group/individual labour was focussed on process and procedure, before the work of analysis could begin in earnest.

Our final Steering Group meeting focussed entirely on the emergent findings that the lead researchers had revealed during Stage 2 analysis. At this meeting, the group tested and explored these themes and considered next steps. We structured that meeting along the lines of what Tarrant (2016) has described as a ‘data sharing workshop’ to explore common themes between projects by bringing the data, the researchers involved in its original production, and its re-users into conversation. We utilised the following key questions set by Franz (2013) to elicit colleagues’ iterative responses to the emergent themes and patterns:

What surprised you about the data?

What was confirmed by the data that you already knew?

What was missing in the data that you thought you would see?

What other meanings do you see in the data that we haven’t already discussed?

What other comments do you have about the data?

This structure enabled us to discuss any overlaps, silences, surprises, recollections, and differences, as well as exploring the social, political, and emotional contexts of data production in much greater detail. As a group, we discussed the challenges involved in returning to this material – including the challenge of ‘hindsight’ as a researcher, and being somewhat overwhelmed by just how much we had allowed ourselves to ‘forget’ – which many agreed was a job requirement, but also tended to come as a surprise when confronted with transcripts of our own interviews about which we could recollect nothing at all. We also discussed the emotional labour involved in returning to difficult data, and in making personal reflections about working in this field across the decades. Given the richness that was uncovered through our initial process of analysis, the possible directions that re-use could take us in could have overwhelmed us, but we found instead that this meeting was pivotal in helping to centre our thinking, contextualise the research, and consider what was both feasible and desirable in terms of how we might work with these data moving forward, both in the latter stages of analysis, as described above, and beyond the life of this project. What had started as a feasibility project was very quickly opening up a considerable number of potential lines of enquiry, each of which was intensively considered and critiqued by members of the Steering Group. Not only had the group helped to confirm the feasibility of this approach through our culminating discussion in the data-sharing workshop, it was also very clear that the approach was bound to bear fruit during a time that was characterised by uncertainty and considerable ambivalence among many social scientists working in our field of study.

Ultimately, it was not possible to ensure that all data from all 12 studies could be fully anonymised and deposited into the UKDA within the lifetime of this project, as had been originally hoped. However, Steering Group members reaffirmed their commitment to identifying the resources to enable anonymisation and archiving of these data (and more) over time, due to their burgeoning acknowledgement of the potential that they held for re-use, not only by members of this collaboration, but also by others.

Discussion

In preparing this article, we hoped to convey a number of the practical lessons that were learned throughout this feasibility study, as well as providing broader methodological reflections on the benefits of qualitative data re-use across a diverse range of source materials. A considerable strength of this approach has been the reflexivity it offers the practitioner in relation to aspects of knowledge production that are present in all forms of social research:

It is true that data are constructed by researchers. What is and is not relevant evidence, what it means, what should be taken as strong and what as weak evidence, and so on, depends upon the research focus, the data available, and the line of argument that develops in the course of the research. However, if we think of data as resources that we subsequently come to use as evidence, then there is also an important sense in which data exist independently of the research project. (Hammersley, 2010: 5.2)

While revisiting research texts (research notes, outputs, and transcripts) which capture the researchers’ original descriptions of what was there, we continually found ourselves more enlightened about what is here, now, and what may yet come to pass. As Moore (2007) suggested, we found that ‘through recontextualization, the order of the data has been transformed’ (p. 2.3). At the same time, one of the potential limitations was our selectivity in tracing the historical and contemporary narratives of ARV repurposing, directing us towards the identification of particular projects and sections of transcripts that we examined with great care. In addition, we did not (and could not) include all qualitative datasets on HIV that were collected in the UK in this period, and as detailed above, some in our group decided to opt out from sharing their data. As such, we acknowledge that this approach involves a range of highly selective processes which are acutely dependent upon the curation of specific data sources. It is our view that this selectivity is ultimately outweighed by the benefits generated through re-use.

Where those undertaking analysis had not been present during data construction (or if it happened some time ago), we needed to be especially mindful of any assumptions (or gaps in memory) that could obscure the social context of data production. As others have noted (Haynes and Jones, 2012), we too came to the conclusion that our temporal and attitudinal distances from the data we were re-using ultimately demonstrated one of the strengths of this approach, because it encouraged us to interrogate and re-examine assumptions we may have held about what the data ‘should’ contain, or how original researchers’ questions ‘should’ have been framed. Some data had emerged from the shadows in ways that had caught us off guard and reminded us of what it was to work in different moments and places of an unfolding epidemic, offering snapshots of both mundane and challenging aspects of life in close proximity to this disease. We also found that this process brought data and researchers together across time and place in ways that enabled us to hear silences, which in turn had the potential to open up their own interpretive avenues (Irwin et al., 2012). We also became alert to the ‘given’ aspects of data from the past which cannot be expanded or extended. Unlike researchers taking a grounded theory approach, whose iterative analysis can lead to the immediate amending of topic guides or the expansion of a sampling frame, those re-using data simply need to acknowledge that such existing gaps can only be noted with interest, which can at times place limits on the bounds of exploration.

Ultimately, the materials we prepared for re-use contained considerable internal vitality and coherence, while at the same time, they were also reconstituted, perhaps ‘re-coloured’ under our contemporary spotlights, with new questions helping to uncover new hues, depths, and variations when regarded within the context of changing imperatives of ARV use in the HIV global landscape. In turn, some of the most productive moments in this process were when data fragments functioned like mirrors that offered flashes of insight into the future. It was at these moments in particular, that a flurry of further questions tended to emerge, and it is this characteristic of qualitative data re-use that we find the most exciting and generative.

Our foray into qualitative data re-use also held its own highly personalised and sometimes quite emotional surprises. Certainly the extent to which this was experienced depended on the duration and depth of individual researchers’ involvement in the field, and the shape of their personal/community/activist ties within the HIV sector. As such, some members of our collaboration were themselves taken aback when this process triggered a series of recollections about undertaking data collection right at the moment when antiretroviral medications had proved to be efficacious, but not soon enough to save colleagues, partners, co-researchers, and research participants who had already died. It also became clear from reviewing the data that some experiences had become supplanted by the grand-narratives that succeeded them. For instance, while many of us who had been working in the field since the late 1990s had a tendency to reflect back on the emergence of ARVs as bringing about a sudden ‘resurrection’ impact on people with HIV and AIDS at that time, the data themselves revealed a different and much more mundane truth than some of us recalled. In fact, we could see through the transcripts from that period that people’s recovery journeys on new treatments were often mediated, painful, long-term, uncertain, and dangerous. It was therefore a striking realisation that the post hoc cultural narratives about the sudden change brought about by a pharmaceutical revolution in HIV had served to obscure our own personal and professional experiences of the period, and that despite remaining embedded in the HIV sector, many of us had glossed over just how complex life with HIV continued (and continues) to be after 1996, and how diverse these experiences were depending on the material and geographic locations of people living with HIV during this period.

What these data served to remind us of are the many messy, halting, ambivalent experiences and accounts of illness and of living with HIV. They also remind us that it is often less a question of responding to, adhering to, or even accepting treatment advances or pharmaceutical prevention technologies, but more a question of growing with and around them as they slowly emerge, a theme that has been singularly emphasised in Squire’s (2013) work.

It is this realisation that enables us to consider the power of data re-use within the contemporary context where these same pharmaceuticals are being positioned as the mechanism for eradication of HIV in the next decade. For instance, this process helped us to powerfully re-acknowledge that often, the experience of technological change is slow, rather than fast, and this is even more true for those who live on the margins, as is the case for so many people living with and at risk of acquiring HIV. Re-use of these particular data has helped to re-ground members of our collaborative group in experiences of adaptation to ARVs that happen over time, along with the accommodation of new realities, which encourages us to question the temptation to look back, or indeed forward, and impose a sense of ‘suddenness’ on developments that have been and will be experienced by individuals through an often slower pace of change that is entirely reliant upon social, political, and economic context. We anticipate that similar re-discoveries will be likely for those who re-use complex and long-range sets of qualitative data covering other fields of social enquiry.

In practical terms, this project helped us to clarify and reaffirm a range of strong rationales for data re-use. First, it helps to avoid the systematic over-researching of small and potentially vulnerable groups of people. The findings above also demonstrate that qualitative data re-use helps to make better use of scant resources in environments where financial constraints have led to considerably increased competition for funded empirical data collection. Although anonymisation procedures, data analysis, and write up still demand resource, re-use consumes far less time, energy, and expenditure than the collection of new data. The qualitative data re-use practice gap identified by Bishop and Kuula-Lummi (2017) is one that needs to be remedied, given that research funding councils in the UK and elsewhere have started to request that a case is made for the collection of new empirical materials rather than re-using existing data – encouraging re-use to be increasingly regarded as a sustainable and efficient practice when so much existing data are underutilised. At the same time, the development of infrastructures to support re-use requires diverse engagement across the social science spectrum in order to encourage broad critical input on the ethical issues that such developments will inevitably raise. As Leonelli (2016) warns, the Open Data movement is deeply embedded in the logics and structures of globalised market imperatives. Therefore, inasmuch as data re-use may be a mechanism to support sustainability, we must all be vigilant about power dynamics in order to ensure that data infrastructures are not simply left to the dictates of powerful market forces.

Conclusion

Despite a flourishing literature about the re-use of qualitative data, social scientists lack a wide array of sources that offer insights into the practicalities of preparing data for re-use, and that describe the work entailed in discovering resonances and dissonances among considerably diverse and contrasting datasets. In sharing our methodological inspirations, rationales, and day-to-day practices, we seek to inspire other qualitative researchers to consider the value of bringing contrasting bodies of data ‘into conversation’ (Tarrant, 2016). When working with datasets that have not yet been readied for sharing, researchers will do well to ensure the allocation of considerable resource for anonymisation; indeed, a key learning from this project has been to ensure that in future we will plan ahead for anonymisation and archiving before a study is put to rest. Ultimately, within our collaboration, we found that the generation of the anonymisation protocol alerted all potential data-sharers to the realities of Open Data in a qualitative social science context, and this too was a key moment for learning. The use of ‘test-pits’ and ‘deeper digs’ within context proved extremely fruitful, enabling the clear identification of emergent themes, silences, and forgotten experiences (Davidson et al., 2019). It is clear that in undertaking this feasibility study, we have only just scratched the surface of what is possible with these 12 datasets, and in time the goal is for them to all be made fully available for further research through the UKDA.

In the meantime, we have established that drawing together discrete sets of data, collected by different teams, for different reasons, and at contrasting temporal and geographic points is a feasible and considerably rewarding undertaking. It was an ambitious and rare undertaking which has not only had tangible results in terms of theory-building and methodological development, but it has also considerably affected the researchers involved – encouraging us to be tuned in to the history of the present and the future in ways that would not have been possible had we not embarked on this project. Furthermore, we believe that qualitative data re-use strategies and processes will go on to be further developed, refined, and built upon by social scientists and interdisciplinary researchers. In particular, we hope to see those working in diverse health fields that are characterised by technological change consider qualitative data re-use more often as a means of exploring processes of biomedicalisation. In drawing upon Leonelli’s (2016) work on biological ‘data journeys’, we also hope that this article opens up space for further exchange between those working on data re-use from a wide range of starting points, including those interested in the philosophy of science, biologists, medics, medical ethicists, science and technology scholars as well as the full array of social scientists.

Supplemental Material

Supplemental material, dodds_supplementary_file for The Long and Winding Road: Archiving and Re-Using Qualitative Data from 12 Research Projects Spanning 16 Years by Catherine Dodds, Peter Keogh, Adam Bourne, Lisa McDaid, Corinne Squire, Peter Weatherburn and Ingrid Young in Sociological Research Online

Footnotes

Acknowledgements

We owe considerable thanks to the following individuals for their support of this project as advisors and as members of the project steering group: Virginia Berridge, Libby Bishop, Louise Corti, Paul Flowers, Sarah Ratcliffe. We also thank our anonymous reviewers for offering insights into improving the paper.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: CD was the lead researcher on this study, supported by Wellcome Trust (grant 110452/Z/15/Z). At the time the work was initiated, she was based with Sigma Research at the London School of Hygiene and Tropical Medicine. PK co-led this study with CD while based at Greenwich University. AB was based at Sigma Research at the London School of Hygiene and Tropical Medicine at the time this study was undertaken. LMcD is funded by the UK Medical Research Council (MRC) and Scottish Government Chief Scientist Office (CSO) at the MRC/CSO Social & Public Health Sciences Unit, University of Glasgow (MC_UU_12017/11, SPHSU11). The HIV & the Biomedical study was funded by the MRC/CSO (MC_UU_12017/2, MC_UU_12017/11, SPHSU11). Data contributed to this study by CS was funded by the Nuffield Foundation and the University of East London. Datasets from Sigma Research that were contributed to the study by PW arose from research projects funded by: Department of Health (predominantly via the CHAPS and NAHIP programmes), Monument Trust, National AIDS Trust and Nottingham City Council. IY was supported by Scottish Chief Scientist Office Postdoctoral Fellowship from 2014 to 2017 (PDF/14/02; CF/CSO/02). IY is currently a member of the Centre for Biomedicine, Self and Society, supported by Wellcome Trust (209519/Z/17/Z). The HIV & the Biomedical Study was supported by the UK Medical Research Council (MRC) (MC_U130031238/MC_UU_12017/2).

Supplemental material

Supplemental material for this article is available online.

Author biographies

Catherine Dodds is a senior lecturer in Public Policy in Bristol’s School for Policy Studies. She is currently interested in work that undertakes a critique of the impact of the biomedicalisation of HIV on policy development and lived experience for those closest to the epidemic.

Peter Keogh is the deputy associate dean for Research Excellence and Senior Lecturer in the Faculty of Well-being, Education and Language Studies at the Open University. He is also the lead for the Reproduction, Sexualities and Sexual Health Research Group. His research explores how people with HIV and those at risk for HIV manage their sexual, intimate and social lives as the epidemic unfolds.

Adam Bourne is associate professor of Public Health and Deputy Director of the Australian Research Centre in Sex, Health & Society (ARCSHS). He takes a leading role in the development of research that examines the health and well-being of LGBTIQ populations, at both a domestic level and in an international context.

Lisa McDaid is professor of Social Sciences and Health at the University of Queensland and leads on health research at the Institute for Social Science Research. Her research aims to improve health and wellbeing, particularly among the most disadvantaged in our society. Lisa is interested in how best to engage communities at high risk of poor health and wellbeing in health improvement research and in developing new methods of co-production for intervention development.

Corinne Squire is professor of Social Sciences and Co-Director, Centre for Narrative Research, at the University of East London, and Research Associate, University of the Witwatersrand. Her research interests are in subjectivities and popular culture, narrative theory and methods, HIV and citizenship, and refugee politics.

Peter Weatherburn has been the director of Sigma Research since 1997 and is also Head of the Sexual & Reproductive Health Group in the department of Public Health, Environments and Society (PHES/ PHP) at London School of Hygiene and Tropical Medicine. In addition to sexual and reproductive health Peter is interested in development and evaluation of community and structural interventions that address health inequalities especially in relation to mental health, drug use and youth violence.

Ingrid Young is a chancellor’s fellow in Social Science of Health & Medicine at Edinburgh University, with training in Sociology and History. Her research examines the interface between biomedicine and public health, with particular consideration of gender, sexuality and wider inequalities in relation to emerging biotechnologies in HIV and sexual health.

References

Bell and Figert (2015) Moving sideways and forging ahead: Reimagining ‘-izations’ in the twenty-first century. In: Bell and Figert (eds) Reimagining (Bio)Medicalization, Pharmaceuticals and Genetics: Old Critiques and New Engagements. London: Routledge, pp. 19–40.

Bishop and Kuula-Lummi (2017) Revisiting qualitative data reuse: A decade on. SAGE Open 7(1): 1–15.

Clarke, Shim, Mamo, et al. (2003) Biomedicalization: Technoscientific transformations of health, illness and US biomedicine. American Sociological Review 68: 161–194.

Corti, Van den Eynden, Bishop, et al. (2014) Managing and Sharing Research Data: A Guide to Good Practice. London: SAGE.

Davidson, Edwards, Jamieson, et al. (2019) Big data, qualitative style: A breadth-and-depth method for working with large amounts of secondary qualitative data. Quality & Quantity 53: 363–376.

Davis (2010) Antiretroviral treatment and HIV prevention: Perspectives from qualitative research with gay men with HIV in the UK. In: Davis and Squire (eds) HIV Treatment and Prevention Technologies in International Perspective. Basingstoke: Palgrave Macmillan, pp. 126–143.

Fassin (2007) When Bodies Remember: Experience and Politics of AIDS in South Africa. Berkeley: University of California Press.

Franz (2013) The data party: Involving stakeholders in meaningful data analysis. Journal of Extension 51: 1IAW2.

Hammersley (2010) Can we re-use qualitative data via secondary analysis? Note on some terminological and substantive issues. Sociological Research Online 15(1): 1–7. Available at: http://www.socresonline.org.uk/15/1/5.html

Haynes and Jones (2012) A tale of two analyses: The use of archived qualitative data. Sociological Research Online 17(2): 1–9.

Heaton (2004) Reworking Qualitative Data: The Possibility of Secondary Analysis. London: SAGE.

Heaton (2008) Secondary analysis of qualitative data: An overview. Historical Social Research 33(3): 33–45.

Irwin, Bornat and Winterton (2012) Timescapes secondary analysis: Comparison, context and working across data sets. Qualitative Research 12(1): 66–80.

Keogh and Dodds (2015) Pharmaceutical HIV prevention technologies in the UK: Six domains for social science research. AIDS Care 27: 796–803.

Kippax and Stephenson (2016) Socialising the Biomedical Turn in HIV Prevention. Cambridge: Anthem Press.

Leonelli (2016) Data-Centric Biology: A Philosophical Study. Chicago, IL: University of Chicago Press.

Mason (2007) ‘Re-using’ qualitative data: On the merits of an investigative epistemology. Sociological Research Online 12(3): 1–4.

Moore (2007) Re-using qualitative data? Sociological Research Online 12(3): 1–13.

Mykhalovskiy, McCoy and Bresalier (2004) Compliance/adherence, HIV and the critique of medical power. Social Theory and Health 2(4): 315–340.

Nguyen (2010) The Republic of Therapy: Triage and Sovereignty in West Africa’s Time of AIDS. Durham, NC; London: Duke University Press.

Paparini and Rhodes (2016) The biopolitics of engagement and the HIV cascade of care: A synthesis of the literature on patient citizenship and antiretroviral therapy. Critical Public Health 26(5): 501–517.

Persson, Newman, Mao, et al. (2016) On the margins of pharmaceutical citizenship: Not taking HIV medication in the ‘treatment revolution’ era. Medical Anthropology Quarterly 30: 359–377.

Ritchie and Lewis (2003) Qualitative Research Practice: A Guide for Social Science Students and Researchers. London: SAGE.

Rose (2007) The Politics of Life Itself: Biomedicine, Power and Subjectivity in the Twenty-First Century. Princeton, NJ: Princeton University Press.

Slavnic (2013) Towards qualitative data preservation and re-use – Policy trends and academic controversies in UK and Sweden. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research 14(2): 10.

Squire (2013) Living with HIV and ARVs: Three-Letter Lives. Basingstoke: Palgrave Macmillan.

Tarrant (2016) Qualitative secondary analysis and research design: Reflections on a methodological framework for data re-use. Timescapes Working Paper, University of Leeds, Leeds. Available at: https://cpb-eu-w2.wpmucdn.com/blogs.lincoln.ac.uk/dist/e/6201/files/2016/11/Working-Paper-QSA.pdf

UNAIDS (2014) Fast track: Ending the AIDS epidemic by 2030. Report, UNAIDS, Geneva. Available at: http://www.unaids.org/en/resources/documents/2014/JC2686_WAD2014report

Walters (2009) Qualitative archiving: Engaging with epistemological misgivings. Australian Journal of Social Issues 44(3): 309–320.

Young, Davis, Flowers, et al. (2019) Navigating HIV citizenship: Identities, risks and biological citizenship in the treatment as prevention era. Health, Risk & Society 21: 1–16.

Zimbra, Abbasi and Chen (2010) A cyber-archaeology approach to social movement research: Framework and case study. Journal of Computer Mediated Communication 16(1): 48–70.
