Abstract
In this article I argue that data-sharing risks becoming epistemically extractivist and is a practice sensitive to Linda Martín Alcoff’s challenges for extractivist epistemologies. I situate data-sharing as a socio-epistemic practice that gives rise to ethical and epistemic challenges. I draw on the findings of an institutional ethnography of an international social science research project to identify several ethical and epistemic concerns, including epistemic extractivism. I identify Alcoff’s first and second challenges for extractivist epistemologies in the findings of the empirical investigation and argue that they are important considerations for the ethics and socio-epistemological functioning of data-sharing in social science.
1. Introduction
This article focuses on the social epistemology and ethics of the internal workings of a research project, and more specifically practices of sharing data among its members. Data-sharing is a practice of increasing importance as academic research becomes more data intensive and as collaborative data-sharing serves a more prominent role in research designs (Tenopir et al. 2011). The epistemic potential, as well as potential epistemic and ethical pitfalls, of data-sharing is widely recognized amongst both scientists (Tenopir et al. 2011) and philosophers of science (Bezuidenhout 2013; Staunton et al. 2021). In this article, I analyze the empirical findings of an institutional ethnography of the UK Research and Innovation funded “GCRF South-South Migration, Inequality and Development Hub” (MIDEQ henceforth), a large international multi-year research collaboration. I employ insights from feminist social epistemology and decolonial theory in order to illustrate some of the epistemic promises but also significant ethical and epistemic pitfalls that emerge in internal data-sharing with a particular focus on epistemic extractivism (Grosfoguel 2019). To do so, I draw on Ramón Grosfoguel’s (2019) theorization of epistemic extractivism and Linda Martín Alcoff’s (2022) theorization of extractivist epistemologies. Each of these theorists argues that practices that aim to extract resources, epistemic or otherwise, generate knowledge practices that need to be reconsidered and that epistemic extraction can come at both ethical and epistemic costs. I argue that data-sharing arrangements and practices are sensitive to Alcoff’s challenges for extractivist epistemologies as well as concomitant ethical challenges.
The focus of this article is on what I call “internal data-sharing” as I distinguish external from internal data-sharing practices. The literature on the ethics of data-sharing primarily focuses on data-sharing practices that take place in interactions between researchers and other actors situated outside of the immediate research teams they belong to (Alter and Vardigan 2015; Duke and Porter 2013). I call this external data-sharing. In contrast, what I call internal data-sharing takes place in interactions between different members of the same research collaboration.
This article proceeds in the following manner: In section 2 I draw on the existing literature on data-sharing to argue that data-sharing practices are both ethically and epistemically complex, and briefly introduce the extant debates pertaining to extractivism in data-sharing. In section 3 I introduce and discuss epistemic extractivism and extractivist epistemologies in detail with a focus on recent scholarship by Ramón Grosfoguel and Linda Martín Alcoff. In section 4 I discuss the research context and methods for the empirical investigation that underlies this article before turning to the analysis of the empirical findings in light of Alcoff’s theorization of extractivist epistemologies in section 5. Lastly, section 6 consists of the conclusion in which the arguments and main contributions of this article are summarized.
2. Data-Sharing: A Socio-Epistemically and Ethically Complex Practice
Academic research has become more data intensive and more collaborative over the last 20 years, which has sparked increasing scholarly interest in data-sharing (Tenopir et al. 2011). As data-sharing policies, infrastructure, and procedures have been established, academic discussions pertaining to the ethics and the epistemology of data-sharing have intensified (Bishop 2009; Tenopir et al. 2011; Weller 2023). Data-sharing is generally portrayed as an epistemically valuable practice (Staunton et al. 2021; Tenopir et al. 2011; Wallis, Rolando, and Borgman 2013), and has received widespread support from policy makers, research funders, and academic journals. Relatedly, data-sharing is seen as having vast potential for scientific progress (Fecher, Friesike, and Hebing 2015; Huang et al. 2012).
At the same time, research has shown that sharing data is not without its epistemic complications (Staunton et al. 2021; Tenopir et al. 2011). In the extant literature common epistemic concerns include that shared datasets will be used in an epistemically unsound manner and that shared data might be misinterpreted resulting in the production of false or misleading research findings (Bezuidenhout 2013; Feldman and Shaw 2019). The reasons identified for potential misinterpretations include the complexity of specialized datasets, the technical ability of potential data-users to understand and interpret the data appropriately, as well as the data-users’ understanding of the context in, and the means through which, the data was produced (Feldman and Shaw 2019; Tenopir et al. 2011). Such cautions suggest that data-sharing ought to be understood as a practice that has the potential for epistemic benefits, but also one that can give rise to epistemic problems.
Data-sharing is also an ethically charged practice, and data-sharing practices are often shaped by ethical considerations (Fecher, Friesike, and Hebing 2015). There are plenty of calls for continued development of ethical principles for data-sharing (Duke and Porter 2013; Fecher, Friesike, and Hebing 2015; Weller 2023), particularly as they pertain to data re-use, citation practices, and co-authorship (Duke and Porter 2013), themes that recur throughout this article. Relatedly, scholars have noted the potential for exploitation and other forms of harm stemming from data-sharing practices that might affect both research participants and researchers (Alter and Vardigan 2015; Denny et al. 2015). Duke and Porter (2013) argue that data-sharing ought to be understood as a practice that places ethical obligations not only on those who share data, but also on those who receive or use data shared by others.
In this sense, data-sharing raises a host of questions of both epistemic and ethical importance, and the main concern of this article is one of both ethical and epistemic significance, namely epistemic extractivism. That both data-generating practices and data-sharing practices are arenas for epistemic extractivism has been acknowledged and criticized (Smith 1999 [2021]; Kwok et al. 2022; Abebe et al. 2021). Extractivism understood more broadly is a topic that has drawn increasing attention in the literature on data-sharing. For example, both Rodima-Taylor (2024) and Helm et al. (2023) identify extractivist practices and logics in the governance and practices of “big data” technologies and knowledge production. Relatedly, an increasing number of case-studies identify epistemically extractivist data-generating and data-sharing practices across different academic disciplines. Examples of such case-studies include Lehuedé’s (2021) research on data-intensive astronomy in Chile, and Goossens, Varas-Díaz, and Banch’s (2022) study of epistemic extractivism in ethnographic metal music research.
This article contributes to this growing body of literature in two regards. First, it constitutes an empirical case-study of extractivism in data-sharing in the domain of migration research. This case-study illustrates further concerns pertaining to epistemically extractivist data-sharing practices and arrangements, with a focus on data-sharing within and amongst the research teams of a single research project. In doing so, this study elucidates some ethical and epistemic challenges that data-sharing practices can bring about amongst the members of a single research collaboration.
In addition, this article constitutes a theoretically novel contribution to the literature on data-sharing as it is the first to adopt Linda Martín Alcoff’s account of extractivist epistemology for the study and analysis of epistemic extractivism in data-sharing. In doing so, this article introduces Alcoff’s challenges for extractivist epistemologies to the literature, and I argue that both the first and second of Alcoff’s challenges are directly relevant for the organization and practice of ethically and epistemically sound data-sharing. In the upcoming section, I will introduce and discuss the notion of epistemic extractivism in detail with a particular focus on the accounts of epistemic extractivism and extractivist epistemologies as developed by Ramón Grosfoguel and Linda Martín Alcoff.
3. Epistemic Extractivism
The issue of epistemic extractivism, and extractivist epistemology, has been hotly debated among decolonial scholars and has attracted recent interest in both philosophy of science and social epistemology (Alcoff 2022; Harding 2021; Maldonado-Torres 2021; Pohlhaus 2020). However, existing analysis largely overlooks the particularities of data-sharing practices within contemporary research collaborations in which epistemic extractivism takes place, a gap that this article aspires to address. For the purposes of this article, Ramón Grosfoguel’s (2019) account of epistemic extractivism serves as the starting point. On his account, extractivism signifies the removal of unprocessed, or minimally processed, resources for export and capital gain. Grosfoguel argues that extractivism, from its origin in the “colonial age” to the neoliberal neocolonialism of today, involves the looting, dispossession, theft, and appropriation of the resources of the global South for the benefit of the rich and powerful, particularly in the global North. 1
Grosfoguel expands the discussion of extractivism beyond material considerations, into the epistemic and the ontological. For Grosfoguel epistemic extractivism constitutes a way of engagement that does not seek dialogue between different sets of knowledge, but rather one that seeks understanding exclusively on the extractivists’ own terms. On his account, the epistemic extractivist extracts ideas, primarily from marginalized communities, in order to subsume them within the Western episteme. In doing so, they remove them from the contexts in which they were produced, often depoliticizing them and giving them new meanings. The aims of epistemic extractivism are twofold on Grosfoguel’s account. Epistemically extractivist practices aim to plunder ideas either to transform them into economic capital or, by appropriating them into the academic world, to earn symbolic capital. Extractivist practices often neither acknowledge the people and communities from which the extracted ideas originate, nor allow them the symbolic and economic capital benefits that result from their extraction. Such practices are also epistemically exclusionary, as those whose epistemic resources are being extracted are being marginalized from taking part meaningfully in the extractive knowledge producing endeavor (Grosfoguel 2019).
Epistemic extractivism should be thought of as both epistemically exploitative and epistemically marginalizing, each of which Tanesini (2022) identifies as a form of epistemic oppression. Linda Tuhiwai Smith (1999 [2021]) argues that academic research within its current colonial conditions brings with it waves of not only exploration and discovery, but also exploitation and appropriation. She explicitly criticizes exploitative extractivist research practices in which researchers enter communities based on the good-will of those communities and extract both tangible goods, such as blood samples and traditional medicinal remedies, and epistemic goods such as belief systems, ideas, and the practices that go alongside them (Smith 1999 [2021]).
Smith (1999 [2021]) draws attention to how indigenous languages, knowledges, and cultures have systematically been excluded, silenced, and condemned within academic discourses but at the same time are exploited by academics to contribute to those same discourses. She criticizes research practices that “desires, extracts, and claims ownership of the ways of knowing, the imagery, the creations and products of other groups and communities while simultaneously marginalising those same communities” (Smith 1999 [2021], 30).
On Grosfoguel’s view, epistemic extractivism is part of the larger epistemically exclusionary hierarchies of academic knowledge production in which marginalized scholars, particularly in the global South, are systematically excluded. Silvia Rivera Cusicanqui (cited in Grosfoguel 2019) similarly remarks that in any knowledge system, authority lies with those situated at the top. Those situated at the bottom only provide the input for those at the top to transform into finished products and theories. Such a view corresponds well to the critiques of extractivist and exploitative data generation and data-sharing practices common to contemporary social science research arrangements and practices as put forth, for example, by Sukarieh and Tannock (2019).
Linda Martín Alcoff’s recent work (2022) expands the scope of analysis beyond epistemically extractivist practices to extractivist epistemology more generally. However, it is important to note that while Alcoff expands the scope of analysis beyond epistemically extractivist practices, she retains the focus on colonialism and imperialism of previous accounts. For Alcoff, reflecting on extractivist epistemology is a way to reveal the effects of colonialism and imperialism on practices of knowing. She argues that extractivism, within the colonial context and in its many different forms, has generated epistemic practices and approaches to epistemology that have gained wider uptake, particularly in the domain of science and research, and that ought to be rethought.
Alcoff (2022, 15) locates the development of extractivist epistemologies in what she calls the “epoch of global empires” and argues that they facilitate the extraction of knowledge from subaltern groups to dominant groups. She argues that the extraction of knowledge across unequal relationships of power is facilitated not only by specific practices but also by epistemic ideas that excuse and legitimate non-cooperation and non-transparency. She traces the origin of these ideas and practices to the effects of colonialism and imperialism on epistemic practices and ideals.
On Alcoff’s account, extractivist epistemologies, and the practices derived from them, aspire to extract epistemic resources from their original surroundings and thus away from the political, ethical, and institutional context of their articulation. Alcoff’s definition of extractivist epistemologies is centered around four features: (I) practices of ranking knowers and the maintenance of hierarchies of knowers, (II) denial of the need for collaboration across different groups of knowers, (III) a view of values as non-relational and objectively determinable, and lastly (IV) the pursuit of exclusive control over epistemic resources and processes. At the heart of Alcoff’s account of extractivist epistemologies lies its conception of epistemic resources. In extractivist epistemologies epistemic resources are seen as separable from their origin without any subsequent epistemic failures or losses, an assumption that Alcoff forcefully argues is mistaken. Further, she argues that extractivist epistemologies overlook the central role of social relationships and interpretation in our knowledge practices, and render epistemic resources into commodities such as “data” or “information” with exchange values over which exclusive rights can be contractually defined, protected, and enforced.
Alcoff outlines three challenges for extractivist epistemologies that point to epistemic problems that extractivist practices face. The first challenge is that those with the “insider” epistemic resources that an extractor seeks might simply refuse to co-operate, thus barring the extractor from acquiring full understanding of the epistemic resources they seek. Alcoff’s second challenge posits that even if there is co-operation between the “insider” and the “outsider,” there is no guarantee that the “outsider” can truly grasp the meanings of the “insider’s” epistemic resources, as there might be significant differences between the hermeneutic resources available to and drawn upon by the “insider” and those available to the “outsider.” The extractor is rendered unable to acquire the full knowledge of the insider as they lack access to the appropriate hermeneutical resources. The third challenge targets the effects of the process of epistemic extraction itself. Alcoff, mirroring Grosfoguel, argues that through the process of extraction the epistemic resource being extracted can lose its efficacy as it is removed from the context and relations that it originated from.
For Alcoff, each of these challenges originates in, and plays out, within the relationships between differently situated knowers. She argues for the need to seek more genuinely egalitarian epistemic collaborations to overcome extractivist influences on our norms and practices of knowing and conducting research. She espouses an understanding of knowledge as incomplete and always involving interpretation, and centers dialogic processes without presumptively discrediting other knowers as necessary for good epistemic outcomes. To this end, Alcoff (2022, 16) outlines four non-ideal corrective epistemic norms that she argues can counteract extractivist epistemologies: (I) acknowledging the incompleteness of all knowledge, (II) developing approaches that recognize plural epistemologies and seek productive, inter-epistemological relationships, (III) practicing relational epistemic humility, and (IV) making regular assessments of our epistemic relationships in projects of knowing. This last norm, Alcoff situates within a more general thrust to attend to the quality of our relationships and suggests that shared epistemic collaborations ought to be partnered with, and shaped by, reflections on our social relationality and examinations of trust, reciprocity, care, patience, and benevolence.
Alcoff’s conception of extractivist epistemologies has gained uptake in debates within both decolonial scholarship and the philosophy of science pertaining to the reorientation of different academic fields and research practices (Ndlovu-Gatsheni 2023; Ludwig et al. 2024). However, data-sharing is a topic given little to no consideration in Alcoff’s initial discussion of extractivist epistemologies, nor in the subsequent scholarship that draws upon her theorization. In this article, I address this gap and introduce Alcoff’s conception of extractivist epistemologies and her challenges into the contemporary debates pertaining to the ethics and the social epistemology of data-sharing in the social sciences.
I argue that the first two of Alcoff’s challenges for extractivist epistemology, at least, can be mapped onto the concerns and challenges that social scientists report in contemporary data-sharing arrangements and practices. I draw on the findings of the institutional ethnography to illustrate how extractivist influences on contemporary data-sharing arrangements and practices within a social science research collaboration can give rise to versions of both the first and the second of Alcoff’s challenges. Thus, I illustrate the relevance of Alcoff’s challenges for the debates pertaining to what ethically and epistemically sound data-sharing practices would entail. In the upcoming section I introduce and discuss the institutional ethnography that underlies the arguments of this article and introduce both the methods employed for the empirical data-generation as well as the research context.
4. Case Study and Methodology
4.1. Research Context
The empirical findings that this article draws upon stem from an institutional ethnography undertaken from early 2020 until the summer of 2023 of a large, international social science research collaboration. The primary setting for the institutional ethnography was the UKRI funded “GCRF South-South Migration, Inequality and Development Hub,” or “MIDEQ” as it is called by its members. 2 The MIDEQ Hub is designed to study and unpack complex relationships between migration and inequality in the global South and aspires to decenter the production of knowledge away from the global North, toward the geographical locations where South-South migration takes place, to change the understanding of the relationship between migration and inequality (Crawley, Garba, and Nyamnjoh 2022). Drawing on the experience and expertise of partners from institutions in 12 countries across five different continents, the MIDEQ Hub’s research aspires to build an evidence-based understanding of the relationships between migration, inequality, and development to disrupt dominant assumptions about the reasons why people migrate, and the consequences of migration (Crawley, Garba, and Nyamnjoh 2022).
The setup of the MIDEQ Hub is complex and structured around twelve research teams aligned along six “migration corridors” that link countries of origin and countries of destination (see Figure 1; image via MIDEQ, https://www.mideq.org/en/migration-corridors/).
In addition to the research teams based in the countries at each end of these corridors, the MIDEQ Hub includes six research teams arranged in thematic work-packages, three additional work-packages designed to deliver development interventions based on the findings of the MIDEQ Hub’s research, a work-package focused on adaptive programming and the delivery of MIDEQ’s cross-corridor survey, and lastly a work-package focused on the arts that takes a practice-centered approach to the relationship between migration and inequality (MIDEQa, n.d.). 3
4.1.1. Data-Sharing in MIDEQ
The distinction between external and internal data-sharing set out in the introduction is important as the challenges arising from sharing data within the confines of a research project itself can differ from those arising out of data-sharing with external actors. Distinguishing internal data-sharing from external data-sharing allows for the identification of ethical, epistemic, and relational challenges that arise from data-sharing practices, policies, and expectations within on-going research collaborations. That being said, forms of the ethical and epistemic challenges discussed in the next section are not necessarily exclusive to internal data-sharing but can also play out in external data-sharing arrangements. Relatedly, external data-sharing arrangements can also become extractivist as detailed by for example Abebe et al. (2021) and Lehuedé (2021).
The findings of this study emphasize internal data-sharing as a domain of particular interest and importance. That the different actors involved in internal data-sharing arrangements are active collaborators emphasizes the social relationships between them and the relationships of power that shape their interactions and collaborative practices. In this sense, internal data-sharing arrangements, such as those of the MIDEQ Hub, are shaped by closer social relationships and pronounced relationships of power to a different degree than external data-sharing arrangements between researchers and prospective external data-users that might be interested in their data. For example, the relationships between the collaborators within the MIDEQ Hub are likely to be closer than their relationship to a prospective external data-user that would access MIDEQ data through the UK Data Service ReShare process, through which the MIDEQ Hub is required to make its data publicly available as part of its funding requirements. It is also important to note that not all internal data-sharing arrangements will involve the same relational dynamics as those discussed throughout the paper, and it is also possible to envisage similar relational dynamics in some cases of external data-sharing.
Internal data-sharing is ubiquitous in the MIDEQ Hub as it is designed for comparative research both within and across the different migration corridors in order to facilitate opportunities for co-learning between the research teams working directly in the migration corridors and the cross-cutting thematic work-packages. Thus, internal data-sharing is imperative to the aspirations and the functioning of the epistemic processes of the MIDEQ Hub. With the design and framing of the Hub centered around migration corridors, it is necessary for the research teams based in the countries at the end of each of these corridors to share data to develop an understanding of the complex dynamics of the migration process between the two countries that make up the corridors.
Likewise, Hub-wide research undertakings such as the MIDEQ survey rely on the internal sharing of survey data from the different countries that make up the migration corridors (MIDEQb, n.d.). Moreover, MIDEQ’s thematic work packages were designed to conduct comparative analysis across the different migration corridors and thus work closely with the country teams within the corridors they studied. This co-operation relies on the data produced by the country teams being shared with the thematic work-packages, and vice versa. Similarly, MIDEQ’s intervention work-packages rely on the data produced by other research teams within the project to make decisions about development interventions and flexible funding allocations. Thus, internal data-sharing is central to the functioning of several important facets of the MIDEQ Hub’s epistemic aspirations, including the corridor-specific research, the cross-cutting thematic research, and the intervention work-packages.
4.2. Methods and Analysis
This study has taken inspiration from recent empirically informed approaches in the social epistemology of research practice and governance (Kidd, Chubb, and Forstenzer 2021; Wagenknecht 2016) that successfully adopt qualitative empirical methodologies in philosophical inquiry. Institutional ethnography as developed by the feminist sociologist Dorothy Smith (1990, 2005, 2006) is a research approach designed to study complex institutionalized social relations that centers how common practices and activities are shaped by social relations and forces that transcend and inform these practices and activities. It is a useful approach for exploring how the “knowing and doing” of differently situated individuals are shaped by the social, political, and bureaucratic systems of the organizations and governance systems within which they operate (Fishberg 2022). Thus, it is a suitable approach for studying the socio-epistemic practices and processes of large research collaborations, and the many different structural and social factors that shape them. While institutional ethnography has primarily been employed to study institutions that are either locally or nationally concentrated, Fishberg (2022) has developed a transnational institutional ethnography that is suitable for studying institutions that are neither physically centralized nor nationally bound. Given the geographically dispersed nature of the MIDEQ Hub, Fishberg’s approach was adopted for this study.
The institutional ethnography undertaken as part of this study consisted of a triangulation of three different qualitative methodologies (Bekhet and Zauszniewski 2012). These were semi-structured interviews (INT), participant observations (OBS), and document analysis (DA). Fishberg argues that triangulating different methodologies is an effective strategy for conducting institutional ethnographies, and notes that combining observations, interviews, and document analysis can be particularly fruitful. Throughout the institutional ethnography the three methods intersected and influenced each other, shaping the trajectory of the institutional ethnography. For example, the processes of finding relevant texts to analyze, participants to interview, and spaces to observe were interlinked and informed by the findings of the other methods.
The observations took place during my time as a doctoral student affiliated with the MIDEQ Hub from February 2020 through August 2023. In total more than 90 events and meetings were observed either online or in person. Beyond observing formal events such as meetings or seminars, the observations also included a large number of informal conversations with different MIDEQ members. The document analysis consisted of analysis of close to 200 texts and documents including academic texts, MIDEQ internal documents such as Hub-specific policies and reports, MIDEQ written outputs such as academic articles and blogposts, and relevant texts from other sources such as policy documents from the Hub’s funder.
The interviews took place later in the research process after the observations and document analysis had been ongoing for more than a year. The interviews built upon the findings of the observations and the document analysis and offered an opportunity to explore reflections from MIDEQ members on themes identified throughout the observations and document analysis, including the theme of internal data-sharing. Semi-structured interviews were the preferred format as they allowed for asking questions pertaining to specific areas of interest that had been identified, while leaving space for free reflection from the interviewees. This led to the discovery of new topics of interest and information throughout the interviews beyond the predetermined themes. In total 15 semi-structured interviews were conducted with MIDEQ researchers.
The interviews largely followed a similar pattern of execution, starting with four general questions pertaining to the researchers’ experience of conducting research as a part of MIDEQ, any constraints they faced in their research practice, their thoughts on potential changes toward a more just production of knowledge in the social sciences, and what lessons can be learned from having been part of the MIDEQ Hub. These questions were combined with follow-up questions that were either prepared based on the observations and document analysis or that came up during the interviews. Thus, the semi-structured approach proved valuable in allowing the exploration of trajectories of inquiry that were not considered prior to the interview.
The empirical data was analyzed using an adapted version of Braun and Clarke’s (2006) approach to thematic analysis. The thematic analysis consisted of a total of seven steps, adding a stage of normative analysis onto Braun and Clarke’s six-step model. The seven steps were: (1) familiarization with the data, (2) generation of initial codes, (3) searching for themes, (4) reviewing themes, (5) defining themes, (6) normative analysis, and (7) writing up. The analysis proceeded in the following manner: The first five steps consisted of analysis generating themes pertaining to practices, ruling relations, or structures of either ethical or epistemic importance. Some of these themes were then selected as the focus for the normative analysis. The normative analysis constituted a separate step of analysis from the thematic analysis. During the normative analysis the focus shifted from identifying practices, ruling relations, or structures of interest to identifying and analyzing epistemic or ethical problems that arise in, or as a result of, the identified practices, ruling relations, or structural arrangements. The result of the analysis was then written up. One of the themes identified in the thematic analysis and analyzed as part of the normative analysis, data-sharing and epistemic extractivism, is the central focus of this article.
5. Ethical and Epistemic Concerns in the Data-Sharing Practices of the MIDEQ Hub
This research found an acute awareness of extractivist research practices amongst MIDEQ researchers, and plenty of critical reflection on the ethical and epistemic challenges that come with the Hub’s internal data-sharing processes. This was reflected in both the internal discourse and policies of the MIDEQ Hub. Within the MIDEQ Hub, policies were developed to address the ethical and epistemic issues brought on by the data-sharing demands of the design of the Hub, which were primarily set out in the MIDEQ Data Management Protocol (DMP henceforth). The DMP details the conditions for data ownership, data-sharing, and fair data-usage (DA). In an interview, a member of the MIDEQ directorate explicitly described the DMP as an attempt to address and prevent the extractive use of data that is common in research projects funded and led by actors in the global North (INT, 1 November, 2021). Another MIDEQ co-director consistently described the MIDEQ Hub’s data management and data-sharing policies as codifying a negation of extractivist research practices and epistemic extractivism (OBS). The DMP outlines principles for fair data-sharing and data-use, including principles of fair collaboration, acknowledgement of co-production of data and knowledge, and policies detailing the rules for co-authorship within the Hub.
Despite having these policies in place, MIDEQ researchers still expressed both ethical and epistemic concerns regarding internal data-sharing in the MIDEQ Hub. These concerns largely did not pertain to the content or scope of the Hub’s policies governing its data-sharing processes. Rather, the concerns predominantly pertained to practices and data-sharing arrangements that they found problematic despite the existing policies, as well as concerns for potential future wrong-doing based on previous poor experiences. In this sense, the MIDEQ researchers emphasized the existence of a policy-practice gap and the possibility of epistemic and ethical issues in practice despite the existence of agreed upon policies governing the internal sharing of data.
5.1. Alcoff’s Second Challenge, Data-Sharing, and the MIDEQ Hub
In this section I will discuss a set of concerns pertaining to the MIDEQ Hub’s internal data-sharing that was identified during the institutional ethnography. These concerns were primarily articulated by researchers responsible for the generation of empirical data and largely centered on questions pertaining to who controls the data that one has produced, usage of that data, and being given appropriate credit for one’s epistemic labor. I analyze these concerns first through the lens of Alcoff’s second challenge for extractivist epistemologies in section 5.1.1, and then with a focus on the ethical ramifications of extractivist research practices in section 5.1.2.
5.1.1. Internal Data-Sharing and Alcoff’s Second Challenge for Extractivist Epistemologies
One recurring concern pertaining to the MIDEQ Hub’s data-sharing arrangements raised throughout the institutional ethnography was that those who had produced empirical datasets were at risk of “losing them” to other Hub researchers. MIDEQ members discussed being concerned that when data generation was complete and datasets had been uploaded to the shared internal MIDEQ data repository, other researchers within the project would analyze that data and publish out of it before those who had produced it had the opportunity to do so. These concerns pertained primarily to two distinct, albeit related, issues. The first was the question of who ought to have control over the products of one’s epistemic labor, and the second was the distribution of credit for the results of that labor. Such concerns are a common topic in the literature on external data-sharing, and it has been demonstrated that among the many reasons researchers are hesitant to share their data, a fear of “getting scooped,” losing out on future publication possibilities, and not getting appropriate credit or remuneration for one’s labor in producing a dataset are commonly cited (Tenopir et al. 2011). The findings of this study affirm that this is also a concern in internal data-sharing arrangements, and one that is shaped by the relationships of power within the research collaboration.
What is at stake in these concerns is at least one, and at worst two, closely related matters of ethical significance, namely the fair distribution of credit and epistemic exploitation. One common form of epistemic exploitation occurs when members of one social group systematically have to carry out epistemic labor to serve the epistemic interests of a different social group (Tanesini 2022). When asked to elaborate on this issue, a MIDEQ co-investigator leading one of the country-teams detailed bad experiences from a previous research project, in which they had generated a qualitative dataset that an international collaborator with little knowledge of the local context in which the data had been produced decided to publish out of without including them as a co-author. The example given by the MIDEQ co-investigator mirrored paradigmatic cases of epistemic extractivism in which an “outsider” collaborator extracts data without due recognition of the context in which it was produced and without recognizing those who had produced it.
This is a case of epistemic extractivism that includes an instance of internal data-sharing. The researcher who produced the qualitative interview data shared their data with an “outsider,” who extracted only the small part of the dataset which they deemed relevant for their current interests, looking past the complexities of the local context in favor of contributing to international debates. In doing so, the MIDEQ co-investigator argued, the outsider produced a “very shallow” analysis. This is how they reflected on concomitant epistemic effects:
Quote 1 Sometimes someone elsewhere who hasn’t even read four interviews, you know, out of the 50 interviews you had, just comes up with something very good because their English is so good and then they all know all the international catchwords. Something that would really like make sense to the international audience, and for us it is very shallow because it basically destroyed everything. (MIDEQ Co-investigator, Online Interview, 19 July, 2021)
Cases like this illustrate that internal data-sharing, when done improperly, can become epistemically extractivist and risk the epistemic costs identified by Alcoff. The case discussed by the co-investigator stresses the risk of misunderstandings and misinterpretations when data is used without appropriate understanding of the context in which it was produced. Such concerns were particularly pressing in the context of the MIDEQ Hub, as the research collaboration spanned twelve different countries and a number of different languages including but not limited to English, French, Portuguese, Chinese, and Creole. The MIDEQ co-investigator argued that such research practices ought to be avoided on both ethical and epistemic grounds.
This is an example of how internally extractivist research practices risk failing the second of Alcoff’s challenges for extractivist epistemologies, as the hermeneutic resources available to and drawn upon by the “outsider” differ from those of the “insider.” In this case, this is particularly emphasized as it pertains to understanding the local context and the conditions in which the data was produced. The differences in the available sets of hermeneutical resources risk resulting in the outsider failing to grasp the whole meaning and nuance of the insider’s knowledge and epistemic resources, in this case in the form of the data produced by the local researcher. The extractivist outsider fails to acquire the level of understanding that the “insider” possesses, and as the co-investigator argues, only acquires a shallow understanding of the phenomena being studied.
This is an epistemic concern not only for the understanding that the extractor develops; it also risks reproducing the shallow understanding as the extractor disseminates their findings into the wider epistemic community of their discipline. There are thus good reasons for avoiding extractivist research practices, and for arguing that the “extractor” in such cases does something epistemically bad as they adopt a research practice that risks producing flawed or limited understandings of the topic being studied. I will return to Alcoff’s second challenge in the next section, as data-users in the MIDEQ Hub similarly identified it as particularly significant.
The MIDEQ co-investigator who gave the example above offered an argument based in epistemic considerations for how international research collaborations ought to be organized to avoid these epistemic pitfalls. They argued that the researchers with the most insight into the process of data-generation, as well as the best understanding of the context in which the data had been produced, ought to play a central role in analyzing and disseminating the research, even if they are junior scholars or research assistants working with internationally recognized senior academics (INT, 19 July, 2021). Doing so would require a shift in how credibility is attributed and distributed within research collaborations, away from seniority and location in the hierarchies of academia, toward a more nuanced picture centered around differently situated individuals’ knowledge of the specific topics and contexts being studied, and their roles and involvement in the epistemic processes of doing so.
5.1.2. Ethical Considerations for Avoiding Extractivist Research Practices
The findings of the institutional ethnography also highlighted that there are important ethical reasons for avoiding extractivist practices. These include the fair distribution of credit and recognition. MIDEQ’s data generation was primarily conducted by the research teams in the country-corridors, and much of it was conducted by early career researchers. One MIDEQ co-investigator based in the global South expressed worry that their early career research assistants would not gain appropriate recognition for their hard work within the Hub, as others might publish findings from the data they produced without their involvement and without crediting them. They described how they and their research assistants feared that they would undertake up to 4 years of hard work producing empirical data and then end up not being the lead author of any publications despite being the primary data-producer (INT, 19 July, 2021):
Quote 2 I already sense fear from my RAs (research assistants), and it’s also my fear as someone who’s responsible for their well-being. In the end, like you know, from doing four years of research, they would not have anything that they are the main authors of, you know. (MIDEQ Co-investigator, Online Interview, 19 July, 2021)
Such a failure to attribute appropriate credit to early career researchers, for example in terms of authorship and inclusion in publications, would have a negative effect on their career prospects and on their recognition as scholars. Another MIDEQ co-investigator argued that this is part of a larger structural issue in that most researchers in projects like the MIDEQ Hub work on temporary research contracts that come to an end at the same time as the funding of the project ends. As their contracts run out when the project finishes, there is no guarantee that they will ever have the opportunity to publish out of the data they produce (INT, 23 November, 2021).
The situation of early-career researchers, particularly in the global South, working on a contract basis for research projects designed, led, and funded by the global North, is reflected in recent critical research on social science research practice. For example, Sukarieh and Tannock (2019) argue that as the political economy of social science research has shifted, subcontracted research assistants and early-career researchers have become responsible for a growing part of the research processes of many research projects but often face both alienation and exploitation, reflected in a lack of recognition of their work, interests, and concerns. Similarly, Deane and Stevano (2016) have pointed to the epistemic roles played by research assistants which are often unacknowledged, and Turner (2010, 206–7) has argued that fieldwork assistants are “ghost-workers” that are commonly excluded from and silenced in research outputs. Sukarieh and Tannock (2019) suggest that these exclusions and exploitative relationships raise a range of ethical questions pertaining to the working relationships between senior researchers and their research assistants.
It is important to note that such practices are not limited to interactions between data-producers in the global South and researchers in the global North. Throughout my discussions with MIDEQ researchers they emphasized that such practices occur not only between scholars in the global South and scholars in the global North, but also within and between research teams in both the global South and North. For example, several of my interlocutors noted that it is not uncommon, in both the global North and the South, that co-investigators use the data that the researchers who belong to their research teams have produced without due acknowledgement (OBS), suggesting that such practices find their basis in unequal relationships of power between senior researchers and the researchers they employ as part of their research teams.
The concerns of the MIDEQ researchers highlight the ethical significance of the relationship between data-producers and data-users, raising similar ethical concerns internally as those identified in the literature on the ethics of external data-sharing (Tenopir et al. 2011). The concern of early-career researchers that their data will be taken or used without their consent is an ethical issue that has been given plenty of attention in the literature on the ethics of data-sharing (Fecher, Friesike, and Hebing 2015). Taking the results of someone else’s epistemic labor without regard for their interests is not only poor collaborative practice but also exploitative and unfair as the extractor would benefit from the labor of others, while denying them the same benefits in terms of recognition and career progression. In a project like the MIDEQ Hub, in which internal data-sharing is integral to the research design, creating non-extractivist data-sharing arrangements and practices is thus important for producing research that is as epistemically sound as possible as well as for avoiding exploitative and/or unfair research practices. Throughout this section the perspectives of those who produce their own data as part of their research in the MIDEQ Hub have been the focus. In the next section I turn to the perspectives and concerns of those who were to draw upon data produced by others as part of their role in MIDEQ.
5.2. Shifting Perspectives: Reflections from Prospective Data-Users
It was not only those who were responsible for data-generation in the MIDEQ Hub who had concerns regarding the Hub’s internal data-sharing arrangements. Prospective data-users also expressed concerns about having to rely on the data generated by others and the complexities of trying to do so without being extractivist or exploitative. The researchers who voiced these concerns were largely expected to rely on the data produced by others as part of their roles in the thematic and intervention work-packages. It should be noted, however, that the amount of empirical data-generation varied between the different thematic and intervention work-packages, as some of them decided to conduct their own primary data-production.
For example, a MIDEQ co-investigator based in the global North signaled that both ethical and epistemic complexities arise from not generating your own empirical data and having to rely on the data produced by others. They suggested that there is an ambivalence that comes with not having been involved in the data production process. This, they argued, had put them in a position where, after most of the data-generation had been completed, they did not feel comfortable using the data generated by others. They argued that doing so would feel “wrong” and “exploitative,” calling attention to ethical considerations. They argued that these feelings were a product of the Hub’s design in terms of who was responsible for the majority of the data-generation and the subsequent plans for internal data-sharing and data-use across the different research teams.
They noted that not all research teams were interested in collaboration with scholars from other research teams within the Hub, despite being designated as collaborators in the Hub’s overarching research design. As an example, they described how they were unlikely to co-author anything together with one of the research teams they were assigned to work with, since that research team did not see them as adding any substantial value to the research as they neither had been involved in the data-generation process, nor spoke the local language or were familiar with the research context (INT, 23 November, 2021).
Several of my interlocutors throughout the institutional ethnography argued that not having been involved in the data-production process meant that they lost out on important understanding of the datasets and thus were wary about using the data. These concerns particularly pertained to qualitative data, as not being present during the data generation process would mean not having full understanding of contextual factors that shaped the data-production processes and the specific contexts in which the data had been generated (OBS).
The epistemic loss of not being part of the data generation process yourself and relying on data produced by others can be thought of as another version of Alcoff’s second challenge that is particularly relevant for considerations of the epistemic functioning of data-sharing arrangements. As the concerns of the MIDEQ researchers signal, not being part of the data generation process can leave the data-user unable to gain a full understanding of the empirical material as they do not have the same hermeneutic resources available to them as those who produced the data. It seems plausible to think that there is a wider set of hermeneutical resources to draw upon to understand a dataset for those who took part in producing it compared to those who did not. 4 For example, researchers who take part in producing a dataset are likely to have a better understanding of relevant contextual factors that influenced the data-production process and of “in the moment” decisions made during the data production process. In qualitative research one can also assume that data-producers themselves have a better understanding of relational factors between the researcher and research participants that might have affected the research findings.
This is a topic where the existing literature on data-sharing and that on epistemic extractivism overlap. In the literature on data-sharing several writers express concern with data-users’ ability to properly interpret and understand complex sets of data, which might result in negative epistemic consequences such as misunderstanding, false research findings, and ultimately ignorance (Bezuidenhout 2013; Huang et al. 2012; Tenopir et al. 2011). Alcoff’s second challenge becomes particularly relevant for research collaborations like the MIDEQ Hub due to the diversity and geographical dispersion among its researchers.
However, it is important to note that these issues are not necessarily inherent in all data-sharing arrangements, but rather are sensitive to contextual factors such as what type of data is being shared. For example, it seems plausible to think that differences in available hermeneutical resources pose a more significant issue when dealing with complex qualitative datasets, such as those discussed by the MIDEQ researcher in quote 1, that require significant understanding of the context in, and the process through, which the data was generated to properly make sense of the data. The epistemic challenges of sharing qualitative data have been discussed at length (Bishop 2009; McCurdy and Ross 2018; Van den Berg 2005). For example, sociologists such as Feldman and Shaw (2019) note that one should not underestimate the importance of the relationships between qualitative researchers and their subjects, and the specificity of context and time as not only crucial aspects of data-generation but also of analysis and interpretation. In contrast, making sense of, at least on a basic level, the data in a dataset containing monthly mean temperatures for a specific region, such as those made publicly available by the UK Met Office, does not seem to stress the hermeneutical importance of having been part of the data generation process to the same degree. 5 Rather, the concerns discussed throughout section 5 of this article ought to be understood as arising in the particular context of the MIDEQ Hub and its research design which puts emphasis on sharing both qualitative and quantitative data among and across a diverse set of research teams and geographical contexts.
5.3. Resistance and Alcoff’s First Challenge
Given the concerns expressed by both data-producers and prospective data-users discussed above, it should come as no surprise that some MIDEQ members resisted sharing their data. In doing so these researchers illustrated the relevance of Alcoff’s first challenge for extractivist epistemologies for data-sharing arrangements. Alcoff’s first challenge pertains to instances in which those with the epistemic resources that an extractor desires refuse to co-operate and thus prohibit the extractor from acquiring the epistemic resources they seek. Across the institutional ethnography, several such instances were identified.
One MIDEQ Co-Investigator based in the global South described the MIDEQ Hub’s data-sharing requirements as extractive and as an exercise of colonial power. They argued that the overarching design of the MIDEQ Hub, in which some research teams are assigned data-generating roles and other teams are assigned roles in which they are to use the data for comparative analysis, encouraged extractivist research practices (INT, 4 August, 2021). In this sense, one might be concerned that research designs such as that of the MIDEQ Hub carry with them extractivist assumptions. For example, one might be concerned that there is an assumption that those who were to use the epistemic resources, in this case the datasets produced by others, would be able to adequately perform the identification, interpretation, hypothesis generation, judgment, and application of the information on their own without epistemic costs. As discussed above, these concerns are particularly pertinent regarding the qualitative data generated by the MIDEQ Hub’s researchers. As Alcoff warns, epistemically extractivist processes can become epistemically flawed as they assume that the extractor is able to perform these actions without engaging with the contexts and communities from which the information came, and without any epistemic repercussions.
The co-investigator described being skeptical of the data-sharing requirements of the MIDEQ Hub and that they, together with the other researchers working as part of their team, decided to only share summaries of their data rather than full datasets. The co-investigator argued strongly for the view that data should belong to the researcher that produced it, and that the data-owner themselves should be able to decide which data is shared and under what circumstances (INT, 4 August, 2021). While the team was hesitant to share its data at first, a member of the same research team mentioned in a conversation that took place about 12 months later that the team had decided to share more of their data and had, for example, uploaded full interview transcripts to the internal MIDEQ data repository (OBS).
This type of resistance was mentioned in interviews with several other MIDEQ researchers and became evident across the institutional ethnography as well. Some research teams found ways to subvert and resist taking part in data-sharing practices they objected to. As noted above, some research teams curtailed the data-sharing expectations by only sharing summaries of their data. Others decided to only share data in the original language in which it was produced, thus raising barriers for use of the data by researchers who did not speak the local language (OBS). These examples illustrate that researchers can exercise their control over the data they produced to resist potentially epistemically extractivist research practices.
When asked why they resisted sharing their data, MIDEQ researchers offered a range of reasons. These reasons included ethical considerations such as preventing potential harms to research participants or the spread of sensitive information. Others cited their wish to control the use of the data they had produced, at times in combination with low levels of trust in potential data-users to use the data in a responsible and fair manner. Others highlighted epistemic considerations such as concerns about the ability of prospective data-users to appropriately analyze and make sense of the data (OBS).
The agency to push back against data-sharing arrangements that one disagrees with, whether on ethical or epistemic grounds, has implications for thinking about extractivism in data-sharing. In data-sharing arrangements in which the data-producers have substantial control over their data, such as in the MIDEQ Hub, it is thus plausible to think that data-producers can resist and subvert epistemically extractivist practices by controlling access to their datasets.
6. Conclusion
In this article, I draw on the findings of an institutional ethnography of a large social science research project to identify a number of both ethical and epistemic concerns that social scientists report with internal data-sharing practices. I argue that these concerns can be understood as versions of Alcoff’s first and second challenges for extractivist epistemologies. In doing so, I illustrate how extractivist epistemologies and practices derived from them can shape and inform data-sharing with concomitant epistemic and ethical consequences. This article contributes to the literature on extractivism and data-sharing in two distinct ways. It constitutes the first adoption of Alcoff’s conception of extractivist epistemologies in the literature on data-sharing. In doing so I illustrate the relevance of Alcoff’s challenges for extractivist epistemologies for the debates pertaining to the ethics and the socio-epistemic functioning of data-sharing. Further, this article contributes a case-study identifying epistemic extractivism in contemporary data-sharing arrangements and practices in the domain of migration research to the growing literature on extractivist data-sharing arrangements and practices within academic research. This article also clears ground for further inquiry, and there are still questions of interest that remain unexplored within the remit of this article and in the wider literature. For example, this article briefly touches upon Alcoff’s corrective epistemic norms for extractivist epistemologies, but their suitability as corrective measures to address extractivist data-sharing practices remains unexplored.
Footnotes
Acknowledgements
I thank Heaven Crawley, Katharine Jones, Zainab Mai-Bornu, Linh Mac, and Francesca Sharman for helpful discussions and comments on earlier drafts of this paper. I am also grateful to the audiences at the British Society for the Philosophy of Science annual conference at the University of Bristol and at the European Network for the Philosophy of the Social Sciences conference at Radboud University for their helpful comments, questions, and suggestions. Lastly, I thank the guest editors and reviewers of this special issue for their reviews and feedback that have greatly improved this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
