Sage Journals: Discover world-class research

Abstract

Research data management has become an inevitable part of research, and both funding agencies and publishers nowadays require open and reusable data. This article focuses on one of the most prominent initiatives promoting reusability of research data – the FAIR guiding principles – nowadays widely accepted as the new standard for research data management. Through semi-structured interviews, we investigated researchers’ experiences of practicing FAIR research data management within the context of a multi-stakeholder project within the field of health research funded by the European Commission. Our analysis showed that the informants’ experiences of practicing FAIR research data management differed largely depending on which scientific tradition they belonged to; something that previous studies have attributed to shortcomings in the current infrastructure, lack of resources and persistent cultures around data sharing in the wider scientific community. Drawing on previous work presented within the field of Critical Data Studies (CDS), we argue that our findings point to a more fundamental problem; namely the failure to recognize that the FAIR framework is built on a positivist conceptualization of data. We argue that if FAIR is to have any chance of succeeding in its ambitions to be as inclusive and all-encompassing as it wants to be, these insights need to be taken more into account and provide some potential pathways.

Keywords

FAIR research data management data governance health data Big Data machine-readable

Introduction

Good research data management is not a goal in itself, but rather the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse. (European Commission, 2020)

Today's research landscape has become more data intensive due to technological advancements, with researchers producing and sharing increasing volumes of data (European Commission, 2018). Extracting valuable knowledge from this growing amount of data presents a challenge for academia as well as industry (Lefebvre and Spruit, 2019). Consequently, research data management has become an inevitable part of research and funding agencies and publishers nowadays require open and reusable data (Birkbeck et al., 2022; Inau et al., 2021). While these developments have made discussion on and investigations of research data management more intense during the past couple of years, the importance of “good” research data management and data stewardship was recognized well before this (Jacobsen et al., 2020; Thompson et al., 2020). Albeit, since new technological developments affecting the research data management landscape are continuously presented, discussion around what constitutes “good” research data management, and its practical implications are far from static.

Recent developments within research data management should be understood in connection to the open science movement; where transparency of and open access to research data could be regarded as a response to what some consider to be a “reproducibility crisis” in science (Hicks, 2023; Manola et al., 2020; Plomp et al., 2019). The COVID-19 pandemic further illuminated this perceived need for sharing data and making it reusable; where problems with “siloed” data, lack of standards for reporting data and legal barriers for data sharing and reuse made it difficult for researchers within the health field to respond quickly to the pandemic (Hughes et al., 2023; Rocca-Serra et al., 2023).

In this article, we focus on one of the most prominent initiatives promoting reusability of research data – the FAIR guiding principles (Wilkinson et al., 2016) – nowadays widely accepted as the new standard for research data management (Bloemers and Montesanti, 2020; Inau et al., 2021). The FAIR acronym stands for the main principles of Findable, Accessible, Interoperable and Reusable (Wilkinson et al., 2016). The main principles are further accompanied by 15 guiding principles (Wilkinson et al., 2016). As the idea is for the principles to be domain-independent, its founders emphasize that the principles should be interpreted as guideposts rather than rules; it is up to each scientific domain to define their own standards for FAIR taking their domain-specific requirements into account (Wilkinson et al., 2016). Key aspects inherent in FAIR's emphasis on reusability is the call for improving the infrastructure around research data management, as well as the need for making metadata and metadata standards clearly articulated and publicly available (Wilkinson et al., 2016). Several advantages of adhering to the FAIR principles are suggested, where reducing the risk of duplication, helping researchers adhere to the requirements from funding agencies, scaling up research findings by integrating with existing data and spending time on interpreting existing data rather than collecting new data that reproduces existing data constitute some of the most crucial (European Commission, 2018). What separates FAIR most distinctly from similar initiatives promoting reusable and open research data management is its emphasis on machine-actionability; according to Jacobsen et al. (2020: 12) FAIR should even be regarded as a “…consolidation of earlier efforts that have been made during the past decade in making research machine-actionable.” Placing this within current developments in Machine learning and Artificial intelligence, the emphasis on machine-actionability makes perfect sense: producing data in accordance with FAIR enables these technologies to find and use data (Mons et al., 2020).

The European Union (EU) has adopted FAIR as the proper approach to research data management. In their previous funding program (Horizon2020), projects were not mandated to adopt FAIR but it was nevertheless strongly encouraged. In its current funding program, Horizon Europe, adhering to FAIR has become mandatory; hence making it the new standard for all EU funded projects.¹ While this decision should be seen in light of the European Commission's Open Science agenda and their work around the European Open Science Cloud (EOSC) and the European Health Data Space (EHDS) (see e.g., Hoeyer et al., 2024), the European Commission (2018) assert that FAIR and Open science should not be conflated; FAIR does not necessarily imply open data, and data can be shared under restrictions and still be FAIR. In line with the original intention presented by Wilkinson et al. (2016), the European Commission (2018) further highlight that there is no one-size-fits-all way of managing research data in accordance with the FAIR principles; what is appropriate and feasible for each project depends on the research domain and the data type(s) involved (see also Mons et al., 2020).

This article investigates researchers’ experiences of practicing FAIR research data management within the context of a multi-stakeholder project funded by the European Commission under the Horizon2020 program. It is, to the best of our knowledge, the first study that investigates practices of FAIR research data management within a context where partners from a variety of scientific domains collaborate, even though they all work within the broader field of health research. While previous research has identified a number of challenges and deficiencies when it comes to adoption of FAIR principles, both for research in general and for the field of health research more specifically, the majority of these studies have departed from the assumption that FAIR is the most suitable way forward for the research community (with the exception of Tóth-Czifra., 2020). From this standpoint, current barriers for full-scale adherence to the FAIR framework have been identified, highlighting everything from persistent attitudes among researchers (or what is often described with the term “cultural barriers,” see, e.g., Wise et al., 2019) to lack of resources (both in terms of time and actual finances). The aim of these studies thus appears to be twofold: identifying barriers/problems and presenting (more or less) concrete proposals on how to overcome these. Our aim was to take a more reflective approach, focusing on the experiences of researchers and especially their practices of (FAIR) research data management.

Our approach to studying FAIR is inspired by Bowker and Star's (2000) seminal work on information infrastructures. It is an attempt to shed light on the perspectives and assumptions on which this particular information infrastructure is built. In line with Bowker and Star (2000), we believe that it is fruitful to think classification and standardization as utterly social practices. Through these practices a certain worldview is conveyed, imposed on others and a direction for the work related to the information infrastructure at hand is set based on what some actors (those who possess sufficient power) view as desirable (Bowker and Star, 2000). Just like Bowker and Star (2000) we believe that it is precisely in situations where infrastructures “break down,” or when something does not work as intended, that we really get the opportunity to recognize its construction. In our discussion of the interview material, we draw on previous work presented within the field of Critical Data Studies (CDS) (boyd and Crawford, 2012; Hoeyer, 2023; Stevens, 2021; Stevens et al., 2018). Through applying critical social theory to investigate data as a multi-facetted phenomenon, research within this field has contributed with shedding light on the epistemological foundations on which the current discourse around Big Data rests (boyd and Crawford, 2012; Hoeyer, 2023; Stevens, 2021; Stevens et al., 2018).

In addition to a departure from the assumption that FAIR is the right way to go, previous work has mainly focused on identifying challenges within certain scientific domains; namely those who work with large data sets adopting quantitative methods for analysis (with the exception of Tóth-Czifra, 2020). As the European Commission (2018) has noted, solving today's grand societal challenges demands that researchers from a variety of scientific domains work together. This results in funding of more interdisciplinary projects that include researchers who work with other types of data than the kind often found within biomedical and medical engineering disciplines: large data sets with determined variables to be analyzed adopting quantitative methods. While previous research findings have highlighted how different scientific domains face different challenges when it comes to handling their data in accordance with the FAIR principles (Adams et al., 2023; David et al., 2020), exactly what these challenges look like and how researchers from different scientific domains tackle these obstacles is, to a large extent, not known. Further, while it has been noted that there is a general lack of community specific guidance for researchers (Adams et al., 2023), exactly how this affects research data management practices for researchers from different scientific domains – what solutions they come up with during this process and what resources they draw upon to do so – are questions that also deserves more scientific attention. It is important to note that the point of community-specific guidance is more fine-grained than the noting of scientific domain, as the epistemological stance is not necessarily tied to the methods in a domain. There are for instance more positivist qualitative researchers, and more constructivist quantitative researchers and computer scientists, and therefore the notion of community is more precise, even though the word domain is more common in the literature we build upon.

Methodology

In order to address this identified knowledge need, we conducted a series of interviews with partners engaged in a multi-stakeholder project funded by the EU under the Horizon2020 program whose work resides within the field of health research. The project was selected as the research group of the authors was a partner to execute some technical tasks, and the project management expressed frustration about the gap between the FAIR promise in the project proposal and the actual practices during the project execution. Our research was not a part of the project plan and not included in the Description of Work in the project grant agreement. It should therefore be considered an opportunistic sample of a project, based upon a shared experience between senior researchers, all experienced in large EU collaborative projects.

The project included partners from different scientific domains as well as private companies, who conducted their work within a variety of European countries. Since the focus of the study was to investigate practices of (FAIR) research data management, the interview material included in the final study consists of interviews with researchers only (n = 9). The participants were selected based on the individuals’ willingness to participate; an email was sent out to all of the partners engaged in the project, in which the purpose and intended outcome of the project, as well as the conditions for the interview were explained. As we wanted to create an opportunity for the informants to share their experiences of practicing FAIR research data management more freely, allowing their experiences to guide the direction of the interviews, we chose to conduct semi-structured interviews. The interviews were conducted from May to October 2023. Each interview took approximately one hour and was conducted over Zoom.

The analysis of the interview material was conducted following Braun and Clarke's (2006) recommended process for thematic analysis. We chose to adopt an inductive approach, in which the process of coding the material is not guided by the researchers’ theoretical or analytical preconceptions (Braun and Clarke, 2006). While Braun and Clarke (2006) argue that researchers have to choose between developing themes on either an explicit or interpretive level, we regard these different levels as two steps of the same process. Unlike Braun and Clarke (2006), we believe that the development of themes always involves interpretative work and not only in cases where one chooses to practice what they call latent thematic analysis.

We started by organizing, summarizing and interpreting our interview material based on what was explicitly stated by the interviewees. What stood out most clearly in the material at this stage was that the interviewees understood and defined the concept of data management very differently. This insight was then divided into two subsequent themes: one which concerned the interviewees conceptual/theoretical understanding of research data management (understanding, thoughts, attitudes) and one which revolved around the more practical aspects (what actions were taken/not taken) of research data management. When looking at these two themes it became apparent that there was a mismatch between the way FAIR was appraised as desirable, and the actual practices that the interviewees engaged in. To further investigate, we moved on to the next step of the analysis where we aimed to identify the ideas, assumptions and ideologies that could be said to underlie these “explicit” statements. Our starting point for interpretation was that at the core of these explicit statements was the concept of data in itself, its epistemological meaning. The different epistemological understandings of data led to different understandings of research data management as well as affecting the practical approaches taken. As a way of making sense of our findings we choose to discuss them in relation to previous work presented within CDS. Finally, we compared our findings with previous research findings to identify which observations confirm previous findings, what is new, what points in another direction and so on.

Background

A number of previous studies investigated the potential challenges and obstacles researchers face when attempting to align their research data management practices with FAIR. These investigations were conducted either in domain specific contexts (see e.g., Nicholson et al., 2023; Rehnert and Takors, 2023; Tanhua et al., 2019; Tóth-Czifra, 2020) or on a more general level (for the scientific community as a whole) (see e.g., Adams et al., 2023; David et al., 2020; Jacobsen et al., 2020). While domain-specific obstacles have been observed within the first set of studies, certain commonalities can be traced independent of the domain under investigation. Out of these, those that are highlighted as most frequently occurring are (1) working toward FAIR data is resource (human and financial) consuming, (2) there is a lack of skills and knowledge (particularly among researchers), (3) there is a lack of incentives for individual/groups of researchers, and (4) there is a need for a “cultural shift” to take place. Incidentally, these identified challenges resemble those that have been identified in studies investigating challenges for data sharing in research in general (Perrier et al., 2020; Plomp et al., 2019).

Since FAIR is a relatively new framework for research data management practice (first launched in 2016), the process of evaluating and developing it is very much a work in progress. Some recent investigations into FAIR research data management revolve around problems with cross-community consensus and interoperability (Adams et al., 2023; David et al., 2020; Jacobsen et al., 2020). According to David et al. (2020), the emphasis on machine-actionability makes it difficult for non-data scientist to use and understand the principles; accordingly, successful broad implementation of tools and services for making data FAIR relies on the understanding of the principles being accessible for everyone and not just those skilled in data science. Further, they argue that the numerous benefits of using shared standards vocabularies and ontologies for data must be better and clearer explained to all actors since they can appear not only complex but also time consuming (David et al., 2020). Much in line with what most other investigations of FAIR have pointed out as a need for a “cultural change,” they conclude that educating and convincing researchers in the benefits of FAIR is another essential part of ensuring its broad and qualitative implementation (David et al., 2020; see also Rocca-Serra et al., 2023). Related to this, Adams et al. (2023) highlight the fact that researchers from different scientific domains face different challenges and start out from different points of departure; while some scientific domains already have strong traditions and structures for research data management in place others do not. Another aspect that has been subject to critical scrutiny is the freedom of choice for interpreting the principles, in which the risk of incompatibility between different scientific domains has been highlighted as particularly urgent (European Commission, 2018; Jacobsen et al., 2020; Thompson et al., 2020). According to Jacobsen et al. (2020: 13), “…the lack of common understanding around the original intentions of the guiding principles is crucial to avoid divergence into non-interoperability once again.” While they emphasize the importance for scientific domains coming together to agree on standards through community-governed platforms, they also stress that for interoperability to happen community convergence is essential (Jacobsen et al., 2020). They believe that this is best achieved through allowing “FAIR-aware” data stewards to guide these convergences (Jacobsen et al., 2020; see also Mons, 2020).

The majority of articles that investigates challenges and obstacles for FAIR research data management are conducted within the health research field (Boeckhout et al., 2018; Hughes et al., 2023; Inau et al., 2021, 2023; Löbe et al., 2020; Wise et al., 2019). Open access and reusability of data is often highlighted as necessary measure to be taken for improvements to come about within the health field, where professional data management coupled with big data analytics is held forward as the solution for transforming health care delivery, improving prevention, diagnosis, treatment and well-being (Alvarez-Romero et al., 2022; Boeckhout et al., 2018; Inau et al., 2021). Lack of resources, incentives, skills, knowledge as well as cultural barriers are held forward as aspects which obstruct full adherence to FAIR within the health research field. Here, discussion around cultural barriers is often connected to questions around incentives; something which could be described, somewhat sharpened, as a shift deemed necessary in attitude from “my data” to “our data” (e.g., Wise et al., 2019). Wise et al. (2019) further argue the since making data FAIR is resource consuming and demands effort, the incentives for researchers to dedicate time and effort into doing this must be more clearly articulated. Hughes et al. (2023) suggest that a recognition of this from funding bodies, and an increase in budget accordingly, could help solve this problem. Challenges that seem to be specifically palpable for researchers working within the health research field are the lack of standards for reporting data and describing metadata, the use of different information systems and privacy concerns (often discussed in connection to GDPR) (Boeckhout et al., 2018; Hughes et al., 2023; Inau et al., 2023; Löbe et al., 2020; Queralt-Rosinach et al., 2022; Wise et al., 2019). Pseudonymization and anonymization are steps which according to Inau et al. (2023) are unique to the health research domain due to the sensitive nature of the data. Privacy concerns and the sensitive nature of the data – connected to legal aspects – not only lead to problems in terms of making data findable, accessible and reusable, but it is also time consuming and hence costly (Inau et al., 2023; Löbe et al., 2020). It also makes it difficult for researchers to share their data across national-boarders, as different countries can add additional layers of protection on top of the GDPR foundation shared by all members of the EU (Tacconelli et al., 2022). Inau et al. (2023: 2; referring to Beyan et al., 2020) summarize the problematic situation by stating that “…enormous amounts of usable health data (are) currently imprisoned inside the organizational territories of hospitals, clinics, and within patients’ devices due to ethical concerns and data protection rules.”.

Epistemological views in the context of FAIR are important to address. Landström (2024) positions data-sharing as a socio-epistemic practice that can lead to ethical and epistemic challenges and situates this on Alcoff's (2022) theory of extractivist epistemologies, with a discussion on internal sharing. Tenopir et al. (2011) found differences between scientific disciplines and geographic regions, pointing at the importance of epistemological and cultural differences for data sharing to be implemented and used. Bezuidenhout (2013) discussed ethical issues around dual-use from shared data, and problematizes the chain of responsibility. Feldman and Shaw (2019) particularly unpack the epistemological challenges to open access to qualitative data, and “ask whether the claim for greater efficiencies and accountability of public access are appropriate for the co-constitutive character of qualitative evidence and what these demands portend for knowledge production.”

Next, we present the main insights that our analysis of the interview material provided and begin to discuss our insights in relation to previous research. In the discussion section, we continue to relate these different research insights to each other, taking a slightly broader approach to present our reflections on what we consider to be some of the more fundamental aspects that can create problems in relation to the goal of getting all different kinds of scientific domains on board the FAIR train.

Data management, data governance and the many potential approaches to data

What initially struck us was that the informants seemed to understand the concept of research data management quite differently. When inquired about their attitudes toward research data management, some of the informants seemed genuinely puzzled by the question. This group of informants seemed to regard research data management as a self-evident part of their work; hence, a question about their “attitude” toward it was perceived as a bit strange. This group of informants had all encountered the FAIR principles before, and most had experience of practicing research data management in accordance with them. Another group of informants stated that they saw research data management as a necessary and worthwhile task, nevertheless “quite dull and time consuming” (informant 5). They further stated that they were not used to thinking of their handling of data as part of their research work in terms of “data management”; a concept which they perceived as being closely related to data science, a field of research with which they felt no affinity (even “a bit intimidating,” as informant 6 put it). Noone in this latter group of informants had encountered the FAIR principles before. While they certainly made use of different types of data within their daily research work, they were not accustomed to dealing with questions of data storage, data accessibility and data sharing to the same extent.

We relate these differences to the fact that these informants belonged to different fields of research. While the former group of informants were active within research fields that predominantly work with large datasets and use quantitative methods of analysis, the latter group of informants were used to working either with a mix of quantitative and qualitative data, or predominantly with qualitative data and primarily use qualitative methods of analysis. In line with Adams et al. (2023), our material thus shows how the pre-conditions for handling the task of FAIR research data management vary depending on what tradition of research you belong to, what type of data you usually work with and the methods you use to analyze the data. If analyzing large data sets adopting quantitative methods is part of your daily work, and if you primarily make secondary use of data (often collected at hospitals in the case of our informants), then a proper infrastructure for research data management is essential. In the case of some of our informants it was already in place. As one informant (3) put it:

…Your research is only as good as your data and how you have collected or cleaned or checked for missing data and analyzed it. So, you have to get every step right to make it accurate and meaningful.

The quote above is telling of how one group of informants viewed their own research process and how to handle data within it: if structuring and organizing data constitutes an essential part of your research methodology, your research cannot function without a proper infrastructure for research data management in place.

When we asked the same informant (3) about her experience of writing the data management plan for the project, she expressed herself in the following way:

Yeah. So, basically, it is more of a data analysis plan rather than… So, data management, I'm not entirely sure what exactly you mean by data management. So, we write a plan of how we are going to link data, how we're going to receive data, where it will be stored, who will have access, how it will be cleaned and analyzed.

What our informant chose to describe as the data analysis plan, i.e., the practices that produce scientific insights/research results, could also be understood as (stepwise) practices of research data management. In other words, the plan for handling research data (whether one prefers to call it data analysis- or data management plan) is described as a utilitarian step in achieving scientific results – one which is not done with FAIR or Open science goals in mind where data would have a life beyond the project and the analysis scope. This insight resonates with another one of our findings; namely that our informants seemed to prefer talking about research data management in general rather than FAIR (research data management) in particular, even when the questions we posed were explicitly about FAIR. The comments our informants gave about FAIR per se were often expressed in terms of a desirable future (“good idea,” “makes sense,” “should be implemented more widely”) rather than describing actual practical steps they had taken to make their data FAIR. In addition, the majority of informants emphasized that a lot of problems around FAIR arose when working with person-sensitive data, and that GDPR and similar regulations stood in the way. This is in line with what previous research has indicated (Inau et al., 2023; Löbe et al., 2020). These findings could be interpreted as an indication of a reality in which our informants did not make much of an effort to live up to the ideals of FAIR. However, we believe that this should instead be seen as an expression of the fact that FAIR is a goal, an aspiration for reaching a certain outcome of research data management, rather than the practice of research data management itself. It is not FAIR that is practiced – it is the research data management practices that (ought to) strive toward FAIR.

While the researcher working with mixed or qualitative data described the experience of attempting to adhere to the FAIR principles in research data management for the project with terms like “educating and rewarding” (Informant 1), they also emphasized the hardship that they faced. One informant (5) described it as a “learning curve,” where she had to put in a lot of effort just to understand the terminology around FAIR research data management. Terms like “format,” size of the data and metadata were unfamiliar to her. Part of this effort was also searching for proper resources to draw from in the process of interpreting the terminology. When she and her colleagues attempted to consult the library, it turned out that they were not able to understand the terminology either. This experience was shared among other informants; while there was always some kind of infrastructure available at their universities where researchers could turn for advice on research data management, the informants who sought support discovered that these support services lacked sufficient knowledge around how to handle the specific kind of data which they were working with. Further, on top of the extra time and effort that these informants had to put into figuring out how to practice FAIR research data management in an accurate way they also described how they experienced it as being difficult on a mental and emotional level. As one informant (5) stated, she “…could never feel confident” that she had followed the guidelines in the way that was expected of her which led to her feeling “constantly stressed and worried.” In line with what David et al. (2020) has noted our material thus show how researchers whose methodologies are far removed from data science find it more difficult to understand the terminology involved in the principles. It also shows that the group of researchers that really needs support constitutes the very same group for which there is a lack of it.

Albeit, the interview material shows that those informants who were not used to working from the concept of research data management, and even felt a bit intimidated by it, still consider their experiences of attempting to adhere to FAIR research data management reasonable and rewarding (after being “forced” to learn how it works during their involvement in the project). Despite the hardship faced, they emphasized that they had come to realize (after having learned about and worked in accordance with the FAIR principles for research data management in the project) that the way that they have handled data “intuitively” previously was actually very much in line with (how they interpreted) the FAIR principles. How can we understand this? A closer look at the material reveals that they saw FAIR research data management as a useful framework to structure and organize their data for their own and other project members’ benefit, rather than as something that contributes to accessibility and reusability for the sake of the wider research community. As one informant (6) put it:

I've never worked with the principles, but that's something that I have in mind when I store data. Again, I have to know that if I'm working with a team, everyone needs to have access to it. I'm always aware that everything, every single thing that is there, if I'm taking notes, each note has to make sense on its own. So that if someone later on joins the research, it knows that what I'm talking about and don't really need me to understand this kind of thing. So yeah, making accessible through creating a logical architecture in terms of how to store the folders and the files. So I've never worked with the principles, but these are kind of concerns that I have mainly for myself in how I organize my own data, but also mainly when working with a team. It's one of the questions like, how do you go about structuring your files? How do you go about naming this? What's the conventions for naming and for nesting things?

In this excerpt, FAIR research data management is described as a process that is helpful for keeping track of one's own data, but as a process that stands outside of the actual knowledge production the informant engages in through her research. Instead, she interpreted it as a kind of checklist, in which she tackled the different steps (Findable, Accessible, Interoperable, Reusable) one by one over time. This description of how research data management was practiced resonates with the remaining part of the group of informants who worked with mixed or exclusively qualitative data. For this group of informants, there was a lack of an overall plan – a blueprint – for how to manage their data throughout its entire lifecycle. Although attempts were made at seeking support from the data management office at their university in establishing this kind of over-all plan, this type of support was not available. As described, the second group of informants already had this type of infrastructure in place. This latter group of informants took a more holistic approach in managing their data, basing their research data management practices on a plan that covered the entire data lifecycle. These differences identified between informants can be expressed in terms of data governance versus data management: where the former represents the act of issuing from a comprehensive plan for managing data, and the latter merely consists of the technical implementation of the plan – the practical work of managing data. As Everett (2024) argues, while data management can be done without a blueprint (data governance), the process will not be as smooth and you are likely to encounter problems along the way. This last point from Everett (2024) resonates with our material, where the informants lacking data governance had to put in massive amounts of work in figuring out how to practice research data management one step at a time. Another consequence that this lack of support led to was that the informants ended up having to make their own, independent interpretations of the principles. If the goal, as some suggest, is to get researchers to work toward data consensus and standards (Jacobsen et al., 2020; see also Mons, 2020), both within their own disciplines and across disciplines, then these findings point to a potential problem.

The fact that the conditions for practicing research data management in accordance with FAIR are different for researchers working in different scientific domains has been highlighted in previous studies (Adams et al., 2023; David et al., 2020; European Commission, 2018; Jacobsen et al., 2020). So far, we have tried to highlight what those discipline-specific challenges are, and what researchers do to try to overcome the obstacles they face. We believe that our study contributes to a deeper understanding of why this is the case and what consequences this can have for individual researchers. We do, however, believe that it would benefit the discussions around FAIR research data management and its challenges to take a few more steps back; that instead of continuing to identify problems with the current infrastructure, we would benefit from having a discussion about the relationship between data – its different forms of interpretation – and knowledge. We suggest that introducing previous works within Critical Data Studies into the discussion of our material is particularly well suited for this purpose (boyd and Crawford, 2012; Hoeyer, 2023; Stevens, 2021; Stevens et al., 2018). Based on our material, it is clear to us that the informants who were struggling to understand how to practice FAIR research data management in a correct manner were not used to looking at and thinking about their data from the same epistemological perspective on data that the FAIR framework is built upon. FAIR was born in the life sciences (Rocca-Serra et al., 2023), and with this come certain assumptions about how to understand data and how data relates to knowledge.

Discussion

We end this article by discussing some of the broader implications of the patterns this study found. Relating to previous research, it may well be that the need for better infrastructure is the big knot that needs to be untied for significant improvements to come about in terms of adherence to FAIR research data management within the broad research community. Parts of our findings, like the anxiety and worry that some of our informants experienced due to uncertainty in whether or not they had managed to practice FAIR research data management correctly, indicates that a better infrastructure could contribute something good. It might have made a difference in terms of reduced anxiety for these informants if there were broad access to “FAIR-aware data stewards” (Jacobsen et al., 2020) and/or if they would get help with establishing a blueprint to issue from in their research data management practices. However, again, we would argue that the problems that our material help to highlight are more fundamental. We would argue that the concept of data is at the heart of this whole issue, and specifically the failure to recognize that FAIR is based on one of many possible conceptualizations of data. Within the FAIR framework the concept of data currently constitutes a kind of epistemological assumption that remains largely unquestioned. Once again, it is worth pointing out that FAIR was born in the life sciences (Rocca-Serra et al., 2023) and is hence built on a positivist worldview as is dominant in the natural sciences. As obvious as it may seem, it is also worth noting that research data management as a practice has its origins in data science, and that this comes with a certain approach to and epistemological perspective on data. A quote taken from the GO FAIR Initiative's (GO FAIR, 2016) web page serves to illustrate this:

The principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data.

It is quite evident from the quote above what type of scientific approach FAIR is primarily designed for: the type of research work in which large amounts of data is analyzed using quantitative methods. Our interview material clearly describes the lack of support available for those informants who sought it – and it is no coincidence that it was precisely those informants work with mixed or qualitative data and use qualitative methods. In our study, these informants came from the social sciences, but a larger study would be needed to pinpoint more precisely how the scientific domain, community and preferred scientific methods interplay with the desire to implement the FAIR principles.

We can think of data – as a concept as well as our own data – in many ways. How one views the extraction of knowledge from data is related to our epistemological approach to data (Hoeyer, 2023; Stevens, 2021). The European Commission (2018) states that one of the primary benefits of FAIR research data management is that it can help researchers scale up their research findings by integrating with existing data and allow them to spend time on interpreting existing data rather than collecting new data which reproduces existing data. In a way, this perspective on data resembles the instrumental view of the relationship between data and knowledge that prevails in some parts of the Big Data community, in which more data is equated with more knowledge (boyd and Crawford, 2012; Stevens et al., 2018). While the FAIR framework suggests a strong link between data and knowledge there is also an emphasis that data must be properly structured, cleaned, standardized, be technically and semantically interoperable and so on – in other words the importance of “high quality data” be recognized (Wilkinson et al., 2016). Thus, it is not an instrumental (and what we would consider as relatively naïve) view of data that emerges with the FAIR framework, but rather one that is in line with what Stevens et al. (2018) refers to as the scientific discourse around (Big) data. Albeit, it is important to acknowledge that it is a particular scientific perspective that emerges; one that can be said to be based in a positivist worldview.

The fact that this understanding of the concept of data appears to be taken for granted within the FAIR framework merely consolidates the status of positivism in science, which admittedly should not be regarded as a new insight. Another way of relating to the concept of data can be illustrated by imagining a researcher asking themself the question: does my data represent knowledge per se for someone else, or is it my familiarity, perhaps even my personal relationship with the data in combination with the analytical methods I apply to interpret it and the theoretical concept I put in discussion with it that constitutes the actual knowledge? With this epistemological perspective on data, knowledge is not something factual but rather a certain interpretation that a researcher (sometimes together with others) has made of the data.² If another researcher would have access to the same “raw data,” the outcome, the results, the insights presented would certainly not be identical. This represents yet another, quite particular perspective on data – one out of many potential perspectives.³ One of the things that our study has helped to highlight is that understanding and relating to the concept of data must necessarily be pluralistic. We would argue that if FAIR is to have any chance of succeeding in its ambitions to be inclusive and all-encompassing, these insights need to be taken into account. Potential pathways could explore the inclusion of the description of the research context into (meta)data, in a machine-actionable way, as well as a move to knowledge graph-based representations with their additional linking capabilities. A technical mechanism to describe context automatically could off-load the individual researcher from overly administrative steps, like for instance the thinking in experience capturing in Human Computer Interaction research (e.g., Church et al., 2014).

We believe that these insights speak to the discussion about researchers’ motivation to make their data FAIR, which is highlighted as a problem that needs to be solved in order to achieve full-scale adherence (David et al., 2020; Rocca-Serra et al., 2023). Making your data machine-actionable requires not only sufficient resources in terms of finances and human labor, but there must also be other incentives in place – a sense that it will somehow benefit others’ and your own research to do so. For those researchers who have a hard time seeing how their data could be used as a source of knowledge by another, external party, a certain lack of motivation is understandable – even if you are told that making your data FAIR would serve the greater good of the scientific community (Wilkinson et al., 2016). There is an ethical aspect to this, where a more contextual understanding of data may hinder the willingness to make the data findable or accessible (Bezuidenhout, 2013), as a scientist will have doubts about the intentions and interpretability outside its context. Maybe machine-actionability is not equally suitable for all types of research/epistemological perspectives on data. However, our material indicates that even those researchers who easily see how their own work could benefit from increased access to others’ data seem to fall short in their actual practices that strive to make the data FAIR. Of course, we should see the proclaimed obstacles with person-sensitive data raised by our informants, as well as by previous research, as a real problem in relation to this (Inau et al., 2023; Löbe et al., 2020). But according to the European Commission (2018) FAIR does not necessarily imply open data, and data can be shared under restrictions and still be FAIR. This points to yet another form of discrepancy between the ideal of FAIR and the realization of its operational requirements; where researchers claim that FAIR and person-sensitive data is incompatible, but the European Commission (2018) seems to be of a different opinion. Based on our material, we can only speculate about what this discrepancy consists of. Is it is a matter of different interpretations of the principles, or perhaps rather a question of whether theory matches real-life practices?, This is a topic that deserves further scientific attention. However, as Hoeyer (2023: 18; in reference to Winthereik, 2010), points out, the integration of data into an infrastructure “…turns the very same data entries into elements of very different projects directed towards different objectives.” This statement (Hoeyer, 2023: 18) directs our attention toward another aspect that we ought to reflect further on in relation to FAIR and its challenges; not only can the concept of data have different epistemic meanings, but the same data can also be said to represent different things, and be used for different purposes, at the same time. In reference to Mol (2002), Hoeyer (2023: 216) refers to this phenomenon as the ontological multiplicity of data. This brings up the fundamental issue of the value of data separated from its context, which has different viewpoints depending on the type of science one belongs to. Previous research has suggested that researchers might be hesitant toward sharing data because they are afraid of how it might be used (or rather miss-used) out of its original context (Bezuidenhout, 2013; Birkbeck et al., 2022; Wise et al., 2019). Generally speaking, the natural sciences have fewer issues with this than the social sciences, but it is also problematic for the medical and engineering sciences and within each of these sciences there exist communities that do not subscribe to a positivist perspective. It might be a good idea to question what purpose, or rather whose interests, are being served by researchers becoming more or less forced to (attempt to) standardize and make their data available through FAIR. The costs, both for the researcher and for the research system, of making all project data FAIR needs to be balanced against the chances of data actually being reused. With that said, we would like to point out that the research community is not the only stakeholder in our study, but that the European Union constitutes another, and a very powerful one at that. Considering a policy of different levels of FAIR / RDM could provide a means to a more equal adoption of FAIR.

We are not the first to point out that building collaborative infrastructures for data sharing and reuse requires a deep understanding of how different scientific domains work. This was highlighted already over a decade ago within the field of Computer-supported cooperative work (CSCW) (Jirotka et al., 2013). However, the kind of cross-disciplinary collaboration and the type of technologies and infrastructures that were relevant then have today grown into something even more complex, with the emphasis on machine-actionability being particularly telling; people (and the collaboration between them) are still in the loop, but it is data and technology itself that many would regard as the forces which ought to drive science forward. With that said, a final important insight that we believe our study has helped to highlight is that people, with their enduring cultures and practices, and the enormous variety of perspectives on their environment (including a concept such as data) still play a key role in whether technological developments and the information infrastructures that accompany them work as intended (Bowker and Star, 2000).

Conclusion

This study investigated researchers’ experiences of practicing FAIR research data management within the context of a multi-stakeholder project funded by the European Commission under the Horizon2020 program, and whose work resides within the field of health research. Our analysis of the interview material showed that the informants’ experiences of practicing FAIR research data management differed largely depending on which scientific domain they belonged to. While some informants were used to issuing from a holistic plan (blueprint) the entire life cycle of the data in their daily work, this way of working with research data was something new for other informants. When this latter group of informants sought support from a data management office (or similar support functions) at their universities, they discovered that these support services lacked sufficient knowledge around how to handle the specific kind of data they were working with. As a result, they were forced to make their own, independent interpretations of the principles (often in consultation with their colleagues).

That the conditions for practicing research data management in accordance with FAIR are different for researchers working in different scientific domains has been highlighted in previous studies (Adams et al., 2023; David et al., 2020; European Commission, 2018; Jacobsen et al., 2020). We argue that our study adds further depth and nuance to these insights by highlight what those discipline-specific challenges are, what researchers do to try to overcome these challenges, how come these challenges occur and what consequences this can have for individual researchers. These findings could be interpreted as an indication that the FAIR infrastructure needs to be improved. Albeit, we believe that they point to a more fundamental issue than the (lack of) quality and extent of the current FAIR infrastructure; namely that the FAIR framework is built around the assumption that all researchers working with it share the same (epistemological) perspective on data. When funding bodies and publishers demand that researchers adhere to FAIR in their research data management practices, regardless of which scientific domain they belong to, our study indicates that problems can arise, particularly for interdisciplinary projects. However, our material also indicates that those informants whose (epistemological) perspective on data is more in line with the perspective on data which the FAIR framework is built around still seem to fall short in their actual practices that strive to make the data FAIR. As previous research has pointed out, this may have to do with perennial cultures, were a shift in attitude from “my data” to “our data” is deemed necessary (Wise et al., 2019). Based on the material collected for this study, we are, as mentioned, only able to speculate. This is something we plan to investigate further, and we encourage others to follow.

Limitations

The study investigated one EU funded Research and Innovation Action project with a focus on health and interviewed 9 persons. While this might be a limitation and gives some added complexity due to the sensitive nature of the data, we also believe that this setting provided the ideal grounds to start our study and to provide an indication. The project was typical and representative of EU RIA projects, constituting multiple stakeholders across countries, and working on different kinds of data. The health domain is traditionally already used to research data management methods. Something that should be investigated further to be able to draw more solid conclusions is whether the issues we found are representative of other science domains and whether they hold true for large-scale data infrastructure projects, like the EU DIGITAL program initiatives. Another dimension to explore further is the variation in epistemological position between researchers in these EU collaborative projects. Is the heterogeneity a problem in itself or a strength when it comes to FAIR? Our study cannot be conclusive to this end.

Footnotes

Acknowledgments

We thank the researchers who were willing to share their experiences and perspectives with us for their time and commitment. We would also like to thank the anonymous peer reviewers for their insightful and helpful feedback.

ORCID iDs

Mikaela Hellstrand

Jayanth Raghothama

Sebastiaan Meijer

Ethical approval and informed consent statements

Ethical approval was not required for this study. Informed consent for us to use the interview material in an anonymized format, including featured in a submitted manuscript to an academic journal, was obtained verbally (video recorded) from all participants. Any information that could potentially lead to the disclosure of the identities of the participants has been excluded from the article. Any information that could potentially identify the project in question has also been excluded.

Consent to participate

Informed consent for us to use the interview material in an anonymized format, including featured in a submitted manuscript to an academic journal, was obtained verbally (video recorded) from all participants. Any information that could potentially lead to the disclosure of the identities of the participants has been excluded from the article. Any information that could potentially identify the project in question has also been excluded.

Author contributions

MH and JR both contributed to the study design. MH conducted the interviews and the initial analysis of the material, while JR and SM contributed to the subsequent analysis of the material. MH drafted the manuscript, and JR and SM provided feedback. SM handled the revisions. All authors approved the final version of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The transcriptions of the interviews are stored on a secure server at KTH Royal Institute of Technology. Due to ethical concerns, the material is not publicly available. An anonymized version of the interview transcripts is available upon reasonable request from the corresponding author.

Notes

References

Adams

Jones

Foster

(2023) Supporting FAIR data management planning across different disciplines at the University of Sheffield. Data Science Journal 22(1): 17.

Alcoff

(2022) Extractivist epistemologies. Tapuya 5(1): 2127231.

Alvarez-Romero

Martínez-García

Sinaci

, et al. (2022) FAIR4Health: Findable, accessible, interoperable and reusable data to foster health research. Open Research Europe 2: 34.

Behar

(1996) The Vulnerable Observer: Anthropology that Breaks Your Heart. Boston: Beacon Press.

Beyan

Choudhury

Van Soest

, et al. (2020) Distributed analytics on sensitive medical data: The personal health train. Data Intelligence 2(1–2): 96–107.

Bezuidenhout

(2013) Data sharing and dual-use issues. Science and Engineering Ethics 19: 83–92.

Birkbeck

Nagle

Sammon

(2022) Challenges in research data management practices: A literature analysis. Journal of Decision Systems 31(sup1): 153–167.

Bloemers

Montesanti

(2020) The FAIR funding model: Providing a framework for research funders to drive the transition toward FAIR data management and stewardship practices. Data Intelligence 2(1–2): 171–180.

Boeckhout

Zielhuis

Bredenoord

(2018) The FAIR guiding principles for data stewardship: Fair enough? European Journal of Human Genetics 26: 931–936.

10.

Bowker

Star

(2000) Sorting Things Out: Classification and Its Consequences. Cambridge, MA, USA: MIT Press.

11.

boyd

Crawford

(2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679.

12.

Braun

Clarke

(2006) Using thematic analysis in psychology. Qualitative Research in Psychology 3(2): 77–101.

13.

Church

Cherubini

Oliver

(2014) A large-scale study of daily information needs captured in situ. ACM Transactions on Computer-Human Interaction 21(2): 1–46.

14.

David

Mabile

Specht

, et al. (2020) FAIRness literacy: The Achilles’ heel of applying FAIR principles. Data Science Journal 19(1): 32.

15.

European Commission, Directorate-General for Research and Innovation (2018) Turning FAIR into reality : final report and action plan from the European Commission expert group on FAIR data. Publications Office. Available at: https://data.europa.eu/doi/10.2777/1524 (accessed 17 June 2024).

16.

European Commission, Horizon2020 Programme (2016) Guidelines on FAIR Data Management in Horizon 2020. Version 3.0. Available at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf (accessed 17 June 2024).

17.

Everett

(2024) Data governance vs data management: What’s the difference?. In: Informatica. Available at: https://www.informatica.com/blogs/data-governance-vs-data-management-whats-the-difference.html (accessed 17 June 2024).

18.

Feldman

Shaw

(2019) The epistemological and ethical challenges of archiving and sharing qualitative data. American Behavioral Scientist 63(6): 699–721.

19.

GO FAIR (2016) GO FAIR initiative. Available at: https://www.go-fair.org/go-fair-initiative/ (accessed 17 June 2024).

20.

Hicks

(2023) Open science, the replication crisis, and environmental public health. Accountability in Research 30(1): 34–62.

21.

Hoeyer

(2023) Data Paradoxes: The Politics of Intensified Data Sourcing in Healthcare, 1st edn. Cambridge: MIT Press.

22.

Hoeyer

Green

Martani

, et al. (2024) Health in data space: Formative and experiential dimensions of cross-border health data sharing. Big Data & Society 11(1): 1–14.

23.

Hughes

Tsueng

DiGiovanna

, et al. (2023) Addressing barriers in FAIR data practices for biomedical data. Scientific Data 10: 98.

24.

Inau

Sack

Waltemath

, et al. (2021) Initiatives, concepts, and implementation practices of FAIR (findable, accessible, interoperable, and reusable) data principles in health data stewardship practice: Protocol for a scoping review. JMIR research Protocols 10(2): e22505.

25.

Inau

Sack

Waltemath

, et al. (2023) Initiatives, concepts, and implementation practices of the findable, accessible, interoperable, and reusable data principles in health data stewardship: Scoping review. Journal of Medical Internet Research 25: e45013.

26.

Jacobsen

de Miranda Azevedo

Juty

, et al. (2020) FAIR Principles: Interpretations and implementation considerations. Data Intelligence 2(1–2): 10–29.

27.

Jirotka

Lee

Olson

(2013) Supporting scientific collaboration: Methods, tools and concepts. Computer Supported Cooperative Work (CSCW) 22: 667–715.

28.

Landström

(2024) On epistemic extractivism and the ethics of data-sharing. Philosophy of the Social Sciences 54(5): 387–411.

29.

Lefebvre

Spruit

(2019) A socio-technical perspective on reproducibility in research data management. In: MCIS 2019 Proceedings. Vol. 10. Available at: https://aisel.aisnet.org/mcis2019/10

30.

Löbe

Matthies

Stäubert

, et al. (2020) Problems in FAIRifying medical datasets. Studies in Health Technology and Informatics 270: 392–396.

31.

Manola

Mutschke

Scherp

, et al. (2020) Implementing FAIR data infrastructures (Dagstuhl perspectives workshop 18472). Dagstuhl Manifestos 8(1): 1–34.

32.

Mol

(2002) The Body Multiple: Ontology in Medical Practice. Durham, NC, USA: Duke University Press.

33.

Mons

(2020) Invest 5% of research funds in ensuring data are reusable. Nature 578(7796): 491–491.

34.

Mons

Schultes

Liu

, et al. (2020) The FAIR principles: First generation implementation choices and challenges. Data Intelligence 2(1–2): 1–9.

35.

Nicholson

Kansa

Gupta

, et al. (2023) Will it ever be FAIR? Advances in Archaeological Practice 11(1): 63–75.

36.

Perrier

Blondal

MacDonald

(2020) The views, perspectives, and experiences of academic researchers with data sharing and reuse: A meta-synthesis. PLoS One 5(2): e0229182.

37.

Plomp

Dintzner

Teperek

, et al. (2019) Cultural obstacles to research data management and sharing at TU Delft. Insights the UKSG Journal 32(1): 29.

38.

Queralt-Rosinach

Kaliyaperumal

Bernabé

, et al. (2022) Applying the FAIR principles to data in a hospital: Challenges and opportunities in a pandemic. Journal of Biomedical Semantics 13(12): 1–19.

39.

Rehnert

Takors

(2023) FAIR Research data management as community approach in bioengineering. Engineering in Life Sciences 23(1): e2200005.

40.

Rocca-Serra

Ioannidis

, et al. (2023) The FAIR Cookbook – The essential resource for and by FAIR doers. Scientific Data 10(1): 292.

41.

Stevens

(2021) Dreaming with data: Assembling responsible knowledge practices in data-driven healthcare. PhD Thesis. Erasmus University Rotterdam, the Netherlands.

42.

Stevens

Wehrens

de Bont

(2018) Conceptualizations of Big Data and their epistemological claims in healthcare: A discourse analysis. Big Data & Society 5(2): 1–21.

43.

Tacconelli

Gorska

Carrara

, et al. (2022) Challenges of data sharing in European COVID-19 projects: A learning opportunity for advancing pandemic preparedness and response. The Lancet Regional Health – Europe 21: 100467.

44.

Tanhua

Pouliquen

Hausman

, et al. (2019) Ocean FAIR data services. Frontiers in Marine Science 6: 440.

45.

Tenopir

Allard

Douglass

, et al. (2011) Data sharing by scientists: Practices and perceptions. PLoS ONE 6(6): e21101.

46.

Thompson

Burger

Kaliyaperumal

, et al. (2020) Making FAIR easy with FAIR tools: From creolization to convergence. Data Intelligence 2: 87–95.

47.

Tóth-Czifra

(2020) The risk of losing the thick description: Data management challenges faced by the arts and humanities in the evolving FAIR data ecosystem. In: Edmond

(ed) Digital Technology and the Practices of Humanities Research. Cambridge, UK: Open Book Publishers, 235–266.

48.

Wilkinson

Dumontier

Aalbersberg

, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3(1): 1–9.

49.

Winthereik

(2010) The project multiple: Enactments of systems development. Scandinavian Journal of Information Systems 22(2): 1–16.

50.

Wise

de Barron

Splendiani

, et al. (2019) Implementation and relevance of FAIR data principles in biopharmaceutical R&D. Drug Discovery Today 24(4): 933–938.