Abstract
There is a growing movement for research data to be accessed, used, and shared by multiple stakeholders for various purposes. The changing technological landscape makes it possible to store data digitally, creating opportunities to share and reuse data anywhere in the world at a later time. This movement is growing rapidly and becoming widely accepted as publicly funded agencies are mandating that researchers open their research data for sharing and reuse. While there are numerous advantages to the use of open data, such as facilitating accountability and transparency, not all data are created equally. Accordingly, reusing data in qualitative research presents some epistemological, methodological, legal, and ethical issues that must be addressed in the movement toward open data. We examine some of these challenges and make a case that some qualitative research data should not be reused in secondary analysis.
Driven by the open data agenda, more and more researchers are being urged by funders and editors to make their data available and accessible to the wider research community (Childs, McLeod, Lomas, & Cook, 2014; Corti, Eynden, Bishop, & Woolard, 2014; Parry & Mauthner, 2004; Science International, 2015). Open data enable researchers and other stakeholders to access, use, modify, and share research data (Open Knowledge International, n.d.) across disciplines for secondary analysis. Open access to data offers opportunities for collaboration, innovation, and economic potential. With quantitative research, providing open access to data has traditionally been encouraged to enable verification of analyses and challenge research findings by the academic, research, and government communities (Mauthner, Parry, & Backett-Milburn, 1998). There have, however, been some mixed reactions among qualitative researchers worldwide regarding whether data should be made open and accessible for sharing and reuse. This has ignited a contentious debate about the appropriateness of secondary analysis of qualitative data, and in particular, open data, as some researchers believe this could have significant implications for the quality of the analysis and interpretation (Irwin, 2013; Mauthner et al., 1998), specifically related to how data are generated under particular contextual conditions. In this article, we argue that not all qualitative research data are appropriate for open access, as epistemological, methodological, legal, and ethical issues may become concerns. We first explore the meaning of open data and its advantages and then discuss some concerns relating to the quality and rigor of analysis of open data in relation to qualitative research.
Open Data
There is a growing, global trend to open data for anyone to freely access, reuse, or share them (Open Data Handbook, 2013). In the context of research, this means that collected and retained data can later be shared and reused if ethical approval is in place. Open data mean access to original data sets, not only published findings or results that would typically be included in systematic reviews or meta-analyses. The impetus for open data was led by the Organization for Economic Cooperation and Development (OECD, 2007) in the United Kingdom in the mid-1990s. OECD recommended sharing of and open access to publicly funded research data within and among research communities. Now, several OECD member countries, including Canada, have also committed to these principles. In 1994, the Economic and Social Research Council in the United Kingdom established Qualidata, now part of the U.K. Data Service, as the world’s first archive of qualitative data, making data available for reuse in social science (Corti et al., 2014). The Qualitative Data Repository in the United States and the Qualitative Archive in Australia were established later. In addition, advancements in information and communication technologies make it possible to digitize, archive, and facilitate access to large data sets (Corti, Fielding, & Bishop, 2016; Heaton, 2008). For example, the U.K. Data Service has archived over 1,000 qualitative and mixed-methods data sets (Bishop & Kuula-Luumi, 2017).
In Canada, where we live and work, the Tri-Council Agencies, which include the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada, and the Social Sciences and Humanities Research Council of Canada, are major sources of research funding, and they advocate for open data, as evidenced in the following digital data management statement: As publicly funded organizations, the agencies are strong advocates for making the results of the research they fund as accessible as possible. In promoting access to research results, they aspire to advance knowledge, avoid research duplication and encourage reuse, maximize research benefits to Canadians and showcase the accomplishments of Canadian researchers…. The ability to store, access, reuse and build upon digital research data has become critical to the advancement of science and scholarship, supports innovative solutions to economic and social challenges, and holds tremendous potential for Canada’s productivity, competitiveness and quality of life. (Government of Canada, 2016a, para. 2)
In relation to qualitative research, the OECD (2007) posits that open data have numerous advantages such as reinforcing open scientific inquiry, promoting new research, and encouraging diversity in analysis and opinion. Open data also increase transparency (Corti et al., 2016) and accountability and lead to more efficient and economical use of data (Childs et al., 2014; Corti et al., 2016; Science International, 2015), particularly for research that is publicly funded. Furthermore, by encouraging the reuse of data, they minimize the burden on participants (Government of Canada, 2006a; Law, 2005). Finally, open data enable other researchers to build upon existing or original data, critique the existing data analysis (Bishop & Kuula-Luumi, 2017; Mauthner et al., 1998), and test or refute new theories by examining or validating research findings (Coltart, Henwood, & Shirani, 2013; Hammersley, 1997; Thorne, 1998). All of this can encourage robust dialogue.
This movement toward open data is noteworthy because funding agencies will increasingly require researchers worldwide to make their data open and accessible for sharing and reuse. Nonetheless, as data become more available to researchers, consideration must be given to contentious issues around open data and the need for researchers to assert that not all data are created equally. Compared with quantitative data, qualitative data are more difficult to contextualize adequately. Further, not all qualitative data are equally useful when decontextualized.
Implications of Open Data in Qualitative Research
Researchers have identified several epistemological (Childs et al., 2014), methodological, legal, and ethical (Childs et al., 2014; Parry & Mauthner, 2004) implications with regard to open data in qualitative research. These issues encompass controversy regarding secondary analysis, which involves the reuse of existing qualitative data from previous research studies (Irwin & Winterton, 2011) for transparency (Corti & Fielding, 2016), to validate results and/or to generate new findings from old data (Irwin & Winterton, 2011). Qualitative secondary analysis already occurs among qualitative researchers and graduate students but often takes place in teams or between collaborators where “insider” knowledge of the research can be shared. As the opportunity to access data for reuse becomes more available, secondary data analysis without this “insider” knowledge will become more common, even becoming mainstream (Bishop & Kuula-Luumi, 2017). As such, the epistemological, methodological, legal, and ethical ramifications must be addressed to inform the growing trend of qualitative open data.
Secondary Analysis: Epistemological Issues
One of the contentious issues surrounding open data stems from the uniqueness of qualitative inquiry and whether data can be appropriately reused outside of their original context (Childs et al., 2014; Irwin, 2013). Qualitative data capture the lived experiences of participants through words, images, or behaviors (Merriam & Tisdell, 2016), arising from observations, interviews, documents, or artifacts. The very nature of how knowledge is created in qualitative research and the conditions under which the data are created provide researchers with a privileged “insider view” of the phenomena (Creswell & Poth, 2018). The researcher studies the lived experience of the individual and focuses on capturing and interpreting the human phenomena that are unique to their context at one moment in time. Considering the situational context of the individual is imperative (Munhall, 2012) in qualitative research, as the context is situationally constrained by historical, cultural, social, and political influences (Coltart et al., 2013), which are specific to that place and time and cannot be reproduced. These contextual layers are experienced through the relationships among the researchers and participants, resulting in the co-construction of knowledge. Furthermore, the data are subjectively created (Irwin & Winterton, 2011), influenced by participant experiences, perceptions, values, and beliefs, as well as the researcher’s own subjective experiences, resulting in data that are value laden. It is impossible for qualitative data to be bias free. Qualitative data are inextricably linked to the context in which they were obtained, and removing this contextual information will significantly affect the interpretation of the data, disconnecting them from their true meaning and potentially rendering them unusable. Since there are currently no mechanisms in place to monitor such use of the data, there are no assurances of the quality and meaning of the data.
When Mauthner, Parry, and Backett-Milburn (1998) revisited their own data in secondary analysis, they found themselves less intellectually engaged and less emotionally attached to the data. The researchers were also unable to fully recapture the context from their previous research. Furthermore, answers to questions could not be found as the specific research questions posed in the secondary analysis were not raised in the original research. They concluded that “the data was bounded by condition and contexts under which they are collected” (p. 743). Significantly, they noted this in the secondary analysis of their own research, where they had firsthand knowledge of the participants and context of the study. When qualitative data are obtained through open data access, outside researchers will not have this advantage, and contextual knowledge will not be provided. It is this latter gap that raises significant epistemological challenges in the use of open data. Moreover, the subjective construction of data is perceived differently by different researchers and is further influenced by each researcher’s ontological and epistemological perspectives (Mauthner et al., 1998). Therefore, different researchers will approach the same data from different positions or perspectives.
Methodological Issues
The nature of specific qualitative research methods used to generate data has significant implications for open data. Three key issues relate to the type of research design, the use of field notes, and reflexivity in qualitative research. First, some qualitative research designs are not conducive to secondary analysis. For example, data from interpretive phenomenological studies are not appropriate for reuse. This research approach is primarily concerned with the lived experience of the participants (Burns, Groves, & Gray, 2015). Interpretive phenomenologists believe their preconceptions should not be removed during data analysis and their personal knowledge is necessary for phenomenological research (Geanellos, 2000). Therefore, the researcher becomes part of the research and may bias the data. Furthermore, the lived experiences of the participants are temporal, linked to the social, cultural, and political contexts of their lives. Again, the nature and characteristics of the context-dependent knowledge may not be apparent when data are reused (Thorne, 1998).
Open data are also problematic in participatory research, as data are not captured solely in transcripts. Participatory research methods involve a collaborative approach in which the researcher works with the participants in planning and conducting the research (Bergold & Thomas, 2012), and the participants become active contributors to the research process. Much of what emerges from this engagement therefore never appears in a transcript. Moreover, some researchers question whether data can be reused without the involvement of the participants, as the data and the participants are intrinsically linked (Childs et al., 2014).
Second, field notes are written by the researcher, for their own use. Some researchers have reported that they have found it very difficult to work from other researchers’ field notes without going to the field itself (Bonds, 1990), as the original researcher is interpreting their own observations and recording the field notes from the environment in which the research took place. Even if field notes were shared, using other researchers’ field notes could lead to misunderstanding or misinterpretation of local phenomena.
For example, ethnographic researchers immerse themselves in the setting for a prolonged time period, observing and conducting the occasional interview (Burns et al., 2015). Field notes form an intrinsic part of ethnographic inquiries and are often shorthand statements of what the researcher thinks and feels about what they observed against the backdrop of the naturalistic setting. In addition, while the researcher captures both verbal and nonverbal communication in their field notes, there are more nuanced understandings between researcher and participant that may not be captured in the field notes (Childs et al., 2014; Thorne, 1998). There must also be an understanding that not all of what the researcher observes, feels, and hears can be written down. Consequently, not all data from the field are available for reinterpretation in open access data, significantly hindering potential reanalysis or reuse.
Third, qualitative researchers are also encouraged to use reflexivity, characterized by ongoing self-critique and self-appraisal (Creswell & Poth, 2018) in which the researcher explores personal feelings and experiences (Burns et al., 2015) through an iterative process. Reflexive practice establishes rigor, or trustworthiness, in qualitative research (Creswell & Poth, 2018). Furthermore, meaning emerges from reflexivity created within the research process (Hammersley, 2010) and attends to a specific contextual knowledge (Irwin, 2013; Silva, 2007). Since meaning emerges from the reflexive practices of the researcher directly involved in the context of data production, reuse of data outside of the original context is problematic (Irwin, 2013; Silva, 2007).
Ethical and Legal Issues
The debate on open data is further polarized when ethical and legal implications surrounding potential harm to participants are considered. Specifically, ethical and legal concerns relating to informed consent as well as protection of privacy and confidentiality support the position that some qualitative data should not be open for sharing and reuse.
First, informed consent is positioned within the principles of beneficence and nonmaleficence, which are protected by laws and professional associations that help govern professional behavior (Canadian Nurses Association, 2008). Canadians’ rights to privacy are protected under the Canadian Constitution and federal and provincial/territorial privacy legislation. Researchers are required to comply with these legal requirements to obtain consent for disclosing information about the participants (Government of Canada, 2016a). Furthermore, institutional research ethics boards (REBs) are in place to ensure that appropriate steps are taken to protect the rights to privacy and confidentiality of participants in a research study. It is also a professional responsibility of researchers, who are guided by the research councils’ policies, to practice ethically (Government of Canada, 2016b). Ultimately, however, it is the researcher who is responsible for protecting the participants (Government of Canada, 2016b). Consequently, there are legal and ethical concerns as to whether the sharing and reuse of data can be done while complying with legislation and with the requirements of REBs and professional associations.
Heaton (2008) addresses the difficulties involved in obtaining informed consent from participants for sharing their narrative data with others and then reusing the data for purposes other than those for which they were originally intended and collected. In general, participants volunteer to share their experiences for one specific research question, and reuse of these data for a different research question will infringe on the conditions under which consent was obtained in the first place unless further consent is obtained for additional analyses (Thorne, 1998). It is impossible to give participants any guarantee about how their data might be reused and recontextualized in the future.
Some researchers have suggested collecting a “blanket consent” from participants in order that their data be kept indefinitely and potentially reused by anyone for any purpose (Childs et al., 2014; Heaton, 2008). However, it is difficult to predict what future use of data can be expected. The new purpose may conflict with the values or wishes of the research participants. In addition, it may not comply with informed consent provisions in REB requirements.
Second, there are ethical issues relating to confidentiality and anonymity of the participants with open data. Boruch, Reiss, Garner, Larntz, and Feels (1991) assert that REBs regard the sharing of data among researchers as a potential threat to the privacy and anonymity of the participant. Although pseudonyms are used to protect the participant’s identity, there is still a possibility they could be identified by the details of their experiences in qualitative research. Inadvertent identification of participants could result in the release of data that could be physically, psychologically, or socially harmful to the participants’ well-being (Burns et al., 2015). For example, when the National Institutes of Health opened deidentified genomic data, researchers in one study were able to reidentify participants through genealogical triangulation, cross-referencing genomic data with demographic data. Clearly, this use of data obtained from a genetic database was a threat to confidentiality and privacy (Gymrek, McGuire, Golan, Halperin, & Erlich, 2013). While this example relates to genomic data, data obtained in qualitative research could potentially identify participants when various characteristics and experiences are included together.
Furthermore, given the small sample sizes typically used in qualitative research, as well as the very nature of qualitative research questions, participants are often asked to disclose information about sensitive issues that could potentially be harmful if their identity were disclosed. It is therefore important that privacy and confidentiality be preserved. Protecting the identity of participants and ensuring their anonymity can become problematic, particularly in small and rural communities where the identity of the participants can be reconstructed (Parry & Mauthner, 2004). Law (2005) further supports this claim by acknowledging that combining census data and geographical data can identify small and unique groups. Consequently, there is a possibility that linking data can identify participants. While key characteristics can be removed to provide anonymity (Corti, 2000; Lin, 2009), this may lead to stripping of information to the point that the qualitative data become insubstantial (Childs et al., 2014; Parry & Mauthner, 2004).
Moving Forward
Researchers, funding agencies, academic institutions, publishers, and other stakeholders need to discuss these methodological and ethical issues and work collaboratively to develop data policies that address these concerns while preserving the quality of qualitative data. Data management and data sharing policies have been implemented in the United States and in several European countries such as the United Kingdom and Finland. Canada does not currently mandate data management and sharing practices, but the Tri-Council federal granting agencies are consulting with researchers on a data management policy, which will be introduced in late 2019.
Recognizing that there are different interests in data management and data sharing practices, each of these stakeholders plays a critical role in the stewardship of data while government agencies that fund research and researchers expect ethical, effective, and efficient use of data. Establishment of online data repositories that consider the uniqueness and context of qualitative research may facilitate increased appropriate data sharing.
Academic institutions must support researchers’ ethical and legal obligations of ensuring the security and protection of participants. Ideally, they would guide the researcher in making decisions regarding the level of access and the level of data processing depending on the unique nature and context of the data (Jones et al., 2018). In addition, academic institutions can provide researchers with the environment and resources for best practices on data stewardship.
The role of the researcher includes developing a data management and data sharing plan in accordance with ethical obligations and agencies’ policies and standards. Since rigorous analysis of qualitative data is more time consuming than analysis of quantitative data, adequate time must be provided for analysis of thick, rich data before data are shared and other researchers reinterpret the data. Researchers using qualitative data repositories need to balance the complex issues relating to demand for access to data, level of processing of the data, and ethical and legal obligations to the participants. The framework of Jones et al. (2018) can help researchers determine the level of access, from open to closed, considering the nature of the data. Obviously, potentially identifiable information needs to be restricted to protect the privacy and confidentiality of the participants.
Publishers must ensure that proper credit is given to the original researchers as well as to authors conducting secondary analysis on open data (Jones et al., 2018). Requirements for data access must consider the uniqueness and context of the data in each qualitative study. Consideration should be given to policies that grant the original research team adequate opportunities for involvement in publication of secondary analyses, perhaps with rights to authorship of future publications if circumstances warrant. Alternatively, opportunities to comment on the new analysis and interpretation, considering the investigators’ understanding of the unique context of the study, would provide some additional accountability.
Consultation and dialogue among these stakeholders are needed. Early steps have been taken by the Tri-Council Agencies in Canada. Policies that have been developed in other countries will be a useful resource. Realizing that there may not be a one-size-fits-all approach, and that in some situations, it may not be appropriate for qualitative data to be shared, there needs to be proper data stewardship which ensures that the fundamental principles of qualitative data are safeguarded. While all stakeholders must work collaboratively to address the issues, researchers must be central to this consultation. With the growing momentum to ensure open access to data, researchers are experiencing a shift of culture. We must recognize and advocate for data stewardship as all researchers are responsible for managing data for reuse while protecting the rights of the participants.
Conclusion
Qualitative research offers valuable opportunities to obtain in-depth information about phenomena of interest. There is a unique engagement between participant and researcher in generating the data, which are often rich and contextual. However, despite the growing movement toward providing open access to data precipitated by requirements of some funding bodies, it is not appropriate to share some qualitative data from transcripts, field notes, or reflective notes. This position is supported by epistemological, methodological, legal, and ethical principles. It is important that, as researchers, we make informed decisions about which data should be open for sharing and consider the implications of these decisions for future research. Further, researchers need to engage with stakeholders in discussions of the risks of open access to qualitative research data and in the development of policy in this important area.
Footnotes
Authors’ Note
Amelia Chauvette is also affiliated with University of Alberta, Edmonton, Alberta, Canada. Kara Schick-Makaroff and Anita E. Molzahn are also affiliated with Faculty of Nursing, Edmonton Clinic Health Academy, University of Alberta, Edmonton, Alberta, Canada.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
