Abstract
Big qualitative data (big qual), or research involving large qualitative data sets, has introduced many newly evolving conventions that have begun to change the fundamental nature of some qualitative research. In this methodological essay, we first distinguish big data from big qual. We define big qual as data sets containing either primary or secondary qualitative data from at least 100 participants, analyzed by teams of researchers, often funded by a government agency or private foundation, and conducted either as a stand-alone project or in conjunction with a large quantitative study. We then present a broad debate about the extent to which big qual may be transforming some forms of qualitative inquiry. We present three questions that examine the extent to which large qualitative data sets offer both constraints and opportunities for innovation related to funded research, sampling strategies, team-based analysis, and computer-assisted qualitative data analysis software (CAQDAS). The debate is framed by four related trends to which we attribute the rise of big qual: the rise of big quantitative data, the growing legitimacy of qualitative and mixed methods work in the research community, technological advances in CAQDAS, and the willingness of government and private foundations to fund large qualitative projects.
The term “big qualitative data (big qual),” or research involving large qualitative data sets, was likely borrowed from the term “big data.” Big data typically refers to the large quantitative data sets increasingly used by academic researchers, government and nonprofit agencies, the private sector, and nonacademic political researchers. Big quantitative data has been the subject of a vigorous public debate related to individual privacy rights and the appropriate analysis and interpretation of such data (e.g., Brooks, 2013; Ohm, 2012; Shah, Horne, & Capellá, 2012). The original conception of big data tended to assume that data constituted numbers, not words or images (Diebold, 2003, 2012). As big quantitative data emerged, however, newly evolving conventions for big qual also developed that began to change the fundamental nature of some qualitative research. The purpose of this methodological essay is to distinguish big qual from big data and then to present a broad debate about the extent to which big qual may be transforming some forms of qualitative inquiry.
Before the term big qual emerged, one of the earliest examples of large-scale mixed methods research projects was the Framingham Heart Study, a medical study begun in 1948 with 5,209 participants that has continued to the present day with several new cohorts of participants including many of the children and grandchildren of the original participants (Levy & Brink, 2005). Over 1,000 medical papers have been published from the Framingham data (Mahmood, Levy, Vasan, & Wang, 2014). While the Framingham study was primarily quantitative, interviews were also conducted to measure psychosocial health factors such as anxiety, depression, social support, and hostility (K. Davidson, MacGregor, Stuhr, Dixon, & MacLean, 2000). Many large-scale mixed methods medical studies followed the Framingham Heart Study (Plano Clark, 2010) including the present-day Precision Medicine Initiative (PMI), a mixed methods medical study conducted by the National Institutes of Health (NIH), whose goal is to enroll 1 million participants to study the role of genetics and lifestyle in health outcomes (Collins & Varmus, 2015). Though the PMI is a mixed methods study, it is as yet unclear the degree to which the research design will incorporate both primary and secondary qualitative data as the project evolves.
While a comprehensive big qual literature review is beyond the scope of this methodological essay, our initial review of large-scale qualitative and mixed methods studies conducted in the last 5 years uncovered research in a broad range of disciplines including agriculture (e.g., Charatsari & Papadaki-Klavdianou, 2017), business (e.g., St-Hilaire, Gilbert, & Lefebvre, 2018), environmental protection (e.g., Lynn, 2017), health and medicine (e.g., Hurst et al., 2016; Jenkins, Slemon, Haines-Saah, & Oliffe, 2018; Mayberry, 2016), public safety (e.g., Kerrison, Cobbina, & Bender, 2018), sociology and anthropology (e.g., Knight, Cottrell, Pickering, Bohren, & Bright, 2017; Manning & Greenwood, 2018; Reed, Strzyzykowski, Chiaramonte, & Miller, 2018), and education (e.g., Brower et al., 2017; Rutledge, Cohen-Vogel, & Osborne-Lampkin, 2012; Calma, 2013; Eta, Kallo, & Rinne, 2018; LaPointe-McEwan, DeLuca, & Klinger, 2017). Our initial review also showed that fewer than half of big qual studies involved primary data collection in the field. Many secondary data big qual studies in our review involved data downloaded from social media sources such as Facebook or Twitter (e.g., Greene, Choudhry, Kilabuk, & Shrank, 2011), qualitative data drawn from open-ended comment box questions on quantitative surveys (e.g., Elsesser & Lever, 2011), consumer research conducted by the private sector or political research conducted by nonacademic researchers (e.g., Clow & James, 2010), and content analysis conducted through computerized text-mining techniques (e.g., Guest & MacQueen, 2008).
Big qual can also be aligned with “rapid qualitative inquiry” (Beebe, 2014) and “multi-sited ethnography” (Coleman & von Hellermann, 2011), though we include in our definition both “slow” and “rapid” methods and qualitative traditions beyond ethnography including, for instance, case study, narrative inquiry, and grounded theory. Our review revealed that the most common research tradition for big qual was the case study (e.g., Brower et al., 2017; Calma, 2013).
Based on our review, we define big qual as data sets containing either secondary qualitative data or primary data with at least 100 participants, analyzed by teams of researchers, often funded by a government agency or private foundation, and conducted either as a stand-alone project or in conjunction with a large quantitative study.
Saldaña (2013) observed that “a metacognition of method, even in an emergent, intuitive, inductive-oriented, and socially conscious enterprise such as qualitative inquiry, is vitally important” (p. 40). This methodological essay is intended to ask metacognitive questions about big qual by making issues explicit that have thus far remained largely implicit based on our review of the qualitative research methods literature. Not unlike case study research, we have found big qual designs to be very flexible in terms of how they can be combined with other qualitative traditions. Nonetheless, currently big qual is a collection of methods that lacks the rich philosophical history and broader application we find in qualitative traditions such as phenomenology or grounded theory. Therefore, we hope to initiate a debate about whether big qual might someday be grounded in a deeper philosophy. Before presenting our discussion, we first must acknowledge how our work with large qualitative data sets contextualizes the debate presented here.
Research Contexts
The research context for this methodological essay was an ongoing 5-year mixed methods research project of a major policy shift in the delivery of developmental education (DE or remediation) in the 28 state colleges in Florida (Brower, Bertrand Jones, Hu, & Park-Gaghan, in press; Brower et al., 2017; Mokher, Spencer, Park, & Hu, 2019; Nix, Bertrand Jones, Brower, & Hu, in press; Park-Gaghan et al., in press; Park, Woods, Hu, Bertrand Jones, & Tandberg, 2018; Woods, Hu, Bertrand Jones, & Tandberg, 2018, 2019). The qualitative research methods for the DE project were informed by the work of researchers from two previous K-12 projects. One project was part of a multiyear research and reform effort focused on identifying the combination of essential components and programs, practices, processes, and policies that make some high schools in large urban districts particularly effective with students from traditionally low-performing subgroups (Rutledge, Cohen-Vogel, & Osborne-Lampkin, 2012). The second explored the implementation of district programs used to train and certify school leaders in Florida’s 67 districts (Rutledge, Cohen-Vogel, Osborne-Lampkin, & Roberts, 2015). Our perspective on big qual is grounded in our work on these projects and supported by our individual and collective interrogation of our data collection and analysis processes. We present a summary of our process for the DE project, specifically, to provide context for our debate.
As a mixed methods research project, the qualitative research team on the DE project collaborated with the quantitative research team through frequent meetings and informal discussions. Research findings from the quantitative team informed questions that were asked in successive iterations of the qualitative interview protocols and qualitative findings informed questions asked on annual surveys administered by the quantitative research team.
Over the course of 5 years, the overarching qualitative research question shifted from policy implementation processes to promising institutional practices in community colleges to organizational change and transformation. Our qualitative sampling strategy was a maximum variation sample at both the institution level and the individual level, which involved “purposely picking a wide range of cases to get variation on dimensions of interest” (Patton, 2015, p. 267). The 21 institutions in our 5-year sample represented the majority of institutions in the Florida College System. Institutions were located in every region of the state and differed by enrollment size, location (i.e., rural, suburban, and urban), and average performance on student outcome measures from the quantitative data set. At the individual level, we sampled different types of campus personnel including presidents, administrators, faculty, advisors, and support staff as well as students reflecting the diversity of community college student populations (i.e., students of color, veterans, English-language learners, immigrant students, undocumented students, parents and working adults, students with disabilities, first-generation college students, economically disadvantaged students, LGBTQ students, homeless students, and formerly incarcerated students). We also conducted interviews with state legislators and external policy stakeholders.
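As a hypothetical illustration of this strategy, and not the project’s actual selection procedure, a maximum variation sample can be sketched as choosing the subset of cases that maximizes pairwise differences on the dimensions of interest. The institutions and attribute values below are invented:

```python
from itertools import combinations

# Invented institution attributes mirroring the dimensions named in the
# text: enrollment size, location, and average student outcomes.
institutions = [
    {"name": "A", "size": "large", "locale": "urban",    "outcome": "high"},
    {"name": "B", "size": "small", "locale": "rural",    "outcome": "low"},
    {"name": "C", "size": "large", "locale": "urban",    "outcome": "high"},
    {"name": "D", "size": "mid",   "locale": "suburban", "outcome": "mid"},
]

def dissimilarity(a, b, dims=("size", "locale", "outcome")):
    """Count the dimensions on which two cases differ."""
    return sum(a[d] != b[d] for d in dims)

def max_variation_sample(cases, k):
    """Return the k-case subset with the greatest total pairwise dissimilarity."""
    best = max(
        combinations(cases, k),
        key=lambda s: sum(dissimilarity(a, b) for a, b in combinations(s, 2)),
    )
    return [c["name"] for c in best]
```

With the invented data above, `max_variation_sample(institutions, 3)` skips the near-duplicate pair A and C in favor of a more varied trio, which is the intuition behind “purposely picking a wide range of cases.”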
Our data for the DE project consisted of field notes, verbatim transcripts from focus groups and individual interviews, and institutional documents collected on site visits to 21 state colleges in Florida (with some repeat visits) over a 5-year period. To date, we have conducted 42 site visits and 166 focus groups comprising over 1,100 total participants.
Given the volume of data, data analysis was conducted by a team of five to six researchers. The research team met weekly to share findings and discuss the analysis process. Our data analysis process employed pattern coding to identify central concepts and properties in the data (Corbin & Strauss, 2008; Miles, Huberman, & Saldaña, 2014). To begin the process, one researcher read the verbatim transcripts to ascertain where the subject of the participants’ responses had changed and a new paragraph should begin. This process was, in effect, a form of precoding. We then read through the field notes, institutional documents, and focus group data to synopsize the chronology of institutional processes.
To establish inter-coder reliability, a number of specific analytic processes were necessary. Each of the codes in our coding frameworks needed consistent definitions in a codebook that researchers could refer to frequently. Our codebook de-emphasized codes reflecting highly theoretical or abstract concepts due to the likelihood that codes would be interpreted differently by different researchers. After we coded a subset of the data, we ran the Cohen’s κ coefficient function in NVivo, a computer-assisted qualitative data analysis software (CAQDAS) program. In the first round of reliability testing, we evaluated two pairs of researchers, comparing the coding of an individual researcher with the coding of another individual researcher. In the second round of reliability testing, we followed the same process by comparing two sets of researchers.
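The Cohen’s κ statistic that CAQDAS packages such as NVivo report can be computed by hand from two coders’ parallel code assignments: observed agreement corrected for the agreement expected by chance. A minimal sketch, with invented codes and segments:

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' labels on the same text segments.

    Observed agreement is the share of segments coded identically; chance
    agreement is the sum over labels of the product of each coder's
    marginal label proportions.
    """
    assert len(coder1) == len(coder2) and coder1
    n = len(coder1)
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    c1, c2 = Counter(coder1), Counter(coder2)
    expected = sum(c1[label] * c2.get(label, 0) for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: two coders label four transcript segments.
coder1 = ["students", "advising", "advising", "students"]
coder2 = ["students", "advising", "students", "students"]
kappa = cohens_kappa(coder1, coder2)  # 0.75 observed, 0.50 by chance
```

Here the coders agree on three of four segments (.75), but with only two labels, half that agreement is expected by chance, so κ = .50, which is substantially less flattering than raw percent agreement.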
We then initiated pattern coding. In this process, we developed an a priori coding framework with codes at three levels (i.e., parent, child, and grandchild nodes) based on our initial reading of the data. During this process, we identified additional emergent themes not captured under existing codes, resulting in additional codes.
Our data collection and analysis efforts were iterative. During the subsequent years of the developmental education project, we refined our field note observation forms and focus group protocols based on themes that emerged from the first round of data collection. We used our coding frameworks from the first year of data analysis as a starting point for the codebook in the second year. To do this, we subdivided some of the most frequently used codes into child and grandchild codes and collapsed some infrequently used codes into the parent codes. We also added codes for emergent themes that had not been coded in the first round of coding. For instance, an a priori parent code included “students.” This code was subdivided into “students, general” and “student populations.” However, infrequently used grandchild codes “students, off-campus work” and “students, on-campus work” were collapsed into “student work.”
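The codebook revision described above, keeping frequently used child codes and folding infrequently used ones back into their parent, can be sketched as a simple transformation over a code hierarchy. The thresholds and structure here are illustrative, not the project’s actual values:

```python
def restructure(codebook, counts, collapse_at=5):
    """Revise a codebook between coding rounds.

    codebook maps each parent code to its list of child codes; counts maps
    each code to its usage frequency. Children used fewer than collapse_at
    times are folded back into the parent; the rest are kept.
    The collapse_at threshold is an invented illustration.
    """
    revised = {}
    for parent, children in codebook.items():
        kept, absorbed = [], 0
        for child in children:
            if counts.get(child, 0) < collapse_at:
                absorbed += counts.get(child, 0)  # fold usage into parent
            else:
                kept.append(child)
        revised[parent] = {
            "children": kept,
            "count": counts.get(parent, 0) + absorbed,
        }
    return revised

# Invented frequencies loosely modeled on the "students" example in the text.
codebook = {"students": ["students, general", "student populations",
                         "students, off-campus work", "students, on-campus work"]}
counts = {"students": 10, "students, general": 40, "student populations": 60,
          "students, off-campus work": 3, "students, on-campus work": 2}
revised = restructure(codebook, counts)
```

This sketch folds rare children into the parent for simplicity; in practice (as in the text’s “student work” example) a team might instead merge several rare siblings into a new intermediate code.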
This process was repeated each year to encompass emergent themes and reflect the overall project’s shifting research question. Our discussions of the process as outlined above provided the foundation for the questions we identify below.
Debating Big Qual
The debate presented here is framed by four related trends to which we attribute the rise of big qual: the rise of big quantitative data, the growing legitimacy of qualitative and mixed methods work in the research community (Creswell & Clark, 2011), technological advances in CAQDAS (Bazeley & Jackson, 2013; J. Davidson, Paulus, & Jackson, 2016), and the willingness of government and private foundations to fund large qualitative projects (Cheek, 2008; Plano Clark, 2010). For each question below, we present the opportunities for innovation as well as the constraints and challenges in research designs for large qualitative data sets that have emerged from our work.
What Opportunities and Constraints Are Presented by Funded Research Involving Big Qual?
Because big qual can be costly to conduct, many, though not all, research projects involving large qualitative data sets have been funded by a government agency or private foundation (Cheek, 2008; Plano Clark, 2010). This can constrain the creation of iterative research designs. Miles, Huberman, and Saldaña (2014) have contrasted the linear and sequential nature of quantitative research methods with the iterative and cyclical nature of qualitative inquiry. Indeed, some qualitative traditions such as grounded theory (Charmaz, 2014; Corbin & Strauss, 2008; Denzin & Lincoln, 2017) employ theoretical sampling methods, which study emergent constructs and social phenomena through alternating cycles of data collection and data analysis.
Funding, induction, and iteration
Patton (2015) has commented on the uncertainty inherent in funded qualitative research due to balancing the funders’ need for an intentional, structured, and systematic research plan with the flexibility necessary to explore emergent themes that invariably arise during the research process: “How will they [funders] know what will result from the inquiry if the design is only partially specified? The answer is: They won’t know with any certainty. All they can do is look at the results of similar qualitative inquiries, inspect the reasonableness of the overall strategies in the proposed design, and consider the capacity of the researcher to fruitfully undertake the proposed study” (p. 44).
The qualitative story line
Another constraint in funded big qual projects is related to capacity issues. Because funding enables researchers to collect more data, funded projects can generate such a large quantity of data that data reduction techniques become essential. With this volume of data, it can be challenging to separate the “noise” from the main “story line” in the data, making it difficult to answer the question: What are the data actually saying? In the DE project, this sometimes required us to employ counting techniques, drawing on the rationales and methods of counting identified by other qualitative researchers (Hannah & Lautsch, 2011).
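One simple counting technique of the sort referenced above is tallying code frequencies, overall and by participant group, to surface the dominant story line. The coded segments below are invented for illustration:

```python
from collections import Counter

# Hypothetical (participant_role, code) pairs produced during coding.
segments = [
    ("advisor", "advising redesign"), ("advisor", "advising redesign"),
    ("faculty", "course pacing"),     ("advisor", "course pacing"),
    ("student", "advising redesign"), ("faculty", "advising redesign"),
]

# Overall code frequencies separate dominant themes from the "noise".
overall = Counter(code for _, code in segments)
# Frequencies by participant group show who is voicing each theme.
by_role = Counter((role, code) for role, code in segments)

top_code, top_n = overall.most_common(1)[0]
```

Counts like these do not replace interpretation, but they offer a defensible rationale for foregrounding one theme over another when reporting from a very large data set.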
In addition, big qual can present challenges for reporting theories and findings in a cohesive narrative when drawn from so much data. Collectively, we have learned to report the theory and research findings generated from big qual in a variety of ways, including single case studies with sections linking individuals’ lived experiences within broad institution-level or system-level patterns (e.g., Brower, Bertrand Jones, & Hu, 2018; Nix et al., in press; Rutledge et al., 2015), multiple case studies with vignettes illustrating findings within cases coupled with figures and/or tables summarizing patterns across all cases (e.g., Brower et al., in press; Brower, Mokher, Bertrand Jones, Cox, & Hu, 2019; Cohen-Vogel, Rutledge, & Osborne-Lampkin, 2011; Rutledge et al., 2015), and individual examples nested within composite institutions (e.g., Arnault, 2002; Brower et al., 2017; Conant, 2014). Composite institutions (or individuals) involve presenting data in vignettes that contain data from more than one institution or individual. Composite institutions or individuals can be used as a means of consolidating and summarizing large quantities of data as well as highlighting similar patterns across several examples from the data, such as in a multiple case study when several cases share important characteristics.
Mixed methods and big qual
A factor that can offer either opportunities or constraints in big qual is whether the project is mixed methods. In some instances, the quantitative and qualitative research designs are truly integrated and, in other instances, the qualitative research design is merely an “add-on” to a large quantitative project. Truly integrated designs may increase between-methods or mixed methods triangulation (Burke Johnson, Onwuegbuzie, & Turner, 2007; Denzin, 1978), while quantitative and qualitative research designs can run parallel with little integration when qualitative research is seen merely as an add-on to a quantitative project.
In addition, while it might be ideal for big qual research teams to be composed entirely of researchers who subscribe to a pragmatist research paradigm, it may be unrealistic in real-world research settings for all members of a team to subscribe to this perspective. Instead, it may be more likely that, taken as a whole, the mixed methods team will subscribe to “dialectical pluralism,” which Creamer (2018) defines as “a paradigm that reflects what some consider to be the overarching logic of mixed methods: the deliberate engagement with different points of view and ways of achieving knowledge” (p. 245). Nonetheless, the dialectical pluralism perspective with its mixture of positivist, realist, social constructionist, and interpretivist research epistemologies can lead to either significant misunderstandings on mixed methods projects or research designs that are more collaborative, triangulated, and ultimately more rigorous.
What Opportunities and Constraints Are Presented by Sampling Strategies Used in Big Qual Research Designs?
We contrast the sampling strategies in quantitative and qualitative research as the difference between probability sampling that seeks to establish generalizability by generalizing from a sample to a population and purposeful sampling that seeks to establish transferability by selecting information-rich cases for an in-depth understanding of a phenomenon (Lincoln & Guba, 1985; Patton, 2015; Peshkin, 2001). From both a methodological and practical standpoint, important reasons remain for conducting deep single-site analyses within narrowly bounded, small systems.
Extending opportunities for generalization
Nonetheless, we propose that large qualitative data sets may be gradually moving qualitative data away from purposeful sampling for transferability in the direction of sampling for generalization (Maxwell & Chmiel, 2014; Polit & Beck, 2010). Big qual can employ a variety of qualitative sampling strategies, including single significant case sampling; comparison-focused sampling; group characteristics sampling; theory-focused and concept sampling; instrumental-use multiple case sampling; sequential and emergence-driven sampling; analytically focused sampling; and mixed, stratified, or nested sampling (Patton, 2015). Though not all sampling strategies in big qual are alike, the authors’ research projects often employed maximum variation sampling. Though distinct from quantitative random sampling, maximum variation sampling does share a concern with the variability and representativeness of the sample.
We argue that sampling for big qual affords significant opportunities for innovation. Many qualitative researchers have pointed to the capacity of qualitative research for generating theory (e.g., Charmaz, 2014; Corbin & Strauss, 2008; Denzin & Lincoln, 2017; Patton, 2015). We agree. In our experience, conducting big qual research suggests that large qualitative data sets and their sampling strategies can contribute to theory-building in unique ways. Qualitative research has traditionally had the luxury of looking at small bounded systems (e.g., individuals, small groups, organizations). We often delimit qualitative research questions by stating that phenomena are beyond the scope of our study. However, large qualitative data sets can expand that scope, supporting theory-building across many cases, sites, and levels of analysis.
Adding depth and breadth to the analysis
Sampling in large qualitative data sets can also add both depth and breadth to the analysis, particularly when data are collected longitudinally.
In the DE project, for instance, the central finding of a conference paper based on a single year of cross-sectional big qual data shifted when it was further developed into a journal article based on multiple years of data. Specifically, the cross-sectional conference paper found that, from the perspective of administrators and staff, the open-access mission of community colleges in Florida was compromised by sweeping state-level legislation focused on efficiency (Nix et al., 2016). However, the eventual journal article based on longitudinal data found that some campus personnel became more accepting of reform efforts over the course of four years when they saw that the impact of Senate Bill 1720 on equality was not as negative as they initially feared (Nix et al., in press).
In addition, large qualitative data sets can link multiple units of analysis, connecting individual experiences to broader institution-level and system-level patterns.
For instance, a manuscript about students with stigmatized and minoritized identities linked student experiences to efforts of staff to assist these students in managing their stigma to persist and succeed in community college. This article also linked student and staff interactions at the individual unit of analysis to a broader institutional ethic of care as well as the open-access community college mission at the institutional unit of analysis (Brower et al., 2018).
In addition, big qual studies that are part of mixed methods projects can improve theory by initially theorizing about large systems or hypothesizing about emerging constructs and associations among constructs, which quantitative research can then verify, extend, or disprove.
An example of theory-building from the DE reform project was a qualitatively derived typology of four broad policy implementation patterns.
Contextual details in big qual
Despite the strengths of big qual research designs, our experience with large qualitative data sets suggests that some of the contextual depth that characterizes small-scale qualitative inquiry may be lost.
What Opportunities and Constraints Are Presented by Team-Based Data Analysis Processes Facilitated by Technology in Big Qual?
Inter-coder reliability is a way to ensure coding consistency and agreement among a team of researchers (Bazeley & Jackson, 2013; J. Davidson et al., 2016). We argue that one drawback of establishing inter-coder reliability in a large qualitative project is that coding for abstract concepts or sensitizing concepts (Patton, 2015), which can be essential in generating theory, may be de-emphasized. For instance, it may never be possible for a team to reach agreement on the definition of abstract yet essential terms such as “metacognition” or “policy entrepreneur.” We recognize that while new technological features of CAQDAS greatly facilitate coding (Bazeley & Jackson, 2013; J. Davidson et al., 2016), they can also constrain coding by making it normative for teams of researchers to use features such as the Cohen’s κ coefficient function to establish reliability among team members. Burla et al. (2008) and Everitt (1996) have reported that κ ranges of .41–.60 represent moderate intercoder reliability, values greater than .60 indicate satisfactory reliability, and values greater than .80 represent nearly perfect reliability. However, κ coefficients are not the only quality criterion, and Hai-Jew (2017) has pointed out that κ scores tend to decrease with a large number of codes and a large number of researchers. Therefore, we suggest that while Cohen’s κ coefficient ranges above .60 may be ideal for big qual projects, the real value in calculating κ coefficients lies in the discussions that necessarily take place among researchers related to their differing understandings of codes, the definitions of codes, and how these definitions apply to the data.
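The κ ranges reported by Burla et al. (2008) and Everitt (1996) can be expressed as a small lookup that a team might use to label its reliability results. This sketch simply encodes the thresholds cited above:

```python
def interpret_kappa(kappa):
    """Label a Cohen's kappa using the ranges reported by Burla et al.
    (2008) and Everitt (1996): .41-.60 moderate, >.60 satisfactory,
    >.80 nearly perfect."""
    if kappa > 0.80:
        return "nearly perfect"
    if kappa > 0.60:
        return "satisfactory"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"
```

As the text notes, such labels should prompt discussion rather than end it: a κ of .55 with forty codes and six coders may reflect healthier team practice than a κ of .85 with three codes and two coders.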
Nevertheless, precise definitions for codes, which are necessary in establishing reliability and consistency in coding among teams of researchers, may contribute to greater clarity with respect to the concepts present in the data. Thus, while the de-emphasis on abstraction can detract from theory-building processes, the greater precision, clarity, and creativity of group processes may be beneficial to identifying patterns in qualitative data and ultimately to theory-building.
Induction, teams, and technology
Some aspects of team-based data analysis in big qual can make inductive research designs challenging. As Patton (1990) explained, inductive analysis “begins with specific observations and builds toward general patterns.…The strategy of inductive designs is to allow the important analysis dimensions to emerge from patterns found in the cases under study without presupposing in advance what the important dimensions will be” (p. 56). Corbin and Strauss (2008) similarly described pattern identification as “more than just a paraphrasing.…It is more than just noting concepts in the margins of the field notes or making a list of codes as in a computer program. Identifying patterns involves interacting with the data using techniques such as asking questions about the data, making comparisons between data, and then developing those concepts in terms of their properties and dimensions” (p. 66).
We argue that certain systematic aspects of data analysis with a research team may move the pattern identification process of qualitative research away from emergent data analysis. Technical difficulties can arise in the software file when a coding framework evolves significantly over time with a team of researchers. This relative lack of flexibility in the coding framework can make it more daunting to identify high-level patterns in the data. Due to these difficulties, an individual researcher working alone may have a greater ability to employ emergent coding by changing the coding structure as the project progresses (Saldaña, 2013).
The underlying “reality” in the data
Perhaps more importantly, the necessity of assigning codes to text with precise definitions tends to assume there is one underlying reality in the data instead of many possible realities. As Stake (1995) observed, “most qualitative researchers not only believe that there are multiple perspectives or views that need to be represented, but that there is no way to establish, beyond contention, the best view” (p. 108).
Another challenge that arises with team-based data analysis is related to interpreting data from the emic, or insider, perspective. Because interpretations must be negotiated across a team, the analysis may come to privilege the etic perspective of the researchers over the emic perspective of participants.
In addition, depending on the roles that researchers have played on a big qual project in terms of the amount of data they have collected and analyzed and their number of years working on a longitudinal project, some researchers on the team may have an advantaged position. Specifically, some of the researchers will have a better “bird’s eye view” of the totality of the data, including the themes that cut across years of the project, institutions, and participant groups and how those themes have evolved over time.
Interpreting data through consensus
Despite these constraints, we suggest that group coding may also result in a more collaborative, inclusive, and creative process than individual coding. Weston et al. (2001), for instance, suggest that “a research team builds codes and coding builds a team through the creation of shared interpretation and understanding of the phenomenon being studied” (p. 382). We have found that arriving at a “best interpretation” of data with a team can initially be a time-consuming process. In our experience, however, data analysis with multiple researchers eventually results in a consensual interpretation that is more comprehensive and nuanced than the interpretation of a single researcher.
Implications of the Debate
Recent technological advances in CAQDAS (Bazeley & Jackson, 2013; J. Davidson et al., 2016) and the increasing willingness of government and private foundations to fund large qualitative projects (Cheek, 2008; Plano Clark, 2010) make this discussion increasingly essential across academic disciplines. Already, we have seen new research methods for big qual diffuse to diverse fields such as health sciences and medicine, business, education, environmental science, sociology, social work, anthropology, agriculture, and information science (e.g., Armstrong, Riemenschneider, Nelms, & Reid, 2012; Guest & MacQueen, 2008).
We present this debate about large qualitative data sets as a meta-cognitive process intended to spark a broader discussion about whether big qual methods could someday become a qualitative tradition grounded in philosophical underpinnings. In sparking this discussion, we argue that the benefits of big qual include a more inclusive, collaborative analytic process and increased transferability, breadth, depth, and theory-building potential. The challenges of big qual include a de-emphasis on abstract concepts in the coding framework, a possible decrease in the contextual depth in the data set, and potentially an emphasis on the etic researcher perspective.
These benefits and challenges require us to educate students about the traditional aims of qualitative inquiry and to initiate discussions with other researchers about how these aims may be evolving with the introduction of new methods and technological advances in data analysis software. We contend that in order to do this, we must strive to make our methods and their underlying assumptions as explicit as possible. Moreover, it requires us to think deeply about the “whys” and “hows” as we continue to engage in discussions around big qualitative inquiry.
Future Research Directions
Osborne-Lampkin, Cohen-Vogel, Feng, and Wilson (2018) comment on the use of theory in guiding scientific inquiry: “Theory provides the ideas researchers use about how phenomena operate in the world. Empirical studies then test those theories, using findings from them to modify or refine the theory.…Frameworks help researchers set forth predictions about their study outcomes, shape the study design, and, once data are collected, are used as a ‘mirror to check whether the findings agree with the framework or whether there are discrepancies’” (p. 189).
Despite the recent focus on understanding the ways in which practitioners and policy makers define, acquire, interpret, and ultimately use research in education, how practitioners and policy makers make sense of and use research remains an area ripe for scientific study (Tseng, 2012). We contend that collaboration between practitioners and researchers in developing frameworks and processes for research designs that use large qualitative data sets can support the development of innovative methodological designs and approaches, and can help ensure that the “supply-side” efforts of researchers address the “demand-side” needs of the “end user” (Tseng, 2012). Moreover, illuminating frameworks and designs and developing tool kits that outline methodological procedures can increase both the rigor of the research being conducted and its use by practitioners, policy makers, and researchers alike.
Our questions about the considerations and opportunities for big qual data emerged through our work as researchers engaged in large-scale qualitative and mixed methods studies. Yet less is known about the enabling structures and supports that facilitate this work. The organizational structures that underpin it, such as methodological teams, frameworks, and procedures for carrying out this work, deserve further examination.
In moving forward, we must ask ourselves: Are there opportunities not only to adjust theory and methodological approaches to better fit big qual designs but also to build upon theoretical knowledge? Additional research is needed on the intersections and divergences of big qual in purely qualitative studies and in varying types of mixed methods studies (e.g., qual-quant, quant-qual). How do we apply that knowledge to develop research questions, study designs, and analytical approaches that enhance our ability to conceptualize and carry out research that better informs policy and practice, whether for multiple purposes or for narrowly tailored research questions? Future studies will also need to determine the academic disciplines most likely to employ big qual methods and to explore the emerging research conventions of each by studying the organizational structures, theoretical frameworks, and most common procedures used by researchers.
Conclusion
Engaging in large-scale qualitative or mixed methods data collection and analysis is nothing new. We propose that many newly introduced and still evolving conventions for large qualitative data sets may be changing the fundamental nature of some qualitative research. Thus, we advocate for a frank discussion of research methods to preserve the traditional nature of constructivist qualitative inquiry while remaining open to opportunities for innovation. We acknowledge that our questions may have varying saliency depending on where researchers situate themselves on the continuum of perspectives on qualitative research and hope that these questions spark further debate regarding the nature of qualitative inquiry.
Acknowledgments
We would like to thank Dr. Lora Cohen-Vogel, University of North Carolina at Chapel Hill, and Dr. Stacey Rutledge, Florida State University, for their work conducted with the National Center on Scaling up Effective Schools. We also thank Dr. Carolyn Herrington, Florida State University, and Dr. Jessica Sidler Folsom, Iowa State University, for their work conducted with Dr. La’Tara Osborne-Lampkin at Florida State University. We have built upon the methodological approaches employed in studies and projects conducted by these researchers with Dr. La’Tara Osborne-Lampkin. Finally, we thank Dr. Ralph S. Brower, Florida State University, for his thoughtful feedback on this article.
Authors’ Note
The opinions expressed are those of the authors and do not represent the views of the Institute of Education Sciences, the U.S. Department of Education, or the Bill & Melinda Gates Foundation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this article was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A160166 to Florida State University, and in part by a grant from the Bill & Melinda Gates Foundation.
