Abstract
This article outlines a five-phase process of qualitative analysis that draws on deductive (codes developed a priori) and inductive (codes developed in the course of the analysis) coding strategies, as well as guided memoing and analytic questioning, to support trustworthy qualitative studies. The five-phase process presented here can be used as a whole or in part to support researchers in planning, articulating, and executing systematic and transparent qualitative data analysis; developing an audit trail to ensure study dependability and trustworthiness; and/or fleshing out aspects of analysis processes associated with specific methodologies.
Keywords
Introduction
Rigorous, trustworthy qualitative analysis is systematic, organized, and iterative in nature (Ravitch and Carl, 2019). Given the amount of detail and data that qualitative studies can produce, it can also be time-consuming and overwhelming. As such, qualitative researchers should choose analysis strategies that allow for a balance among data organization, study purpose, theoretical and conceptual concerns, and the inductive nature of qualitative work. As Miles and Huberman (1994) note, The challenge is to be explicitly mindful of the purposes of your study and of the conceptual lenses you are training on it – while allowing yourself to be open and reeducated by things you didn’t know about or expect to find. (p. 56)
Qualitative researchers are bound by the requirements of their chosen methodologies as well as by the need to conduct research that is trustworthy. Generally, to ensure study trustworthiness, research should meet the following criteria: credibility, or confidence in the findings; dependability, or the consistency of the findings; confirmability, or the extent to which the findings are free from researchers’ biases; and transferability, or how well the research may be applicable to similar contexts (Tierney & Clemens, 2011). Critical to conducting a rigorous, trustworthy qualitative study is developing and articulating a clear, detailed process of data analysis. Unclear, minimally planned, or poorly articulated data management and data analysis processes can lead to trustworthiness issues in a qualitative study (Nowell et al., 2017). As Nowell et al. (2017) note, a trustworthy qualitative study should “demonstrate that data analysis has been conducted in a precise, consistent, and exhaustive manner through recording, systematizing, and disclosing the methods of analysis with enough detail to enable the reader to determine whether the process is credible” (p. 1). Transparency in data analysis methods and in how the conclusions came about – process and product – can also support dependability, confirmability, and transferability of the findings.
Triangulating data with, and situating the findings in, the existing literature and theory can also support trustworthiness in qualitative studies – specifically, by increasing the credibility, confirmability, and transferability of the findings (Tierney & Clemens, 2011). To support credibility and transferability, generally, researchers must connect the findings of the analysis to the existing literature and apply any theoretical or conceptual frameworks in a way that expands the impact of the analysis and the findings (Bingham et al., In press). Systematic, detailed analysis processes that engage both deductive and inductive analytic strategies can support analytic transparency and can help researchers apply concepts from the literature and theory, which can in turn support the trustworthiness and applicability of the study (Bingham and Witkowsky, 2022; Bingham et al., In press).
This article expands on ideas presented by Bingham and Witkowsky (2022) to explicate a process of qualitative analysis, rooted in both deductive and inductive strategies. Specifically, in this article, I outline a five-phase process of qualitative analysis that draws on deductive (codes developed a priori) and inductive (codes developed in the course of the analysis) coding strategies, as well as guided memoing, to help manage data and to support trustworthy, replicable, and actionable qualitative studies of policies, interventions, and reforms. The data analysis process I present here can help researchers to engage and articulate clear analysis processes, situate methodologically specific strategies in the analysis process, apply theoretical and conceptual frameworks as appropriate, and provide clarity around what it looks like to move from code to theme to finding to explanation of finding in a particular study. This five-phase analysis process, or aspects of this process, can be applied as part of the analytic strategy across several qualitative methodological traditions and can also be engaged as a method of qualitative data analysis in mixed methods research, and in thematic analysis, discourse analysis, policy analysis, and content analysis.
Deductive and Inductive Practices in Qualitative Analysis
Deductive or a priori coding involves creating codes prior to data analysis and applying those codes to the data (Bingham and Witkowsky, 2022; Crabtree and Miller, 1999). Deductive analysis can be used to organize data or sort data into predetermined categories created from literature or theory. Deductive codes can be developed as purely organizational categories (e.g., the type of data or when it was collected), categories based on the research purpose or questions (e.g., the main topics of the research or key aspects of the research questions), or as categories generated from the literature and/or from theory (e.g., named concepts from the theoretical framework) (Bingham and Witkowsky, 2022).
Deductive codes can also be developed from propositions. Propositions are similar to the concept of hypotheses in quantitative work; they take the form of statements created by the researcher to describe what they think will emerge from the data. They are deductive in nature because they are often developed from existing theory and/or research. However, they can also be considered inductive, as propositions can sometimes be formed in the course of the analysis and tested during further analysis. Propositions can help focus the inquiry, shape protocols, identify data relevant to the research questions, and guide data collection and analysis (Yin, 2014). During analysis, propositions can be used as a guide to identify pertinent data points, to bound the inquiry, and to offer points of comparison for what is happening in the study and what the empirical and theoretical literature describes.
Inductive analysis is a key characteristic and strength of qualitative research. Inductive analysis involves reading through the data and identifying codes, categories, patterns, and themes as they emerge (Saldaña and Omasta, 2017; Miles et al., 2020). In other words, codes and categories are not predetermined, but are instead identified and named as the researcher reads through the data. Common practices in inductive analysis include open or initial coding, and the constant comparative method of analysis. Open coding refers to a process whereby researchers identify and name essential concepts and patterns in the data (Glaser and Strauss, 1967). This kind of coding is also referred to as initial coding (Charmaz, 2014). The constant comparative method is an inductive strategy in which researchers code the data, compare data with data and codes with codes, and eventually condense codes into categories, categories into themes, and themes into findings (Charmaz, 2014).
In inductive analysis, the researcher engages with and defines the data, providing the link between initial or open coding and eventual emergent themes and theories (Charmaz, 2014). On the whole, inductive analysis is used to make meaning from the data, develop codes, categories, themes, and findings; identify representative data to support findings; and explain findings using theory and literature (Bingham and Witkowsky, 2022).
Memoing – a critical component of rigorous qualitative data collection and analysis – can be engaged throughout the course of a qualitative study in conjunction with both deductive and inductive strategies. Memoing is a process of recording thoughts and notes throughout the course of a study (Strauss, 1987) and it can support the transparent analytic work needed to conduct a trustworthy study. Memoing can take a variety of forms, including a record of data collection and analysis processes, analytic decisions, reflections on field work, ruminations on the meaning of the data, and potential answers to research questions (Bingham and Witkowsky, 2022; Ravitch and Carl, 2019; Strauss, 1987). Recording data collection and analysis processes and analytic decisions aids in trustworthiness (Lincoln & Guba, 1985) by allowing the researcher to keep a running log of thoughts and lines of analysis, and to maintain a detailed, evolving description of the data, a continuing analytic record of emerging codes and themes, and possible interpretations and conclusions as they develop (Creswell and Poth, 2016; Stake, 1995). As such, the five-phase analysis process outlined in this paper includes memoing as a critical component of each coding stage.
Qualitative Coding Phases
Coding phases or stages are one way for qualitative researchers to balance the demands of qualitative analysis and support the trustworthiness of the study. Saldaña (2013) divides coding into two primary stages: first cycle coding and second cycle coding. First cycle codes are those initially applied to the data, and second cycle codes can be applied to the data within the generated first cycle codes. In other words, in the second cycle, the researcher can further analyze the coded text of the first cycle, adding a second layer of coding to the initial first cycle codes. The forms of coding used within each cycle can vary. For example, coding can be divided into inductive and deductive cycles (Bingham and Witkowsky, 2022).
In the qualitative data analysis process outlined in this article, I break the coding process into five distinct phases. These phases entail using attribute coding to organize the data; topic coding to bound the inquiry; open/initial coding to identify patterns and themes in the data; memoing to develop findings from those themes; and examining the findings using codes developed from the existing literature and the theoretical framework(s), in order to apply theory, explain the findings, and situate the findings in the literature. Memoing occurs in each phase. This five-phase approach is meant to support researchers in moving from data collection to the final report – whether it be the dissertation, the first article, or the hundredth. Although the analysis process outlined in this paper is perhaps better suited to linear analytic methods rather than methods that require a more recursive approach, the stages of analysis I present here can be used as cycles as well, and researchers can spiral through these stages as needed. Additionally, this analytic process is flexible – researchers can draw on one or all of the phases as appropriate for their study and methodology.
In presenting this five-phase analysis approach, I aim to offer a systematic process that can support transparency in data analysis toward conducting a trustworthy qualitative study. I also provide some guidance around how to use analytic questions to create useful memos and support the development of findings from coding and analysis. I discuss how to move from codes to themes to findings statements in order to identify clearer findings, make a more substantial contribution to the existing knowledge base, include clear and applicable recommendations, and apply theoretical framing, which many researchers struggle to do in a way that is transparent and useful for both the researcher and the reader (Bingham et al., In press). Toward these ends, in the next section, I detail each analysis phase, discussing the purpose and procedures associated with each phase, providing examples from an in-progress study of an innovative educational reform, and exploring different types of memoing that can occur during each phase.
A Five-Phase Qualitative Analysis Process
In this section, I describe each step in this five-phase analysis process and use excerpts from recently collected field notes to demonstrate how each phase can be applied. The example data comes from an in-progress study of data-driven instruction in a competency-based learning school. Because the example study is in progress, the coding and memos I present are the rough first results of this five-phase process. Although there are five phases, there is more than one analytic activity associated with each phase. Phases are grouped by related activity. See Figure 1 for a visual representation of the five phases of analysis, including the name, function, memoing strategies, and outcomes associated with each phase. The five-phase process. Adapted from Bingham and Witkowsky (2022).
Phase 1: Organizing the Data
Qualitative studies vary in the amount of data collected and analyzed, but even smaller studies can produce large amounts of data. This is especially true for studies that span multiple participants and/or research sites, or which involve many different forms of data (MacQueen and Milstein, 1999; Williamson and Long, 2005). Larger qualitative data sets, in particular, can make it difficult for researchers to maintain the trustworthiness of the study (White et al., 2012). Effectively managing and organizing qualitative data, particularly in larger studies, supports the rigor and trustworthiness of the study (White et al., 2012). As such, it is important to have sound organizational practices in place. The first phase of analysis thus relies on deductive strategies to support data organization and management. In the first phase of coding, a form of organizational coding called ‘attribute coding’ (Miles et al., 2020) is used to sort the data into categories, in order to develop an organizational schema. The coding and labeling suggested here can also be accomplished by using detailed file labels when importing data into qualitative data analysis software programs, such as NVivo, Atlas.ti, Quirkos, Dedoose, etc. The organizational schema created in this phase (whether developed by hand or within some kind of qualitative data analysis software) supports the researcher in identifying where particular excerpts or evidence came from, and offers the opportunity to sort data by type, source, location, or time period.
The first step in this phase is to develop attribute codes that identify data type, source, time, and location. The researcher then tags the data associated with each of these attribute codes to categorize and manage the data. For example, codes for a piece of data might include ‘Interview’ (data type), ‘Dr. Hernandez’ (participant), ‘School for the Arts’ (location), ‘Fall 2017’ (time period). See Figure 2 for an example of this method of coding. Phase 1: Attribute coding.
In this example, which is an excerpt from field notes taken during a site visit and observation of a competency-based learning school, I used Microsoft Word to tag the data using a priori codes representing organizational categories. This process of organizational attribute coding makes it easier for the researcher to keep track of the data and to identify sources of evidence. This phase of analysis also offers the opportunity for the researcher to (re)familiarize themselves with the data they have collected.
Memoing during phase one is mostly focused on procedures and processes, though of course, the researcher can also record any thoughts generated in looking at the data. This might include initial impressions and thoughts on data collection and the analysis process. A process of memoing during attribute coding can help with keeping track of analytic procedures and building trustworthiness. This round of memoing, for example, supports the initial development of an audit trail – a detailed description of the steps taken in the research – which can support the confirmability of the study’s findings (Lincoln & Guba, 1985; Tierney & Clemens, 2011). See Figure 3 for a sample memo, created during the first analysis phase. Phase 1 memo.
Phase 2: Sorting Data into Relevant Topical Categories
The second phase also relies on deductive analysis processes. In this second phase of coding, the researcher develops a priori topic codes aligned to the study’s purpose, research questions, and/or propositions and then reads through the data more carefully, in order to sort and organize the data. The topic codes – sometimes referred to as descriptive codes (Saldaña, 2013; Wolcott, 1994) – should reflect broad categories of interest that represent the researcher’s purpose and the research questions and/or propositions aligned with that purpose. In the example study of data-driven instruction in a competency-based learning school, for example, the research team created the following topic codes aligned to categories of interest: ‘Data analysis,’ and ‘Data-driven classroom practice.’ These kinds of codes help to maintain focus on the research questions and make it easier to identify pertinent data and facilitate subsequent cycles of inductive analysis.
Similarly, if the researcher has developed propositions, they could follow a comparable process by creating categories that reflect the substance of the proposition. For example, continuing with the study on data-driven instruction, the research team developed a proposition that states ‘Teacher data analysis supports personalized learning objectives for students.’ This proposition was developed from the existing literature and from the findings of previously conducted studies. During this coding phase, codes were then developed from this proposition in order to sort the data into relevant categories. This included codes like ‘Teacher data analysis’ and ‘Personalized objectives.’ This process sorts the data into categories relevant to the propositions, which supports testing the propositions in later analysis phases. See Figure 4 for an example of coding phase 2. Topic codes.
Phase 2 can help the researcher filter out data that is not relevant to the research questions and can also help with processes like identifying cross-coded data and examining data for conceptual intersections. This phase also supports researchers in forming some initial impressions of the data, in order to engage in more tailored, granular inductive analysis in later phases. If events, quotes, or passages are striking in some way, the researcher can engage in ‘pre-coding,’ in which significant quotes or passages are bolded, highlighted, underlined, or otherwise noted (Saldaña, 2013).
In this phase, potential memoing strategies include recording decisions about code development and keeping track of any ideas to follow up on in subsequent rounds of analysis, including anything identified in the pre-coding process. If ideas about potential patterns or themes begin to develop, for example, the researcher can develop a memo to record these. Then, in the third phase of coding, the researcher can determine whether those ideas fully emerge during analysis.
Researchers can also begin to develop a codebook, or coding guide, to keep track of code development and to aid in developing an audit trail, which can support the trustworthiness and rigor of the research. This coding guide is an ever-evolving record of the codes the researcher developed/applied to the data and their definitions, as well as what should be included and excluded from each code. The codebook is especially useful when there is more than one researcher on the project, as it can help with interrater reliability and can ensure that codes are being applied in a consistent manner. That being said, sometimes a single researcher returning to the data at different timepoints can see things differently as well. Keeping a coding guide ensures that the researcher maintains consistency in code application throughout the study and keeps a record of any changes. See Figure 5 for an example of a coding guide entry. Coding guide Entries.
Phase 3: Open/Initial Coding
Because the data have been sorted into categories aligned to the research questions in phase 2, in phase 3, the researcher can focus on those research questions by inductively analyzing the data within and across each topical category. Thus, the third phase begins with a process of open coding, in which the researcher reviews the data within the topic codes applied in the first and second phases of coding. The researcher reads through the data in each category, creating, defining, and applying codes as they read. Codes may be created and applied across the initial topic categories. This process supports the researcher in identifying emerging topics or concepts that may later be developed into themes and findings.
This phase is where researchers draw on the process of constant comparison. Throughout this phase, the researcher engages in the constant comparative method (Glaser and Strauss, 1967). They analyze the data and develop codes to describe that data, all the while comparing the newly analyzed data to the previously analyzed data and determining either that the data can be described by existing codes, or that new codes need to be created. This process of open coding and constant comparison continues throughout the third phase, until the researcher has reviewed all of the data. See Figure 6 for an example of codes applied in coding phase 3. Open coding example.
The codes shown in the example above were developed inductively – created and applied to the data in the course of the analysis, not a priori. As the codes were developed, they were entered into the coding guide. This round of coding requires the researcher to continue to record and define codes and refine their coding guide by outlining and defining the codes generated in this phase. This process provides a record of codes – how codes definitions evolve and how codes are condensed.
In this phase, it’s useful to create a memo that includes each of the research questions or propositions. This memo can serve as a running log of possible answers to the research questions, potential evidence from field notes and interviews, and supporting quotes. The memo can be digital, like a Microsoft Word document, or a document created within Qualitative Data Analysis Software, such as NVivo or Atlas.ti, or it can be fully analog, such as notecards, or post-it notes. Whichever way the researcher prefers to do it, the document should include each research question and/or proposition, followed by possible answers, ideas, and supporting evidence. Although, coding is the primary form of analysis in this phase, the memos are critical to how well the researcher is able to develop and support their findings in subsequent rounds of analysis. See Figure 7 for an example memo from coding phase 3. Research question memo.
Phase 4: Identifying Patterns, Themes, and Findings
The ultimate goal of the fourth coding phase is to develop the findings of the analysis. This requires some inductive analytic work, which includes pattern coding to identify themes, and coalescing themes into findings statements. In this phase, the researcher first reviews the data that was coded in the third phase, looking for patterns that emerge within and across data sources in order to develop themes. This process is sometimes referred to as pattern coding, a method of collapsing the codes created during open/initial coding to separate the data into fewer analytic concepts (Miles et al., 2020; Saldaña, 2013). In the process of pattern coding, some of the inductive codes developed during open coding in phase 3 as well as any pattern codes applied or created throughout this phase may be subsumed, revised, or eliminated altogether (Saldaña, 2013). In this process of identifying patterns and collapsing codes, the researcher can begin to identify themes. Themes are developed by synthesizing the remaining inductive codes into theme statements that describe the patterns seen in the data.
How the researcher presents their findings may differ depending on their methodology, or just their preference. Some researchers stop at the theme stage, presenting the themes as the findings of their analysis. Themes may be presented as one word, a short phrase, or full sentences. Other researchers prefer to develop a word or a short phrase to represent themes, after which the themes can be condensed into findings, or the themes can be elaborated on to turn them into findings. One way to ensure that findings are clear and actionable is to present findings as more nuanced, more specific phrases or sentences. For example, drawing on the example data, codes, and memos presented thus far, one of the emergent themes is ‘Real Time Data Analysis,’ a code developed in the third phase. Another theme is ‘Data Transparency.’ A findings statement developed from these themes might be something like ‘Data transparency supports real time data analysis.’ See Figure 8 for a visual representation of this coding phase. Pattern coding to develop themes and findings.
Memoing during the fourth phase of coding is similar to memoing in phase 3 and focuses on recording possible responses to research questions, and further identifying and organizing evidentiary support by research question. The researcher should also continue to record any analytic decisions, including how codes are collapsed and why. Continuing to be diligent about memoing during this phase can help immeasurably in writing up the final report. See Figure 9 for an example. Research question and data analysis memo.
Phase 5: Applying Theory and Explaining Findings
The fifth coding phase combines deductive and inductive analytic practices to support what many researchers find to be the most difficult part of the article or dissertation – the discussion of findings (Bitchener and Basturkmen, 2006; Chien and Li, 2022). In this phase, the researcher creates and applies codes aligned to the existing literature and to the theoretical framework. Literature codes can be developed from themes or constructs that the researcher identified in the existing literature during their literature review. As an example, in the topic of data-driven instruction, some scholars have argued that an emphasis on the data generated from summative testing in the classroom leads to a focus on testing and not critical thinking (Neuman, 2016). A literature code developed from this theme in the literature might then be something like ‘Teaching to the Test.’ During this fifth phase, the researcher can sort the qualitative data into categories aligned to the literature, providing a foundation for understanding how the researcher’s findings relate to the existing research.
Similarly, theory codes are developed from the tenets and key components of the researcher’,,s theoretical framework. For example, in the study from which the examples in this paper came, sensemaking theory is the theoretical framework (Weick, 1995), and the concepts of assimilation and accommodation are used to frame the findings (Piaget, 1972, 1977). Assimilation is the process whereby individuals make sense of a reform (or an ambiguity, a discordance, etc.) by either disregarding the aspects of the reform that do not fit within the individual’s current knowledge frame or by reinterpreting those aspects to fit within their existing knowledge frame. Accommodation, on the other hand, is the sensemaking process in which the individual changes their existing knowledge frame or develops entirely new knowledge frames in response to a reform. Codes developed from sensemaking theory could thus include ‘Assimilation’ and ‘Accommodation.’ By sorting the qualitative data into theoretical categories during this phase, the researcher can get a sense of how the data and the findings of the analysis fit within the concepts of the theoretical framework, which provides the foundation for a theoretically-informed analysis and a robust discussion of the findings. If there is data that does not fit within concepts of the theoretical framing, then there may be additional theoretical contributions to be made.
In the course of this phase, the researcher also draws on inductive strategies by developing short phrases connecting the findings to the theoretical framing and to the existing research. For example, continuing with the data-driven instruction example, if the teachers were not changing their existing knowledge frames as they made sense of and enacted data-driven instruction, the researcher can build the discussion on the following statement: ‘Teachers made sense of the demands of real time data analysis through a process of assimilation, which is consistent with other research.’ This process of interpreting the data through empirical and theoretical lenses can help researchers to make sense of and explain findings and explore whether and how findings converge and diverge from the literature and the theory. This phase can support researchers in identifying precisely how their study contributes to the literature and in building a more nuanced argument. See Figure 10 for an example of theory- and literature-based coding, as well as how theory can be applied to make sense of findings. Literature and theory coding.
In memoing during this phase, the process of analytic questioning can help researchers flesh out their findings (Neumann, 2006; Bingham, 2017; Bingham et al., 2018). Analytic questioning involves memoing in response to questions about the data based in existing research and theory. Throughout this process, the researcher memos in response to questions designed to stimulate thinking around what the findings really meant, why they mattered, and what they contributed. Analytic questions in this round of memoing could include: What do the participant interviews say about [the topic]? What do these findings mean in context? How are they related to the larger existing literature base? Why did things happen as they did? What can be learned from these findings? This final phase of analysis helps the researcher to examine what the findings mean in the larger sense, in order to present them in context to the reader. See Figure 11 for an example of a memo using analytic questioning to develop ideas for the discussion section. Analytic questioning memo.
Conclusion
The five-phase process presented here is a technique that can be used as a whole or in part to support researchers in planning, articulating, and executing systematic and transparent qualitative data analysis; developing an audit trail to ensure study dependability and trustworthiness; and/or fleshing out aspects of analysis processes associated with specific methodologies. Although this five-phase data analysis process is a more generic approach to qualitative analysis, researchers may use aspects of this approach to support analysis processes that are specific to particular methodologies, including narrative inquiry, grounded theory, and phenomenology. For example, the process of restorying in a narrative study – which involves gathering participants’ stories through interviews and conversations; analyzing them for narrative elements like characters, time, place, setting, actions, problems, and resolutions; and retelling the story chronologically (Ollerenshaw & Creswell, 2002) – may be folded into the inductive phases of this data analysis process and can be included as a part of the memoing process. Developing analytic questions focused on narrative elements could help the researchers develop participants’ narratives. As another example, although a traditional grounded theory study would not draw on deductive analysis processes to apply a theoretical framework as recommended in phase 5, the first phase of this analysis process – organizing the data – could be applied as a data management strategy in a grounded theory study, while phases 3 and 4 – understanding the data and interpreting the data – draw on traditional grounded theory strategies like open coding and constant comparative analysis. Further, a researcher conducting a phenomenological study typically engages in the process of horizonalization, in which participant statements are turned into meaning clusters and further developed into textural and structural descriptions, which are then pulled together to develop a description of the essence of the experience under study (Creswell and Poth, 2016). This process of horizonalization may be completed during phases four and five. Similarly, for a researcher drawing on Stake (1995) to conduct an analysis of case study data, the phases offered in this paper – particularly phases 3 through 5 – provide a more structured presentation of the stages of direct interpretation, categorical aggregation, correspondence, and naturalistic generalization. Finally, the data analysis process outlined in this article emphasizes a systematic memoing process, a key aspect of any kind of qualitative study. Although this process may not wholly apply across all methodologies, there are aspects of this analysis process that can used in conjunction with many methodologies.
As noted at the beginning of this article, rigorous, theoretically-grounded qualitative research can provide an in-depth, highly-contextualized understanding of a given policy, reform, or intervention, which helps researchers better explain associated practices and outcomes. The five-phase analysis process outlined in this article supports qualitative researchers in developing evidence-based findings to answer research questions and in explaining those findings in a theoretically-grounded manner. By engaging in a systematic, iterative process of analysis such as the one outlined in this article, researchers can better organize their data, engage inductive and deductive analysis strategies, increase the trustworthiness and rigor of their study, and balance the demands of good qualitative work.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
