Abstract
The potential use of artificial intelligence programs such as ChatGPT to analyze qualitative data raises any number of questions, most notably whether it is possible to produce similar results without the demanding process of manual coding. In addition, there are questions about both the simplicity of using ChatGPT for qualitative data analysis and the potential time savings that it might provide. This article addresses these questions by using ChatGPT to reinvestigate two qualitative datasets that were previously analyzed by more traditional methods. In particular, it examines the extent to which the responses from ChatGPT can recreate the themes that were originally chosen to summarize the two previous analyses. The results show that ChatGPT performed reasonably well, but in both cases it was less successful at locating subtle, interpretive themes, and more successful at reproducing concrete, descriptive themes. Moreover, the program was quite easy to use and required very little effort in comparison to approaches that rely on manual coding. It is important to recognize, however, that both coding and approaches based on artificial intelligence are simply tools that must be applied within a larger analytic process. Overall, this exploration suggests that artificial intelligence may well have the power to disrupt the coding of data segments as a dominant paradigm for qualitative data analysis.
Introduction
What is the potential role of artificial intelligence (AI) in the analysis of qualitative data? Until very recently, this issue was debated largely within the community of developers of qualitative data analysis software. Now, the release to the general public of programs such as ChatGPT (2023a) has brought questions about automated data analysis to the forefront. Indeed, if one believes the publicity about AI-based programs, they offer the possibility of simply “chatting” with your data to understand its complexities, rather than going through the detailed coding processes that currently dominate qualitative data analysis (e.g., ATLAS.ti, 2023).
The purpose of this article is not to describe the specifics of how ChatGPT works, but instead to treat it like a black box that can process ordinary language queries about a dataset and return a set of nuanced responses. Thus, although ChatGPT was not in any way designed as a tool for qualitative data analysis, the effort here is to evaluate its suitability for this goal. For relatively accessible accounts of how ChatGPT works, see articles in The Economist, 4/23/2023, and The New Yorker, 4/13/2023.
Prior to ChatGPT, the most extensive use of AI in qualitative data analysis was the implementation of Natural Language Processing in various versions of software packages (see overview in Silver, 2023). For example, two common features in these packages are sentiment analysis (the ability to detect positive or negative content in text segments) and semantic coding (a form of content analysis). In most cases, these features work in an “unsupervised” format where they do not require human input.
More recently, two of the major software packages for qualitative analysis, ATLAS.ti and MAXQDA, have both developed partnerships with OpenAI, the developer of ChatGPT. ATLAS.ti’s approach to AI relies on automatic coding, rather than user-based querying of the data. Automatic coding positions ChatGPT at the beginning of the analysis process, and the program’s announcement includes statements such as: “Don’t worry about endless manual coding,” and “Reduce your overall data analysis time by up to 90%” (ATLAS.ti, 2023). But it is worth noting that ATLAS.ti still treats ChatGPT as a tool for coding data, where the AI “generates numerous codes that… are subordinated to a few main categories” (Hecker, 2023). (For an experienced analyst’s critical account of using ChatGPT as a coding tool in ATLAS.ti, see Friese, 2023).
MAXQDA takes a very different approach to the implementation of AI, by using it to summarize various aspects of data. For example, once manual coding is complete, the user can apply ChatGPT to any code in any interview and get a summary of the collected data segments associated with that code. This strategy positions the role of ChatGPT nearer to the end of the analysis process, where the goal is to make sense of what was captured in the coding.
Still, what is notable about the approach to AI in both of these programs is the continued reliance on an analysis process that is centered on coding. This application of AI for coding-related purposes is quite different from first using ChatGPT to query a qualitative dataset, and then using responses from the AI as the basis for producing a set of results – in other words, using the AI alone as the basis for the analysis. To assess the potential of ChatGPT for conducting this kind of qualitative data analysis, I have undertaken a direct comparison between my own manual analyses of two qualitative datasets and the results from querying the same datasets with the free program ChatPDF (2023b), which provides an interface for the input of user data into ChatGPT version 3.5. Both of the original analyses were completed with the results expressed as a set of themes, so the most basic question is whether ChatGPT can produce a similar set of themes for each of the two datasets. To do so, I systematically queried each dataset to determine the core content that was returned by the AI. Beyond recreating themes, there are further questions about both the simplicity of using ChatGPT for qualitative data analysis and the potential time savings that it might provide.
Background
My original investigation of these two datasets arose from my broad interest in themes as a way to capture the core content in qualitative data (Morgan & Nica, 2020). I chose two unpublished datasets to present here because I had worked extensively with each of them as part of a larger, unpublished manuscript, tentatively titled: “Thinking thematically in qualitative data analysis.” These analyses originally served as demonstrations for how to apply each of two approaches to the production of themes. In particular, that earlier work analyzed one dataset using Braun and Clarke’s (2022) Reflexive Thematic Analysis, and the other using my own Iterative Thematic Inquiry (Morgan & Nica, 2020).
In terms of the comparison of the results to ChatGPT, because both of these analyses were previously completed, it was not possible to conduct a completely “naïve” assessment of the match to the corresponding datasets with ChatGPT. Still, I wanted to minimize the role of my prior knowledge in applying the AI to these datasets. Consequently, I relied on a relatively freeform set of queries (also known as “prompts”) to produce the responses from ChatGPT (as shown below), and then used straightforward follow-up questions based on those responses. This did indeed produce the basic thematic material that I desired, without using any of my substantive knowledge about the data. Note that I avoided asking the program directly about “themes” in the data in order to avoid any bias toward finding the kind of results for which I was searching. In addition, as a next stage, it would have been possible to request suggestions from ChatGPT for specific quotations related to these results, which would be vital for a full-fledged analysis, but was less relevant to my current goal of assessing the thematic content of the data.
Ultimately, I had to make a subjective determination as to whether I could reproduce the key themes that I had already located in each dataset. This lack of a formal comparison between a manual analysis and a completely separate computerized analysis is a limitation of the current article, and I will return to this issue in the Discussion and Conclusions section.
Analysis of the First Dataset: Experiences of First-Year Graduate Students
The 24 participants in these six focus groups consisted of first-year graduate students. Each participant in a group came from a different department so that they could compare and contrast their experiences. The groups were held at the end of the year, and the primary topics in the discussions concerned the challenges that these students faced during that year. The ethical approval for this research came through xx State University. The data from these groups had not received any substantive data analysis, so for demonstration purposes, I examined these data by following the six steps of Reflexive Thematic Analysis, as described in Braun and Clarke (2022), including a detailed manual coding of the data in step two.
The original results produced an overall, organizing theme: Navigating the transition to a new role: Becoming a graduate student. This over-arching theme was composed of five more basic themes:
- Redefined workload: Meeting increased demands
- Disruptions in other roles: Restriction in outside life
- Unclear expectations: Meeting programs’ and professors’ requirements
- New peer groups: The importance of the cohort
- Developing own interests: Setting your own expectations
The process of uploading the raw data into the ChatPDF program was quite straightforward. After signing up for an account to use this software, I created a single file which contained the data from all six focus groups. I then indicated to the program where this data was located on my computer, and it moved the data onto its own secure server.
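The upload-and-query workflow just described can also be expressed programmatically. As a purely illustrative sketch (the analyses here used the ChatPDF web interface, not the API, and the endpoint behavior, model string, and prompt wording below are my own assumptions), the following shows how a transcript and a plain-language query might be packaged into the JSON body of a chat-style completion request:

```python
import json

def build_payload(transcript: str, question: str,
                  model: str = "gpt-3.5-turbo") -> str:
    """Package a focus-group transcript and a plain-language query
    into the JSON body for a chat-style completion request.
    The system prompt constrains the model to the supplied data."""
    messages = [
        {"role": "system",
         "content": ("You are assisting with qualitative data analysis. "
                     "Answer only from the transcript provided.")},
        {"role": "user",
         "content": f"Transcript:\n{transcript}\n\nQuestion: {question}"},
    ]
    return json.dumps({"model": model, "messages": messages})

# Hypothetical example: one combined file of focus-group data plus
# the kind of broad opening query used in the analyses below.
payload = build_payload(
    "Moderator: What was hardest about this past year? ...",
    "What are the main topics in these discussions?")
```

The point of the sketch is simply that the researcher's data travels with every query; a tool such as ChatPDF automates this packaging behind a conversational interface.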
Analyses with ChatGPT
To show the results of my analyses with ChatGPT, I will present four figures, which represent the full set of queries that I used with this dataset. I began by asking a general question about this dataset as originating in focus groups, as shown in Figure 1.
Figure 1: Query about “main topics” in the graduate student focus groups.
From a practical standpoint, Figure 1 indicates that ChatGPT was able to reproduce the topic for the research project as a whole, but this is not what I hoped to obtain. Hence, I refined my question to ask about the topics contained in the data, as shown in Figure 2. Note that I also used ChatGPT’s formatting options to request a bulleted list.
Figure 2: Responses to a query about “main topics” in the full dataset.
Because the topic of “keeping a balance between school and social life” came up first, I decided to query it further as shown in Figures 3 and 4. Figure 3 is notable for the level of detail that it provides, and based on my earlier in-depth analysis, this could easily serve as the initial basis for a set of sub-codes in a hierarchical system where “balance” was the higher-order code. Figure 4 shows the program’s ability to provide a more in-depth response upon request.
Figure 3: A query asking for more information about the topic of “keeping a balance between school and social life.”
Figure 4: A further query about the meaning of the concept “balance.”

How well did ChatGPT do at reproducing the results from the analysis using Reflexive Thematic Analysis? One notable discrepancy concerns the earlier choice of an over-arching theme based on “navigating the transition.” Rather than producing responses that directly highlighted this theme, the AI emphasized the consequences of this transition, such as maintaining a balance between school and social life, and dealing with stress. This reflects a need for a broader interpretation of the specifics located by ChatGPT. A different discrepancy involves the emphasis in the ChatGPT responses on monetary issues such as funding and work situations. In this case, this set of issues was not addressed in the original list of themes because it represented what Braun and Clarke (2022) call a topic summary, which they consider to be inadequate for use as a theme. This reflects a tendency for the ChatGPT responses to be more descriptive than interpretive.
It is also worth considering whether these two discrepancies would have been any different if this portion of the analysis process had been conducted by another human coder, rather than an AI assistant. I believe the answer is no, because both of these choices required a higher-level decision by a subjective analyst. First, the decision to designate a single, organizing theme was something that occurred quite late in the analysis process after considerable work with the five more basic themes, and there was no attempt to duplicate this effect with the AI. Second, there is good reason to reconsider whether issues related to funding and work situations should have been treated as a basic theme, based on the constraints that they placed on students’ overall ability to navigate the system. In other words, these two main discrepancies in the thematic summary were both based more on the analyst’s judgments, rather than anything inherent in ChatGPT results.
Analysis of the Second Dataset: Experiences of Dual-Earner Working Couples with Caregiving Responsibilities for Both Younger Children and Older Parents
The 73 participants in these 19 focus groups were spouses who were both employed, and who were providing care to at least one child under the age of 18 and one parent over the age of 65 (for more information on this study, see Neal & Hammer, 2007). The ethical approval for this research came through xx State University. The primary topics in the discussions concerned what made it either easier or harder for the participants to combine their various responsibilities. In the original Neal and Hammer (2007) study, these groups were primarily used to develop survey measures, and thus had received relatively little substantive attention in their own right. For demonstration purposes, I originally conducted the analysis of this dataset by following the four steps of Iterative Thematic Inquiry as described in Morgan and Nica (2020).
The results produced an overall, organizing theme of “having enough time” that was summarized in terms of meeting competing demands between work and family. This overarching theme was composed of two more basic themes: flexibility as a way of finding time, and strategizing or accommodating as a way of making time.
Analyses with ChatGPT
Here, I used what I had learned from the previous analysis to proceed directly to querying the data from the full set of 19 focus groups. Once again, I used a total of four basic queries, along with two further queries not shown here. Figure 5 shows the results from asking for a brief summary of the core content in the data, while Figure 6 asks for a longer response to the same question.
Figure 5: Responses to a query for a brief summary of things that made it easier or harder for these participants.
Figure 6: Responses to a query for a longer answer about things that made it easier or harder for these participants.

Figure 7 shows a request for more detail about one of the points from the response shown in Figure 5. Note that I did not refer to either “themes” or “sub-themes” in these queries, because I reserved that language for the later, more interpretive phase of the analysis. Instead, at this point in the process I kept the queries as specific as possible.
Figure 7: Responses to a query about one of the key themes from the original analysis.
Figure 8 takes things a step further, by asking directly about one of the themes that was identified in the earlier analysis. Interestingly, this response emphasizes the importance of accommodation, which was closely related to strategizing as a theme in the original analysis. Further queries about accommodation, as well as the additional theme of flexibility, gave equally relevant responses, including useful illustrations of how these concepts operated in the data.
Figure 8: Responses to a query about the relationship between two topics.
Overall, when compared to the analyses for the previous dataset, ChatGPT did a similar job of capturing the key themes. As before, it did not highlight the original, overall theme of demands on time. At the same time, it did emphasize several specific elements of the data that corresponded to basic, descriptive themes from the original analysis. In addition, as key themes were identified in the earlier analysis, further queries produced highly relevant responses. This suggests the value of ChatGPT for following up on insights into the data that occur during the analysis process. As discussed below, however, this points to a strategy of moving toward increasingly more specific aspects of the data, rather than using the results from the AI to generate broader, more general summaries.
Discussion and Conclusions
Limitations
As noted earlier, one of the key limitations of this exercise was its post-hoc nature, such that my manual analysis of the two datasets was completed earlier, and then compared to the responses to my AI queries. A considerably stronger design would be to compare the results from two separate teams that each analyzed the same dataset, with one team relying exclusively on typical coding procedures and the other relying entirely on a program such as ChatGPT. This represents an important direction for future research.
A different set of limitations that needs to be discussed concerns ChatGPT itself, and its potential to return biased or even nonsensical responses. The first of these problems, bias, occurs because of potential limits in the AI’s training set. In this case, ChatGPT was trained on a large subset of the text available on the Internet, and to the extent that this text base is biased by factors such as race, gender, or sexuality, there is a concern that this bias will also appear in the responses the program produces. For a discussion of the issues of biases, see OpenAI (2023), where the developers of ChatGPT describe their considerable efforts to train the program to detect these kinds of biases and to exclude them from both the queries it accepts and the responses it provides.
The issue of potential bias in the qualitative data analysis is somewhat different, however, because the data being queried is supplied by the researcher, and the biases produced in this case would be biases that were present in the original dataset. For example, if the statements from the participants contain racist or sexist language and assumptions, then this may well be reproduced in the responses from the AI. One way to avoid such biases is to rely on queries that essentially ask, “What do these participants say about…,” rather than asking for more interpretive speculations (a topic that I will return to below). Note, however, that the attempts by ChatGPT’s developers to prevent bias could be problematic for qualitative researchers pursuing topics directly related to race, gender, sexuality, etc.
A second limitation for ChatGPT is its ability to produce nonsensical responses, a process which is sometimes described as “hallucination” (Lakshmanan, 2022). This happens because each response depends on the continuing context in which the program has been working, where that context consists of both the previous queries and the content of the responses to those queries. In essence, the software is continually trying to predict what comes next, which means that one misstep in its responses can be magnified in subsequent responses.
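The rolling context described above can be made concrete with a short sketch. In chat-style APIs, each new query is sent together with every earlier query and response, which is exactly why one flawed response can propagate into later ones. The class and method names here are illustrative, not part of any real client library:

```python
class ChatContext:
    """A minimal model of the conversational context: every turn,
    whether a user query or a model response, is appended to a
    running message history that accompanies each new request."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_query(self, text: str) -> list:
        # Append a user query and return the full history that would
        # be sent with the next completion request.
        self.messages.append({"role": "user", "content": text})
        return self.messages

    def add_response(self, text: str) -> None:
        # The model's reply becomes part of the context for later
        # queries -- including any hallucinated content it contains.
        self.messages.append({"role": "assistant", "content": text})

ctx = ChatContext("Answer only from the uploaded focus-group data.")
ctx.add_query("What are the main topics in this dataset?")
ctx.add_response("Balancing school and social life; funding; workload.")
history = ctx.add_query("Tell me more about balancing school and social life.")
# history now carries the earlier turns, so the follow-up is
# interpreted against them -- for better or worse.
```

This is why knowing one's data matters: the analyst, not the software, is the only check on whether an earlier response that now sits in the context was accurate.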
This problem can be minimized by recognizing that ChatGPT, again, operates solely on the current data, so the analyst’s own knowledge of the data is the key to detecting nonsense responses. The importance of this issue is indicated in Reflexive Thematic Analysis (Braun & Clarke, 2022) by its first step, familiarization with the data, which requires a careful reading of the complete dataset. The other analysis method considered here (Morgan & Nica, 2020) takes a different approach to this task by beginning with a formal “assessment” phase, where the analyst systematically reflects on all the potential themes associated with the research questions. Both of these approaches put an emphasis on knowing one’s data, which should greatly reduce nonsensical responses by limiting the context in which the AI operates to a well-defined target dataset and a consistent set of researcher’s queries.
A third problem concerns ethical issues about the reuse of datasets in further iterations of ChatGPT or its interfaces such as the ChatPDF program that was used here. Specifically, do these datasets become open for use by such programs in their further training, and if so, does this put the privacy of the participants at risk? The rather dissatisfying answer to this question is that none of these AI programs release any public information about the inputs to their training. On the one hand, this means that unless the data are sufficiently anonymized, the reuse of the participants’ data could threaten their privacy. On the other hand, there is no evidence that data supplied to ChatGPT or any similar AI has ever been exposed in this way. Hence, the protection of privacy remains secure in practice, although it is possible to imagine scenarios in which this might be violated.
In the current analyses, there was no evidence of problems with biases, nonsensical responses, or privacy. Still, the occurrence of these problems in the use of ChatGPT remains unknown, pending additional real-world testing. Even so, it seems fair to admit that AI in general and ChatGPT in particular may never be perfect tools for doing qualitative analysis. What then are the lessons for its most effective use, given the experiences that I have summarized here?
The Comparison to Coding
With regard to the reproduction of the themes from the earlier analyses of the two datasets, ChatGPT showed a clear tendency to emphasize more specific aspects of the data, without pointing to the bigger picture that united these specifics. Consequently, the application of a tool such as ChatGPT should not be “taken by itself.” Instead, it is better to see the responses that it produces as raw material for the more interpretive aspects of qualitative data analysis. One useful way to think about ChatGPT as a tool for producing inputs to the interpretive aspects of qualitative data analysis is to compare it with coding as a tool for the same purposes, and in particular to manual coding with the marking of text segments according to their relevant content. Like ChatGPT, this kind of coding by itself does not produce the results from qualitative research without further efforts by the analyst.
It is thus relevant to ask whether ChatGPT leads to a loss in quality for the analysis when compared to analyses based on this kind of detailed coding. If one simply stopped with a summary of the responses from ChatGPT, there would certainly be a loss of quality compared to a more interpretive analysis, but this problem would be the same or worse if one stopped after compiling nothing more than a set of codes. In essence, both of these stopping points represent an intermediate stage in the analysis that processes the raw data into a more organized set of inputs for the later, interpretive portions of the analysis – a step that Locke et al. (2022) call “putting patterns together” and “progression toward theorizing.” Ultimately there is no substitute for knowing one’s data and using that knowledge in the inherently subjective process of making meaning from that data. Tools such as ChatGPT and manual coding are merely means to this end – which raises questions about the efficiency and effectiveness of these tools.
In this regard, along with the reproducibility of my original analyses, I mentioned two other potential issues in the introduction: the simplicity of using ChatGPT for qualitative data analysis and the time savings that it might provide. My experiences in both these regards are quite positive. In terms of ease of use, the queries and responses shown in the various figures demonstrate how little effort it takes to produce useful analytic material. In terms of time savings, the inevitable comparison is to manual coding of the data, and the outcome of that comparison greatly favors ChatGPT. More explicitly, the earlier manual coding of the graduate student dataset used Reflexive Thematic Analysis (Braun & Clarke, 2022), which took 23 hours, whereas the analysis of that dataset with ChatGPT shown here took approximately 2 hours. (The equivalent information is not available for the second study because the method used there, Iterative Thematic Inquiry, does not involve coding.)
This comparison between using responses from ChatGPT and coding as tools for qualitative data analysis is worth pursuing. For example, there is the issue of the extent to which ChatGPT should be treated as an alternative to manual coding, as conceived by ATLAS.ti. In her comparison of ChatGPT-based coding in ATLAS.ti to her own manual coding, Friese (2023) concluded that ChatGPT was neither more efficient nor more effective. Note, however, that Friese did not attempt to reach any conclusions with either of her analyses; rather, she treated them as a head-to-head comparison for two techniques for coding the data. A less radical alternative is to use ChatGPT for an informal version of coding, rather than explicitly marking segments of text (e.g., Hitch, 2023). In addition, my own analyses, as shown in Figures 3 and 6, point to the potential of using ChatGPT as an entry point to the construction of a codebook.
Yet, what I found reverses a common process for generating a codebook, which Saldaña (2021) describes as following two cycles, with the first staying closer to the original data and the second working with that initial set of codes to locate more conceptual patterns in the data. This two-cycle process is present in Reflexive Thematic Analysis (Braun & Clarke, 2022), Interpretative Phenomenological Analysis (Smith et al., 2009), and all versions of Grounded Theory (e.g., Charmaz, 2014). In contrast, as Figures 3 and 6 demonstrate, ChatGPT leads the analyst to an opposite progression, where broader concepts that appeared earlier can subsequently be queried further to generate more specific content. This suggests that analyses based on ChatGPT might be a poor fit for intensely inductive approaches that move from marking the raw data to the eventual creation of themes and conceptual categories.
The current approach leaves open the question of whether ChatGPT could be used for more interpretive analyses. Here, I have taken the position that the role of software is to help locate the basic concepts in the data, while it remains up to the researcher to organize that material into a more meaningful account of the participants’ experiences and perspectives. In other words, I have treated both ChatGPT and more traditional qualitative software programs as assistants in the earlier phases of the interpretive process. This means the kinds of queries I used with ChatGPT were intended to discover what was in the data, rather than to ask for speculations about the deeper patterns in the data, let alone what the sources were for such patterns. Given those choices, the issue of how to do more interpretive qualitative analyses with either AI in general or ChatGPT more specifically remains a topic for further research.
Lurking behind these specific concerns about coding is a larger question: Does ChatGPT represent an entirely new approach or paradigm for qualitative data analysis? To answer this question, it helps to begin with a definition of “paradigm.” In my case (Morgan, 2007), I follow Thomas Kuhn’s (1996) original formulation of paradigms as dominant worldviews in a research community, with those worldviews determining what the community sees as both meaningful research goals and acceptable ways of addressing those goals. For qualitative researchers as a research community, the question of how to analyze qualitative data has relied on coding as the most acceptable way to address qualitative analysis. Indeed, it might not be too extreme to say that coding has been the dominant paradigm in qualitative data analysis for at least the past 30 years. For example, Tesch (1990) found that this dominance was already well-established by the late 1980s.
So, if manual coding of data segments has been the dominant paradigm for qualitative analysis, then yes, I believe that querying with ChatGPT does threaten that dominance. It helps to remember, however, that both querying and coding are merely tools that need to be employed within systematic frameworks in order to produce coherent results. For coding, those frameworks include grounded theory (e.g., Charmaz, 2014) and framework analysis (Ritchie & Spencer, 1994), as well as Braun and Clarke’s reflexive thematic analysis (2022). Such broader frameworks are currently missing for AI-based querying, which is hardly surprising, given not just its newness but also the thorough domination of qualitative data analysis by approaches based on coding. One analysis framework that does not rely on coding is my own Iterative Thematic Inquiry, and another is that of Maietta et al. (2021), but how well ChatGPT fits either of those approaches is yet to be seen.
Overall, the current exploration indicates a substantial potential for the contribution of AI-based tools such as ChatGPT in qualitative data analysis, so long as one recognizes that these programs are tools that need to be employed properly to produce meaningful results. The integration of these tools into qualitative data analysis may not be smooth, however. On the one hand, ChatGPT might serve as a useful assistant in the traditional coding process; on the other hand, relying on queries from ChatGPT could challenge the dominance of coding as a paradigm for analyzing qualitative data. Only time will tell what direction AI-based analysis will take, but given the disruptive power of ChatGPT, that time may be quite short.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
