
Abstract

This study investigates the efficiency and thematic completeness of manual versus artificial intelligence (AI)-assisted qualitative analysis of nurses’ insights into the recovery process for patients who have had emergency abdominal surgery (EAS), specifically the barriers and facilitators to recovery. EAS is associated with significant mortality and complications, and nurses play a crucial role in patient recovery. Extracting actionable insights from qualitative data is labor-intensive, with traditional methods requiring extensive person-hours and being susceptible to inter-coder drift. Rapid qualitative analysis (RQA) offers a streamlined approach, and commercial large language models (LLMs) may accelerate and augment qualitative research, promising semi-automated data coding and synthesis while maintaining human oversight. This study systematically compared four analytic workflows: manual RQA, ChatGPT-o3 analysis of human-generated summary notes, Microsoft Copilot GPT-4 analysis of cleaned transcripts, and ChatGPT-o3 analysis of cleaned transcripts. Manual RQA took approximately 30 person-hours, while each of the LLM-assisted methods ranged from four to eight person-hours, demonstrating significant time savings. The LLM-generated codebooks captured nearly all concepts and were highly rated by nurse participants for thoroughness, action-orientation, and relevance to their roles. The findings suggest that a pragmatic hybrid approach, using human note-taking, AI analysis, and human review, offers an effective balance of speed, ethical AI use, and rigorous qualitative analysis for implementation science. While LLMs reduced labor and provided comprehensive thematic coverage, human input and analysis remain vital to support qualitative rigor.

Keywords

large language models (LLMs), implementation science, rapid qualitative analysis (RQA), nursing, emergency abdominal surgery (EAS), artificial intelligence (AI), AI-assisted analysis

1. Introduction

Emergency abdominal surgery (EAS) carries significant morbidity and mortality, with high rates of major complications and worsened quality of life (Aggarwal et al., 2019; Lau et al., 2024; Tengberg et al., 2017). A 2024 systematic review showed that physical and psychosocial recovery is often incomplete for 3–12 months after EAS, underscoring the mismatch between a “successful” operation and lived postoperative outcomes (Khanderia et al., 2024). Best-practice paradigms for Enhanced Recovery After Surgery (ERAS) consistently assign nurses a frontline role in educating patients before surgery, recognizing complications early, tailoring patient education to individual needs, and documenting nursing-specific barriers to pathway adoption (Brown & Xhaja, 2018).

Digital health interventions offer a promising avenue to support patients after EAS. A recent Lancet Digital Health review found 126 digital interventions for postoperative monitoring. However, 80% of those interventions were still in development and “none had reached [large-scale] implementation” (McLean et al., 2023). This highlights a significant gap in real-world evaluation and underscores the need for robust implementation science solutions to translate these promising digital health tools into effective clinical practice for EAS patients.

A crucial foundational step in the digital health design-to-intervention pipeline, particularly within implementation science, is extracting actionable insights from qualitative data (Brown et al., 2025; Palinkas & Zatzick, 2019). Traditional methods for qualitative analysis include transcription, data cleaning, codebook creation, and thematic analysis (Brown et al., 2025). Traditional thematic analysis typically requires iterative line-by-line coding by multiple researchers.

In fast-moving healthcare environments, especially in the context of implementation science where adaptive interventions are key, timely qualitative findings are vital to achieving actionable results (Brown et al., 2025; Palinkas & Zatzick, 2019). Epistemologically, this study adopts a pragmatic stance. We are not seeking a single objective truth, nor are we engaging in deep constructivist meaning-making. Instead, our goal is utility: identifying actionable clinical barriers rapidly to improve patient care. To bridge the gap between evidence and practice, researchers have increasingly turned to accelerated approaches that preserve analytic rigor while reducing turnaround time (Vindrola-Padros & Johnson, 2020). While accelerated approaches such as Framework Analysis, Applied Thematic Mapping, or Agile Thematic Synthesis offer rigorous structures for analysis, they often entail resource-intensive cycles that can delay the translation of evidence into practice (Gale et al., 2013; Guest et al., 2012; Riley et al., 2013). RQA methods streamline or eliminate transcription; condense data management; and use structured templates, summary matrices, and/or direct audio processing to accelerate analysis (Brown et al., 2025). Such approaches are well-suited to real-world/implementation research, where timely feedback is critical to adapting interventions (Brown et al., 2025; Palinkas & Zatzick, 2019).

Large language models (LLMs) are increasingly used to augment, rather than replace, qualitative research, especially with thematic analysis (Brondani et al., 2024; Feuston & Brubaker, 2021; Hitch, 2024; Marshall & Naff, 2024; Morgan, 2023; Rodrigues Dos Anjos et al., 2024). The strengths of their use lie in processing large datasets quickly and helping surface themes that may otherwise be missed (Bano et al., 2024; Feuston & Brubaker, 2021; Hitch, 2024). Generative AI tools can accelerate both deductive (Xiao et al., 2023) and inductive (Perkins & Roe, 2024) coding, but most approaches still keep a human-in-the-loop to ensure analytic rigor (De Paoli, 2024; Perkins & Roe, 2024).

Newer LLMs can generate codes and synthesize themes at unprecedented speed, an advantage for implementation scientists who need rapid insights. However, the evidence base is still emerging. Benchmarks show wide variability in agreement with human-coded themes (38–85%) and highlight risks such as prompt sensitivity, hallucinated categories, biased outputs, and opaque decision-making (Castellanos et al., 2025; Chubb, 2023; Kondo et al., 2024; Morse et al., 2025; Sakaguchi et al., 2025; Wachinger et al., 2025). Without robust prompt engineering, including iterative prompting and contextual scaffolding, LLMs can produce shallow or imprecise themes (Morse et al., 2025; Sakaguchi et al., 2025). Some studies omit quantitative agreement measures altogether or rely on custom metrics, limiting cross-study comparability (Brondani et al., 2024; Castellanos et al., 2025; Kondo et al., 2024; Sakaguchi et al., 2025; Wachinger et al., 2025). This indicates a need for consistent benchmarking tools that quantify agreement and allow true benchmark comparisons, along with deeper evaluation of when and how LLMs can support rigorous qualitative analysis in health informatics (Bergman et al., 2024). Together, these concerns point to the need for systematic evaluations of LLM performance, focusing on efficiency, thematic completeness, and ethical integrity in qualitative health informatics research.

To capture nurses’ insights into the recovery process for EAS patients, we conducted qualitative sessions designed to explore barriers and facilitators to optimal recovery. The rich data corpus yielded by these discussions served as a robust test-bed for evaluating various qualitative analysis approaches. This study systematically compared four distinct analytic workflows: a traditional manual rapid qualitative analysis and three innovative AI-assisted variants. We then analyzed the efficiency and thematic completeness of each approach.

2. Methods

2.1. Design, Setting, and Participants

To capture nurses’ insights into the recovery process for EAS patients, we conducted six 60-minute Zoom sessions (16 participants total) with registered nurses and/or nurse practitioners recruited by email from hospitals in California and Minnesota. All participants had experience caring for patients with abdominal surgery, with their scope of practice ranging from direct patient care to care coordination.

Semi-structured interview guides were used to explore barriers and facilitators to optimal EAS recovery both in-hospital and after discharge. Sessions were facilitated by an experienced moderator and recorded in Zoom. Institutional Review Board approval was obtained from both clinical sites and all participants provided verbal consent. Participants received a $100 incentive for participating in these Zoom interview sessions, and an additional $20 for completing the follow-up member-checking questionnaire.

2.2. Data Sources and Analytic Workflows

Each session produced two complementary data sources:

1. Summary notes: Free-text notes drafted by team members as they viewed the sessions.

2. Verbatim transcripts: Automated speech-to-text transcripts generated by Zoom.

The goal of this work is to compare the results of four pathways for analyzing the qualitative data from these sessions (Figure 1 and Table 1). Both the summary notes and verbatim transcripts served as the raw material for the manual RQA as well as the three AI-assisted workflows described in detail below. Additionally, AI was utilized to merge the resulting four baseline codebooks into a single reference codebook, which served as the basis for assessing thematic cross-coverage across the different workflows.

Figure 1.

Comparison of manual and AI-assisted qualitative analysis workflows

Table 1.

Characteristics of Analytic Workflows

Workflow	Input data	Human effort	Large language model (LLM) involvement	Output
W1 – Manual RQA	Summary notes	Three team members	None	Manual codebook
W2 – ChatGPT-o3 + Summary notes	Summary notes	Three team members	ChatGPT-o3	LLM-generated codebook
W3 – Copilot GPT-4 + cleaned transcripts	Transcripts	One team member	Microsoft Copilot GPT-4	LLM-generated codebook
W4 – ChatGPT-o3 + cleaned transcripts	Transcripts	One team member	ChatGPT-o3	LLM-generated codebook

2.3. Manual Rapid Qualitative Analysis (W1)

Three team members (including two nurses), all with experience in qualitative research and analysis, reviewed the six audio recordings by listening in full and capturing detailed notes. Each team member built an initial list of barriers and facilitators emerging from the sessions they reviewed. The team members then met to reconcile terminology and clarify concepts within their own reviews. Guided by that discussion, each team member reorganized the material from their sessions into preliminary themes and subthemes. Further discussion resolved remaining discrepancies and produced the final codebook of agreed-upon themes, subthemes, and definitions that covered all sessions.

2.4. ChatGPT-o3 Analysis of Summary Notes (W2)

A researcher uploaded all deidentified summary notes (described above) as attached files into the ChatGPT-o3 interface (March 2025 release) with the prompt below. For simplicity, we used the term ‘focus groups’ throughout the prompts:

The attached documents contain raw notes from focus groups conducted with nurses who care for patients who have undergone emergency abdominal surgery. I want you to take on the role of a qualitative analysis expert, and conduct two tasks. We will be using this analysis to develop interventions to help these patients. The two tasks are as follows: 1. Organize the notes: Please merge and organize the notes. Format the notes for readability and output it as a new, structured file (.docx). 2. Codebook Creation: We intend to conduct a qualitative thematic analysis of the notes. Please analyze the notes to identify key themes and subthemes, ensuring that they capture the main topics discussed by the participants. Create a structured codebook that includes: a. A list of themes and subthemes b. Definitions for each theme and subtheme

We noticed that ChatGPT-o3 generated exactly the same number of subthemes within each theme, so we followed up with this prompt:

I see you put 4 subthemes for each theme. Reanalyze the notes and consider whether this is the right number. It would be fine to have more subthemes in some categories than others.

ChatGPT-o3 then generated a revised structured codebook.

2.5. Cleaning and Pre-processing of Transcripts (W3 and W4)

Institutional policy required that raw participant data be processed only with a university-approved LLM (Microsoft Copilot GPT-4). Because the University’s version of Copilot could not accept file uploads and enforced an 8,000-character limit, we chunked the deidentified text into ∼8,000-character segments and cleaned each transcript by submitting each ∼8,000-character segment into Copilot with the following prompt:

The text below contains raw transcripts from a focus group conducted with nurses who care for patients who have undergone emergency abdominal surgery. Please clean the transcripts by removing irrelevant content (e.g., timestamps, filler or nonsense words) while preserving the meaning and accuracy of the discussions. Format the cleaned transcript for readability.
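The chunking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the study team used: the function name `chunk_text` and the choice to break at line or word boundaries are assumptions, and the source does not say whether the 8,000-character limit also had to accommodate the cleaning prompt (if it did, `max_chars` would need to be lowered accordingly).

```python
def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into consecutive segments of at most max_chars characters.

    Hypothetical helper: prefers to break at the last newline (or, failing
    that, the last space) inside each window so words are not cut in half.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for a natural break point inside the current window.
            cut = text.rfind("\n", start, end)
            if cut <= start:
                cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1  # keep the separator with the preceding chunk
        chunks.append(text[start:end])
        start = end
    return chunks


# Usage with a synthetic stand-in for a deidentified transcript.
transcript = ("word " * 5000).strip()
segments = chunk_text(transcript, max_chars=8000)
print(len(segments), [len(s) for s in segments])
```

Because every character is assigned to exactly one segment, concatenating the segments reproduces the input, which makes it easy to check that nothing was dropped before the segments are submitted for cleaning.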

2.6. LLM Analysis of Cleaned Transcripts (W3 and W4)

After cleaning the transcripts, we fed each fully cleaned transcript into Microsoft Copilot GPT-4 (for W3) and ChatGPT-o3 (for W4) using the following prompt:

We intend to conduct a qualitative thematic analysis of the transcript. Please review all of the work we did and identify key themes and subthemes, ensuring that they capture the main topics discussed by the participants. Create a structured codebook that includes: A list of themes and subthemes; Definitions for each theme and subtheme

We then entered the following prompt into both Copilot and ChatGPT-o3:

Following are qualitative codebooks identified from each of 6 sessions with nurses caring for patients who had emergency abdominal surgery. Analyze these codebooks for overlap and create a single codebook of barriers to recovery that encompasses these themes and subthemes.

Copilot GPT-4 and ChatGPT-o3 each generated a single codebook.

2.7. Evaluation of the Codebooks

We employed two approaches to evaluate the codebooks: 1) checking their coverage against a synthesized codebook, and 2) gathering feedback from nurse participants.

2.7.1. Creation of Synthesized Codebook

Once we had codebooks for our four workflows, we entered the following prompt into ChatGPT-o3:

I have 4 different codebooks created for these focus groups that I can provide to you. I would like you to analyze the alignment between them.

ChatGPT-o3 responded by asking for clarification on format, content, desired depth of comparison, output format, and any exclusions or special rules. Our responses specified:

I will paste them as text. They do contain definitions. I want a high-level map of which major themes and subthemes overlap in concept... A single consolidated table (themes × codebooks) plus a narrative summary work is great. I do also want recommendations for harmonizing or merging the four codebooks... Include positive-outcome codes for now. I want closely related codes be merged.

ChatGPT-o3 generated the synthesized codebook.
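Once a synthesized reference codebook exists, coverage can be expressed as the share of reference subthemes that each workflow's codebook captures. The sketch below assumes that someone (a reviewer or an LLM-produced overlap table) has already judged which reference subthemes each workflow covers; the function name `coverage`, the identifiers `R1` to `R4`, and the toy mapping are hypothetical placeholders, not study data.

```python
def coverage(reference: set[str], covered: dict[str, set[str]]) -> dict[str, float]:
    """Return, per workflow, the fraction of reference subthemes it covers."""
    if not reference:
        raise ValueError("reference codebook is empty")
    return {
        workflow: len(codes & reference) / len(reference)
        for workflow, codes in covered.items()
    }


def missing(reference: set[str], covered: dict[str, set[str]]) -> dict[str, list[str]]:
    """List the reference subthemes each workflow does not cover."""
    return {wf: sorted(reference - codes) for wf, codes in covered.items()}


# Hypothetical toy data: four reference subthemes, two workflows.
reference = {"R1", "R2", "R3", "R4"}
covered = {
    "W1": {"R1", "R2"},
    "W2": {"R1", "R2", "R3", "R4"},
}
result = coverage(reference, covered)
gaps = missing(reference, covered)
print(result)
print(gaps)
```

Reporting the uncovered subthemes alongside the fraction keeps the comparison interpretable, since two workflows with the same coverage share can miss different concepts.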

2.7.2. Nurse Participant Feedback on the Codebooks

We invited the 16 nurse participants to provide feedback on the four codebooks, with data collected and managed using the secure, web-based software platform REDCap (Research Electronic Data Capture) hosted at the University of Minnesota (Harris et al., 2009). The participants rated how well each codebook reflected their perspective as a nurse using a Likert scale (Not at all, Slightly, Somewhat, Mostly, and Completely). They also provided open-ended responses on what each codebook got right and wrong, an overall ranking of the codebooks, and what appealed to them about their top choice. Participants received a $20 incentive for completing the follow-up survey.
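As an illustration of how the five-point ratings could be summarized, the sketch below maps the Likert labels to scores from 1 to 5 and reports the median and per-label counts for one codebook. The scoring scheme, the function name `summarize`, and the example responses are assumptions made for illustration; they are not the study's analysis or its data.

```python
from collections import Counter
from statistics import median

# Response options in the order given in the survey description.
LIKERT = ["Not at all", "Slightly", "Somewhat", "Mostly", "Completely"]
SCORE = {label: i + 1 for i, label in enumerate(LIKERT)}  # assumed 1-5 coding


def summarize(responses: list[str]) -> dict:
    """Summarize Likert responses for a single codebook."""
    if not responses:
        raise ValueError("no responses to summarize")
    scores = [SCORE[r] for r in responses]
    counts = Counter(responses)
    return {
        "n": len(scores),
        "median": median(scores),
        "counts": {label: counts.get(label, 0) for label in LIKERT},
    }


# Hypothetical responses for one codebook (not study data).
example = ["Mostly", "Completely", "Somewhat", "Mostly", "Completely"]
summary = summarize(example)
print(summary)
```

The median is used here because Likert responses are ordinal; a mean would assume equal spacing between the labels.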

3. Results

The manual (human) rapid qualitative analysis (W1) took approximately 30 person-hours to complete and resulted in 3 themes and 10 subthemes, shown in Table 2.

Table 2.

Manual RQA (W1) Codebook

Theme	Subtheme	Definition
1. Communication	1.1 Language & Literacy Barriers	Need for interpreters/translation and the challenges patients face understanding health information.
	1.2 Miscommunication	Situations where providers and patients are “not on the same page;” asynchronous or unclear exchanges that negatively affect care.
	1.3 Communication Tools	Technologies or resources (e.g., translation software, patient portals) that facilitate clearer information exchange.
	1.4 Positive Communication	Examples of timely, accurate, and empathic dialogue that improves patient experience and outcomes.
2. Education	2.1 Education Continuum	Timing, sequencing, and method of delivering teaching across the care journey (pre-op → discharge → home).
2. Education	2.2 Volume of Topics	The breadth and amount of content provided; risk of overwhelming patients or caregivers.
3. Care Coordination	3.1 People Facilitators	Key individuals (case managers, navigators, social workers) who organize and streamline care.
	3.2 Service Facilitators	Home-health, outpatient rehab, transportation, or other formal services that support recovery.
	3.3 Resource Constraints	Geographic distance, financial barriers, supply shortages, or follow-up limitations that hinder optimal care.
	3.4 Care Complications	Medical or logistical problems that emerge during the episode of care and complicate recovery.

The ChatGPT-o3 analysis of notes (W2) took approximately 8 person-hours to complete and resulted in 10 themes and 47 subthemes, shown in Table 3.

Table 3.

ChatGPT-o3 Analysis of Notes (W2) Codebook

Theme	Subtheme	Definition
1. Patient Education & Health Literacy	1.1 Discharge instructions	Clarity, completeness, and readability (print + verbal) of instructions provided at hospital exit.
	1.2 Pre-operative education	Presence/absence and adequacy of teaching before emergent surgery; how lack of prep affects anxiety and recovery.
	1.3 Language & translation services	Access to professional interpreters and fully translated materials; gaps during nights/weekends or for rare dialects.
	1.4 Health-literacy adaptation	Use of plain language, pictograms, staging information over time, and tailoring content to patients’ education level.
	1.5 Teach-back & reinforcement	Techniques (phone call, bedside return-demo) used to confirm patient/caregiver understanding after initial teaching.
2. Care Coordination & Continuity	2.1 Early case-management involvement	Starting discharge and resource planning on or near admission (“Day 0”).
	2.2 Multidisciplinary alignment	Consistency of information across surgical, nursing, ostomy, social-work, nutrition, PT/OT teams.
	2.3 Team/role clarity for patients	How well patients know who their surgeon, resident, NP, and case manager are and who to contact.
	2.4 Follow-up scheduling	Ensuring timely in-person or telehealth appointments and bundling multi-specialty visits for remote patients.
	2.5 Cross-setting hand-offs	Quality of information transfer between hospital, home-health, TCU/SNF, and PCPs.
3. Support Systems & Social Resources	3.1 Family/caregiver engagement	Involving support persons in teaching, decision-making, and ongoing care activities.
	3.2 Transportation access	Distance, availability of rides, payer-provided transport, and travel vouchers.
	3.3 Home-health & TCU availability	Capacity shortages, rural gaps, and post-COVID workforce attrition affecting visiting services or transitional-care beds.
	3.4 Insurance & financial barriers	Coverage denials or high co-pays for supplies, equipment, or placement; strategies patients use to cope.
	3.5 Social-work & community referrals	Activation (or absence) of social work, chaplain, peer-support groups to address socioeconomic needs.
4. Pain & Symptom Management	4.1 Analgesia logistics	Overnight pain-team access, timing of medication administration, and prescription fulfilment hurdles.
	4.2 Mobility impact	Ways uncontrolled pain limits ambulation/PT participation, prolonging LOS.
	4.3 Drain & ostomy output tracking	Patient competence in measuring, interpreting, and documenting outputs; thresholds prompting calls.
	4.4 GI & wound symptom escalation	Teaching red-flag combinations (fever + N/V, output changes, erythema) and pathways for urgent vs. routine queries.
5. Equipment & Supplies	5.1 Ostomy starter kits & ordering	Initial supply quantity, instructions for re-ordering, and set-up with suppliers.
	5.2 Wound-care materials	Dressings, VACs, and access to negative-pressure systems at home or TCU.
	5.3 Specialized devices (TPN, tube feeds)	Availability, insurance approval, and caregiver training for complex equipment.
	5.4 Supply shortages & logistics	Weekend/holiday gaps, incorrect supplier information, rural delivery delays.
	5.5 Financial coping for supplies	Out-of-pocket purchases, rationing, or reuse when insurance limits coverage.
6. Mobility, Nutrition & Physical Recovery	6.1 Early ambulation routines	Frequency and documentation of walk sessions; staffing/time constraints.
	6.2 PT/OT integration & home-barrier assessment	Timing of consults and evaluation of stairs, elevators, or assistive-device needs at the patient’s residence.
	6.3 Diet progression & ileus prevention	Education on advancing diet, monitoring for bloating/obstruction, and physician alignment on diet orders.
	6.4 Nutrition optimization	Early dietitian involvement, protein/calorie goals, supplements, and tube-feed guidance.
	6.5 Sleep & fatigue management	Night-shift clustering of care, promoting rest, and impact of sleep deprivation on healing.
7. Emotional & Psychological Support	7.1 Adjusting to a new ostomy (“new normal”)	Emotional reactions (grief, disgust) and coping strategies for unexpected body changes.
	7.2 Fear of readmission or perceived failure	Confidence-building, reframing complications as part of recovery, and self-efficacy encouragement.
	7.3 Spiritual & cultural care	Chaplain visits, culturally concordant support, or rituals impacting acceptance and healing.
	7.4 Mental-health referral & counseling	Screening for anxiety/depression/PTSD and connecting to behavioral-health resources.
8. System & Staffing Constraints	8.1 Nurse-patient ratios & workload	Impact of staffing on education time, ambulation, and response to patient needs.
	8.2 Interpreter availability on off-shifts	Difficulty securing live interpreters for rare languages/dialects during nights/weekends.
	8.3 Knowledge drain & resource cuts post-COVID	Loss of experienced educators, merged units, or eliminated patient-education centers.
	8.4 Length-of-stay (LOS) reduction pressure	Institutional push for early discharge that outpaces patient readiness and resource coordination.
9. Technology-Enabled Communication	9.1 MyChart messaging & digital literacy	Patient ability to send photos/questions, expectations for response time, and staff workload.
	9.2 After-hours triage lines & direct numbers	Clarity of whom to call when; differences between sites in responsiveness.
	9.3 Telehealth follow-up & video visits	Usefulness for rural/immobile patients, barriers to adoption, and hardware/internet issues.
	9.4 EHR vs. paper instruction consistency	Alignment (or mismatch) between electronic discharge summaries and paper packets.
	9.5 Remote monitoring tools (photos, logs)	Patients submitting incision photos or output diaries and how the team reviews/acts on them.
10. Post-discharge Monitoring & Triggers	10.1 Red-flag symptom lists	Fever, bleeding, N/V, output extremes, or pain escalation that require immediate contact.
	10.2 Follow-up phone calls	Next-day or 48-h nurse outreach to identify early complications and reinforce instructions.
	10.3 Teach-back verification	Structured checks that patients can articulate warning signs and self-care steps.
	10.4 Readmission patterns & preventable causes	Common reasons (dehydration, wound infection, uncontrolled pain) and lessons learned.
	10.5 Symptom-tracking diaries or apps	Use of logs for output, pain scores, or fluid intake to guide early intervention.

The Copilot analysis of 6 sets of themes/subthemes (W3) took approximately 4 person-hours to complete and resulted in 7 themes and 19 subthemes, shown in Table 4.

Table 4.

Copilot Analysis of 6 Sets of Themes/Subthemes (W3) Codebook

Theme	Subtheme	Definition
1. Communication Barriers	1.1 Language Barriers	Challenges in communication arising from language differences that limit understanding between patients and providers.
	1.2 Health Literacy	The patient’s ability to comprehend and act on health information delivered by the care team.
	1.3 Follow-Up Communication	Ongoing information exchange between providers and patients after discharge.
2. Care Coordination Challenges	2.1 Discharge Planning	Problems organizing the patient’s transition from hospital to home, including scheduling and instructions.
	2.2 Role of Case Managers	How case managers participate in planning and coordinating patient care from admission through discharge.
	2.3 Multidisciplinary Team Involvement	Collaboration among surgeons, nurses, therapists, and other professionals to deliver comprehensive care.
	2.4 Handoff and Care Coordination	Transfer of critical patient information between shifts and services to maintain continuity of care plans.
3. Patient Support System Barriers	3.1 Family & Caregiver Involvement	Contributions and limitations of family members or informal caregivers in the recovery process.
	3.2 Professional Support	Participation of home-care nurses, therapists, or other healthcare professionals in patient recovery.
	3.3 Transportation & Accessibility	The patient’s ability to reach clinics, pharmacies, or other resources after discharge.
4. Post-Discharge Challenges	4.1 Managing Equipment & Medications	Difficulties handling medical devices (e.g., drains) and adhering to medication regimens at home.
	4.2 Recognizing & Responding to Complications	Patient capacity to identify warning signs and seek timely help for post-surgical problems.
	4.3 Logistical Challenges	Practical issues such as arranging home-care visits, securing supplies, or coordinating services.
5. Emotional & Psychological Barriers	5.1 Emotional Support	Provision of counseling, chaplaincy, or mental-health resources to address anxiety, depression, or distress.
5. Emotional & Psychological Barriers	5.2 Family Involvement	Engagement of family members in emotional support and shared decision-making about care.
6. Financial & Insurance Barriers	6.1 Insurance & Financial Issues	Coverage gaps, co-pays, or other monetary constraints that hinder access to care and supplies.
6. Financial & Insurance Barriers	6.2 Rural & Remote Challenges	Additional obstacles for patients in rural or remote areas, such as limited services or long travel distances.
7. Patient Adherence & Compliance	7.1 Adherence to Medical Recommendations	The patient’s consistency in following prescribed medications, activities, and self-care instructions.
7. Patient Adherence & Compliance	7.2 Social & Environmental Factors	External influences such as housing, social support, work demands that affect a patient’s ability to adhere and recover.

The ChatGPT-o3 analysis of 6 sets of themes/subthemes (W4) took approximately 4 person-hours to complete and resulted in 6 themes and 23 subthemes, shown in Table 5.

Table 5.

ChatGPT-o3 Analysis of 6 Sets of Themes/Subthemes (W4) Codebook

Theme	Subtheme	Definition
1. Health Literacy & Patient Education	1.1 Pre-surgery Education & Expectations	Adequacy and timing of information given before surgery about procedures, risks, ostomies, pain, and expected recovery trajectory.
	1.2 Discharge Instructions & Readiness	Clarity, completeness, and patient confidence in performing post-discharge care tasks at the moment of hospital release.
	1.3 Post-surgery Self-management Education	Teaching on medication schedules, wound/ostomy care, equipment use (e.g., drains, compression devices) and safe activity progression after discharge.
	1.4 Language & Cultural Barriers	Any mismatch between the patient’s preferred language/culture and clinical communication that limits understanding.
	1.5 Recognizing Warning-Signs	Patient ability to identify complications (infection, leakage, escalating pain) and know when/how to seek help.
2. Communication & Care Coordination	2.1 Interdisciplinary Team Communication	Effectiveness of information exchange among surgeons, nurses, case-managers, pharmacists, physical therapists, etc.
	2.2 Nurse–Patient Communication	Day-to-day dialogue quality, trust, and clarity between bedside nurses and patients.
	2.3 Discharge Planning & Handoffs	Quality and consistency of plans, documentation and verbal handoffs from hospital to home/community services.
	2.4 Follow-up Channels	Availability and usability of tools such as MyChart, phone triage, telehealth, and clear guidance for urgent versus non-urgent contact.
	2.5 Care Navigation/Case Management	Access to and effectiveness of personnel who arrange services, supplies, appointments and insurance authorizations.
3. Pain & Symptom Control	3.1 Immediate Post-Operative Pain Control	Adequacy of analgesia and non-pharmacologic strategies on the surgical unit.
	3.2 Ongoing Pain Management at Home	Patient access to prescriptions, clear taper plans, and alternatives once home.
	3.3 Symptom Management & Comfort	Support for nausea, sleep disruption, bowel function, and wound discomfort that influence recovery speed.
4. Support Systems & Psychosocial Resources	4.1 Family & Informal Caregiver Involvement	Availability, preparedness, and burden of relatives/friends who assist with care.
	4.2 Professional Home-Health & Community Services	Timeliness and adequacy of visiting-nurse, PT/OT, or social-service support.
	4.3 Emotional & Psychological Support	Resources addressing anxiety, depression, body-image (e.g., new ostomy), and coping (e.g., chaplaincy, mental-health referral).
5. Access, Logistics & Social Determinants	5.1 Insurance & Financial Constraints	Coverage gaps, co-pays, authorization delays, and out-of-pocket costs for medications, supplies, or services.
	5.2 Transportation & Geographic Barriers	Distance, lack of transport, or rural/remote location limiting in-person care or follow-up.
	5.3 Equipment & Supply Availability	Delays or shortages in obtaining ostomy supplies, wound vacs, mobility aids, etc.
	5.4 Home Healthcare Placement	Difficulty securing appropriate skilled-nursing or rehab placement when needed.
6. Patient Engagement & Adherence	6.1 Medication & Treatment Adherence	Consistency with prescribed meds, dressings, activity restrictions, and diet.
	6.2 Follow-up Appointment Adherence	Ability and willingness to attend scheduled clinic visits, imaging, or lab draws.
	6.3 Lifestyle & Environmental Factors	Housing stability, social support, and competing obligations that influence adherence and healing.
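The effort and codebook-size figures reported above for W1 through W4 can be placed side by side with a short calculation. The sketch below uses only the numbers stated in the text; because the person-hours are approximate, the derived percentages should be read as rough estimates. The names `WORKFLOWS` and `percent_saved` are illustrative.

```python
# Figures as reported in the text; person-hours are approximate.
WORKFLOWS = {
    "W1": {"hours": 30, "themes": 3, "subthemes": 10},
    "W2": {"hours": 8, "themes": 10, "subthemes": 47},
    "W3": {"hours": 4, "themes": 7, "subthemes": 19},
    "W4": {"hours": 4, "themes": 6, "subthemes": 23},
}


def percent_saved(baseline_hours: float, hours: float) -> int:
    """Relative reduction in person-hours versus the baseline, in whole percent."""
    return round(100 * (baseline_hours - hours) / baseline_hours)


baseline = WORKFLOWS["W1"]["hours"]  # manual RQA as the reference workflow
savings = {
    name: percent_saved(baseline, w["hours"])
    for name, w in WORKFLOWS.items()
    if name != "W1"
}
subthemes_per_theme = {
    name: round(w["subthemes"] / w["themes"], 1) for name, w in WORKFLOWS.items()
}

print(savings)              # {'W2': 73, 'W3': 87, 'W4': 87}
print(subthemes_per_theme)
```

Relative to the roughly 30 person-hours of manual RQA, the LLM-assisted workflows correspond to reductions of roughly 73% (W2) and 87% (W3 and W4) in person-hours.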

The ChatGPT-o3 synthesis of the 4 codebooks took approximately 0.25 person-hours to complete and resulted in 8 themes and 42 subthemes, shown in Table 6.

Table 6.

Synthesized Codebook

Theme	Subtheme	Definition
1. Communication & Health Literacy	1.1 Language & Literacy Barriers	Limited English proficiency, low reading level, or cultural nuance that blocks comprehension of health information.
	1.2 Miscommunication	Asynchronous, unclear, or contradictory messages that create confusion or errors.
	1.3 Effective Communication	Timely, accurate, empathic exchanges (verbal, written, or digital) that build understanding and trust.
	1.4 Interpreter & Tech Tools	Professional interpreters, translation software, portals, or telehealth platforms used to bridge gaps.
	1.5 Follow-up Channels	Phone triage, MyChart, secure text, and other pathways for post-discharge questions, with clear response expectations.
2. Patient Education & Self-Management	2.1 Pre-operative Education & Expectations	Information before surgery about risks, procedures, ostomies, pain control, recovery timeline, to reduce anxiety and set goals.
	2.2 Discharge Instructions & Readiness	Clarity, completeness, and patient confidence in performing post-discharge tasks at the moment of hospital release.
	2.3 Post-surgery Self-management Education	Teaching on meds, wound/ostomy care, equipment use, diet/activity progression, and daily routines at home.
	2.4 Teach-back & Reinforcement	Return demonstration, phone calls, or other methods confirming understanding and refreshing key instructions.
	2.5 Education Volume & Timing	Staging material to avoid overwhelming patients or caregivers; balancing breadth of topics with cognitive load.
	2.6 Red-flag Symptom Recognition	Explicit teaching of warning signs (fever, bleeding, output changes) and when/how to seek help.
3. Care Coordination & Continuity	3.1 Case Management & Navigation	Personnel who arrange services, supplies, insurance approvals, and appointments across settings.
	3.2 Multidisciplinary Team Alignment	Consistency of plans/messages across surgeons, nurses, PT/OT, nutrition, and social work.
	3.3 Discharge Planning & Handoffs	Formal processes and documentation that transfer care safely from hospital to home/rehab.
	3.4 Cross-setting Continuity	Quality of information flow between hospital, PCP, home-health, or transitional care units.
	3.5 Role Clarity for Patients	Patient understanding of “who does what,” contact points, and escalation pathways.
	3.6 Follow-up Scheduling & Bundling	Timely, coordinated in-person or telehealth appointments, especially bundling multi-specialty visits for remote patients.
4. Support Systems & Psychosocial Resources	4.1 Family & Informal Caregiver Involvement	Presence, preparedness, and burden of relatives/friends who assist with decision-making and daily care.
	4.2 Professional Home-health & Community Services	Availability, timeliness, and adequacy of visiting nurses, therapists, or community programs.
	4.3 Emotional & Psychological Support	Access to chaplaincy, counseling, peer groups, or mental-health professionals for anxiety, depression, or body-image concerns.
	4.4 Social Work & Community Referrals	Activation or absence of social work and community resources for housing, food security, legal, or financial needs.
	4.5 Spiritual & Cultural Care	Services or rituals that respect cultural beliefs and promote acceptance, coping, and healing.
5. Access, Financial & Logistical Constraints	5.1 Insurance & Financial Barriers	Coverage gaps, co-pays, prior-authorizations, and out-of-pocket costs that limit meds, supplies, or services.
	5.2 Geographic & Transportation Barriers	Distance, travel costs, rural/remote location, or lack of transport impeding service access.
	5.3 Resource/Supply Shortages	Institutional or community deficits in staff, beds, or equipment that delay or restrict care.
	5.4 Post-acute Placement Availability	Difficulty securing skilled-nursing, rehab, or home-health placements due to capacity, eligibility, or funding.
6. Physical Recovery	6.1 Pain Management	Adequacy of analgesia plans, prescription fulfilment, non-pharmacologic strategies, and taper guidance.
	6.2 Symptom Management & Escalation	Patient ability and pathways to address nausea, wound changes, GI symptoms, or fever promptly.
	6.3 Early Mobility & Ambulation	Frequency and support for walking/physical therapy in hospital and at home; staffing or pain barriers.
	6.4 Nutrition & Diet Progression	Guidance on advancing diet, meeting protein/calorie goals, and preventing ileus or malnutrition.
	6.5 Sleep & Fatigue Management	Strategies to cluster care, promote rest, and mitigate the impact of sleep loss on healing.
	6.6 Wound Healing & Scar Management	Education and monitoring of incision/wound sites, scar care, and expectations for healing timeline.
7. Patient Engagement & Adherence	7.1 Medication & Treatment Adherence	Consistency in taking meds, changing dressings, performing ostomy care, and following restrictions.
	7.2 Appointment & Follow-up Adherence	Ability and willingness to attend scheduled visits, imaging, labs, or telehealth sessions.
	7.3 Lifestyle & Environmental Factors	Housing stability, work demands, family obligations, or living conditions that affect recovery behaviors.
	7.4 Positive Engagement & Self-efficacy	Patient confidence, motivation, and proactive behaviors that facilitate smoother recovery.
	7.5 Monitoring & Symptom Tracking	Diaries, mobile apps, or remote monitoring used to capture outputs, pain scores, or wound photos for early intervention.
8. Equipment & Supplies	8.1 Ostomy Supplies & Ordering	Starter kits, enrollment with suppliers, reorder instructions, and troubleshooting common issues.
	8.2 Wound Care Materials & VACs	Availability, insurance approval, and competence in negative-pressure therapy or dressing changes.
	8.3 Specialized Devices (TPN, Tube Feeds)	Training, safety checks, and logistical arrangements for infusion pumps, feeding tubes, or parenteral nutrition.
	8.4 Supply Delivery & Logistics	Timing, accuracy, and reliability of deliveries; handling rural addresses, holiday/weekend gaps, or damaged shipments.
	8.5 Financial Coping for Supplies	Strategies patients use (coupon programs, reuse, rationing) when coverage limits quantity or type of supplies.

3.1. Thematic Coverage of Codebooks

Tables 7 and 8 illustrate the thematic coverage of each of the four analytic workflows against the synthesized codebook, comprising 8 themes and 42 subthemes. The number in parentheses next to each workflow indicates the total count of subthemes captured by that specific workflow. The ChatGPT-o3-Notes workflow (W2) demonstrated the highest thematic coverage (37/42 subthemes), followed by ChatGPT-o3 transcripts (W4; 34/42 subthemes), Copilot transcripts (W3; 20/42 subthemes), and Manual RQA (W1; 11/42 subthemes). It is notable that “Positive engagement & self-efficacy” (7.4) was an emergent subtheme introduced by ChatGPT-o3 during the synthesis process, which was not explicitly present in any of the initial four codebooks. ChatGPT-o3 explained that it generated this subtheme to capture implicit facilitator ideas related to patient confidence and proactive actions, thereby ensuring the harmonized codebook could represent both barriers and successes.

Table 7.

Percentage of Subthemes Identified by Each Workflow, Stratified by Main Theme

Theme	W1: Manual RQA	W2: ChatGPT-o3-Notes	W3: Copilot GPT-4-Transcripts	W4: ChatGPT-o3-Transcripts
Communication & Health Literacy	80.0%	80.0%	60.0%	80.0%
Patient Education & Self-Management	16.7%	100.0%	33.3%	66.7%
Care Coordination & Continuity	16.7%	100.0%	66.7%	66.7%
Support Systems & Psychosocial Resources	20.0%	100.0%	60.0%	60.0%
Access, Financial & Logistical Constraints	75.0%	100.0%	75.0%	100.0%
Physical Recovery	0.0%	100.0%	16.7%	100.0%
Patient Engagement & Adherence	0.0%	20.0%	40.0%	80.0%
Equipment & Supplies	20.0%	100.0%	40.0%	100.0%

Table 8.

Detailed Thematic Coverage of Codebooks

Harmonized codebook	Manual RQA (11/42)	ChatGPT-o3-Notes (37/42)	Copilot GPT-4-Transcripts (20/42)	ChatGPT-o3-Transcripts (34/42)
1 Communication & Health Literacy
1.1 Language & Literacy Barriers	✓	✓	✓	✓
1.2 Miscommunication	✓	✓	✓	✓
1.3 Effective Communication	✓	—	—	—
1.4 Interpreter & Tech Tools	✓	✓	—	✓
1.5 Follow-up Channels	—	✓	✓	✓
2 Patient Education & Self-Management
2.1 Pre-op Education & Expectations	—	✓	—	✓
2.2 Discharge Instructions & Readiness	—	✓	✓	✓
2.3 Post-surgery Self-management Education	—	✓	—	✓
2.4 Teach-back & Reinforcement	—	✓	—	—
2.5 Education Volume & Timing	✓	✓	—	—
2.6 Red-flag Symptom Recognition	—	✓	✓	✓
3 Care Coordination & Continuity
3.1 Case Management & Navigation	✓	✓	✓	✓
3.2 Multidisciplinary Alignment	—	✓	✓	✓
3.3 Discharge Planning & Handoffs	—	✓	✓	✓
3.4 Cross-setting Continuity	—	✓	✓	✓
3.5 Role Clarity for Patients	—	✓	—	—
3.6 Follow-up Scheduling & Bundling	—	✓	—	—
4 Support Systems & Psychosocial Resources
4.1 Family & Informal Caregiver Involvement	—	✓	✓	✓
4.2 Professional Home-health/Community	✓	✓	✓	✓
4.3 Emotional & Psychological Support	—	✓	✓	✓
4.4 Social Work & Community Referrals	—	✓	—	—
4.5 Spiritual & Cultural Care	—	✓	—	—
5 Access, Financial & Logistical Constraints
5.1 Insurance & Financial Barriers	✓	✓	✓	✓
5.2 Geographic & Transportation Barriers	✓	✓	✓	✓
5.3 Resource/Supply Shortages	✓	✓	✓	✓
5.4 Post-acute Placement Availability	—	✓	—	✓
6 Physical Recovery
6.1 Pain Management	—	✓	—	✓
6.2 Symptom Management & Escalation	—	✓	✓	✓
6.3 Early Mobility & Ambulation	—	✓	—	✓
6.4 Nutrition & Diet Progression	—	✓	—	✓
6.5 Sleep & Fatigue Management	—	✓	—	✓
6.6 Wound Healing & Scar Management	—	✓	—	✓
7 Patient Engagement & Adherence
7.1 Medication & Treatment Adherence	—	—	✓	✓
7.2 Appointment & Follow-up Adherence	—	—	—	✓
7.3 Lifestyle & Environmental Factors	—	—	✓	✓
7.4 Positive Engagement & Self-efficacy	—	—	—	—
7.5 Monitoring & Symptom Tracking	—	✓	—	✓
8 Equipment & Supplies
8.1 Ostomy Supplies & Ordering	—	✓	✓	✓
8.2 Wound Care Materials & VACs	—	✓	—	✓
8.3 Specialized Devices (TPN, Feeds)	—	✓	—	✓
8.4 Supply Delivery & Logistics	✓	✓	✓	✓
8.5 Financial Coping for Supplies	—	✓	—	✓
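
The counts and percentages in Tables 7 and 8 can be reproduced by simple tallying. The following is a minimal Python sketch, not the authors' analysis code: the `COVERAGE` structure, the 0/1 encoding, and the function names are illustrative assumptions, with the flags transcribed from the checkmarks in Table 8 (column order W1, W2, W3, W4).

```python
# Workflow columns in the same order as Table 8.
WORKFLOWS = ["W1", "W2", "W3", "W4"]

# Theme number -> one 4-character string per subtheme (in table order).
# Each character is 1 if that workflow captured the subtheme, else 0.
COVERAGE = {
    1: ["1111", "1111", "1000", "1101", "0111"],
    2: ["0101", "0111", "0101", "0100", "1100", "0111"],
    3: ["1111", "0111", "0111", "0111", "0100", "0100"],
    4: ["0111", "1111", "0111", "0100", "0100"],
    5: ["1111", "1111", "1111", "0101"],
    6: ["0101", "0111", "0101", "0101", "0101", "0101"],
    7: ["0011", "0001", "0011", "0000", "0101"],
    8: ["0111", "0101", "0101", "1111", "0101"],
}


def totals(coverage):
    """Count the subthemes captured by each workflow across all themes."""
    counts = [0] * len(WORKFLOWS)
    for rows in coverage.values():
        for row in rows:
            for i, flag in enumerate(row):
                counts[i] += int(flag)
    return dict(zip(WORKFLOWS, counts))


def theme_percentages(coverage):
    """Percentage of each theme's subthemes captured by each workflow."""
    result = {}
    for theme, rows in coverage.items():
        result[theme] = [
            round(100 * sum(int(row[i]) for row in rows) / len(rows), 1)
            for i in range(len(WORKFLOWS))
        ]
    return result


if __name__ == "__main__":
    print(totals(COVERAGE))
    for theme, pcts in theme_percentages(COVERAGE).items():
        print(theme, pcts)
```

Running the sketch yields the per-workflow totals shown in the Table 8 header and the per-theme percentages in Table 7, which is a quick way to confirm that the two tables agree with each other.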

3.2. Nurse Participant Feedback on the Codebooks

Six of the 16 participants (37.5%) responded to the REDCap survey (Harris et al., 2009). Figure 2 presents participants’ rankings of the codebooks, with ChatGPT-o3-Notes (W2) being rated as the best match, followed by ChatGPT-o3 transcripts (W4), Copilot transcripts (W3), and Manual RQA (W1). Participants’ qualitative feedback on their top-ranked choices consistently highlighted the comprehensive and actionable nature of the AI-generated codebooks. Comments included: “Includes the steps that would improve our weaknesses,” “...addressing all potential barriers and what can be done to alleviate those barriers,” “The teaching and getting the patient prepared is what we strive for… ensuring that caregivers and family [are] available to learn in case the patient is unable,” “Better highlights communication barriers [and] offers more concrete details on barriers,” “Very thorough, accurately depicts [the] full picture of patient interventions for surgery and recovery,” and “It feels like it fits my role the most.”

Figure 2.

Nurse rankings of codebooks

4. Discussion

This study rigorously compared the efficiency and thematic completeness of four distinct qualitative analysis workflows applied to a common corpus of 16 nurse interviews and mini-focus groups: a manual rapid qualitative analysis workflow and three LLM–assisted workflows. Our findings demonstrate substantial time savings with LLM integration while achieving high thematic coverage, corroborated by participant feedback.

Specifically, manual RQA (W1) required approximately 30 person-hours. In contrast, the LLM-assisted workflows substantially reduced labor: ChatGPT-o3 on researcher notes (W2) required ∼8 person-hours, and Copilot on full transcripts (W3) and ChatGPT-o3 on full transcripts (W4) each took ∼4 person-hours. Most of the time spent on the transcript-based LLM analyses was attributable to institutional technical limitations (file-upload and prompt-length limits) rather than methodological constraints. In an unrestricted environment, these workflows would likely take less than 30 minutes, which points to an infrastructural rather than a methodological bottleneck.

Beyond efficiency, the workflows produced codebooks of varied granularity and completeness. While W1 resulted in 3 themes and 10 subthemes, W2 yielded 10 themes and 47 subthemes. Workflows W3 and W4 provided an intermediate level of detail. Crucially, the two transcript-based LLM workflows (W3 and W4) captured 20 and 34, respectively, of the 42 subthemes in our 8-theme harmonized framework, yet required approximately 85% less time than manual RQA (W1), even with the current institutional inefficiencies. This demonstrates that LLMs can not only accelerate analysis but also achieve comprehensive thematic coverage when prompted to generate codes. Nurse participant preferences (W2, W4, W3, W1) echoed the order shown in the thematic completeness analysis, with all LLM workflows preferred by nurse participants over the manual RQA.
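
The relative time savings can be checked with a short calculation. This is a minimal Python sketch, assuming the approximate person-hour figures reported above (30 for W1, 8 for W2, and 4 each for W3 and W4); the function name is illustrative.

```python
def percent_saved(baseline_hours, workflow_hours):
    """Percentage reduction in person-hours relative to the baseline workflow."""
    return 100 * (baseline_hours - workflow_hours) / baseline_hours


if __name__ == "__main__":
    manual_hours = 30  # W1, approximate
    for name, hours in [("W2", 8), ("W3", 4), ("W4", 4)]:
        print(name, round(percent_saved(manual_hours, hours), 1))
```

Because the reported hours are themselves approximate, the result for the transcript-based workflows (about 87%) is consistent with the "approximately 85%" figure in the text, while the notes-based workflow saves roughly 73%.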

The divergence in thematic completeness between the models may be attributable to specific architectural differences. ChatGPT-o3 utilizes advanced ‘chain-of-thought’ reasoning and operates with a significantly larger context window compared to the GPT-4 architecture powering the Copilot workflow. These capabilities allow for more robust data synthesis, enabling the model to retain and connect disparate narrative threads across lengthy transcripts, a critical advantage when analyzing the dense, unstructured accounts typical of qualitative interviews.

These results echo and extend a growing empirical literature on LLM-assisted qualitative analysis. One study demonstrated that ChatGPT generated complete inductive codebooks for health-services interviews in under 15 minutes, achieving high macro-level concordance with human gold standards (Bijker et al., 2024). Another reported a 70% reduction in analyst time and strong inter-rater agreement (Cohen’s κ between 0.72 and 0.95) when GPT-4 drafted codes (Liu et al., 2025). Our work corroborates these findings within a nursing context, showing that when LLMs are tasked with generating, not just assigning, codes, near-complete thematic coverage is achievable.

However, none of the workflows was without limitations. LLMs occasionally exhibited “hallucinations,” such as generating duplicate categories or imposing a fixed number of subthemes (e.g., four under every parent theme) even when the content did not warrant it. Omissions also occurred: Copilot GPT-4 notably missed several details related to “Equipment and Supplies” that ChatGPT-o3 captured. These observations reinforce prior cautions about LLM “creativity” and underscore the critical need for robust human oversight and quality assurance at every stage of LLM-assisted qualitative analysis (Jiang et al., 2024).

Participant feedback on the codebooks provided further insights into what constitutes a “good” codebook from an end-user perspective. Of the 16 nurse participants invited, 6 (37.5%) completed the evaluation survey. While this sample size limits statistical generalizability and introduces potential response bias (where respondents may be those most engaged with the topic), the qualitative feedback provided offers critical preliminary validity checks from a content-expert perspective. Key takeaways from nurse participant preferences for the top-ranked codebooks (primarily LLM-generated) included:

Thoroughness: Participants consistently favored codebooks that provided a comprehensive overview, capturing the full spectrum of barriers patients face and the interventions that could follow.

Action-orientation: Codebooks that presented clear, actionable steps or solutions to identified weaknesses were highly valued.

Emphasis on Communication Barriers: Explicit highlighting of communication gaps among clinicians, patients, and caregivers was viewed favorably, reflecting a critical area in nursing practice.

Focus on Preparation & Teaching: Content emphasizing patient and family education and preparation for surgery and recovery resonated strongly, aligning with nurses’ daily priorities.

Role Fit: Several respondents noted that their preferred option “fits my role the most,” suggesting that the perceived applicability of themes in clinical practice significantly influenced their judgment of quality.

These participant preferences provide a framework for systematically characterizing the limitations of the lower-performing AI workflows. By inverting the attributes most valued by the nurses, specifically thoroughness, action-orientation, and role fit, we can categorize the primary limitations of the LLM outputs as follows:

1. Lack of Granularity: Nurses consistently favored the comprehensive overview provided by ChatGPT-o3 (W4). This highlights a critical limitation in the GPT-4 Copilot workflow (W3), which, despite being efficient, produced thinner thematic descriptions that failed to capture the full spectrum of barriers identified in the manual analysis. Methodologically, this suggests that without advanced reasoning architectures, standard LLM summarization prioritizes brevity over the rich description necessary for rigorous qualitative inquiry.

2. Alignment with Purpose: The strong preference for actionable steps suggests that, without specific prompting to identify solutions, models may generate technically accurate but clinically passive themes that fail to resonate with nurses’ daily priorities. In implementation science contexts, this represents a significant risk, as the utility of qualitative evidence often lies in the operational details that models may smooth over.

3. Contextual Genericness: The feedback that the top choice fits their role the most implies that the lower-ranked outputs suffered from a lack of domain-specific resonance. In the context of EAS recovery, an AI output that misses the specific vocabulary of ‘patient education’ or ‘caregiver communication’ is not merely less detailed; it is perceived as having lower validity by the content experts. This underscores a broader transferability challenge: some AI models may default to normative language, struggling to capture the distinct professional dialects and tacit knowledge that define expert communities.

Regarding interpretive depth, we acknowledge that current LLMs prioritize breadth and speed (semantic analysis) over the nuanced, latent interpretation that sustained human immersion in the data yields. However, when the goal is rapid implementation rather than theory generation, this trade-off is acceptable. The AI successfully identifies the what (barriers) even if it occasionally misses the deeper why (latent meaning), satisfying the pragmatic requirements of the study.

Based on our findings, particularly the balance of efficiency and thematic completeness demonstrated by the LLM-assisted workflows and the positive participant feedback, we recommend a human-in-the-loop protocol for projects requiring rapid yet thorough qualitative insights, such as those in early design cycles or iterative implementation science:

1. Compile researcher notes: Team members review audio recordings by listening in full and capturing detailed notes.

2. Generate two parallel codebooks: Utilize a reflective LLM (e.g., ChatGPT-o3) to create one codebook from researcher notes and another from cleaned and deidentified raw transcripts. Our results indicate these two approaches offered the most comprehensive thematic coverage and were highly regarded by nurse participants.

3. LLM-assisted reconciliation: Prompt an LLM to reconcile and merge these parallel codebooks into a single, synthesized version.

4. Human consensus meeting: Conduct a dedicated team meeting to critically review the merged codebook, ensuring conceptual clarity, accuracy, and addressing any LLM-generated errors or omissions.

5. Member-checking: When feasible, member-check the final codebook with study participants to validate its resonance and applicability from their lived experience.

This proposed protocol preserves the crucial cognitive benefits of team members’ immersion in the data (taking notes) while strategically outsourcing the most time-consuming mechanical steps to LLMs. These findings are particularly beneficial for implementation scientists who often require faster data translation within resource-constrained environments.
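
The bookkeeping behind steps 3 and 4 of the protocol can be made concrete. The protocol delegates reconciliation to an LLM; the sketch below does not call any model. Instead, it shows one simple, deterministic way a team might flag subthemes that appear in only one of the two parallel codebooks, so that the consensus meeting can examine them first. The function names and the exact-label matching are assumptions for illustration, and the example labels are taken from Table 8.

```python
def provenance_table(notes_codebook, transcript_codebook):
    """Label each subtheme by which parallel codebook(s) contain it.

    Both arguments are sets of subtheme labels. Matching is by exact label,
    which is a simplification: real codebooks often word the same idea
    differently and would need human or LLM-assisted alignment first.
    """
    rows = []
    for label in sorted(notes_codebook | transcript_codebook):
        in_notes = label in notes_codebook
        in_transcripts = label in transcript_codebook
        if in_notes and in_transcripts:
            source = "both"
        elif in_notes:
            source = "notes only"
        else:
            source = "transcripts only"
        rows.append((label, source))
    return rows


def needs_review(rows):
    """Return the subthemes found in only one codebook, for priority review."""
    return [label for label, source in rows if source != "both"]


if __name__ == "__main__":
    # Hypothetical excerpts of the notes-based and transcript-based codebooks.
    notes = {"Pain Management", "Teach-back & Reinforcement"}
    transcripts = {"Pain Management", "Appointment & Follow-up Adherence"}

    table = provenance_table(notes, transcripts)
    for label, source in table:
        print(f"{label}: {source}")
    print("Review first:", needs_review(table))
```

A list like this does not replace the human consensus meeting; it only helps reviewers see where the two inputs disagree before they judge conceptual clarity and accuracy.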

However, ethical considerations in using LLMs for qualitative analysis remain paramount. Researchers must prioritize alignment with institutional policies regarding the protection of participant data. Awareness of these privacy protections is crucial, particularly for compliance with privacy laws and/or policies. While these workflows are recommended for rapid data translation with large datasets, especially within implementation science, LLM outputs must always be cross-referenced against the original data to maintain analytic rigor and mitigate the risk of hallucinations. The integration of LLMs into qualitative workflows necessitates a re-evaluation of data governance and interpretive responsibility. While LLMs can efficiently synthesize patterns, they lack the agency to assume accountability for clinical insights. We posit that the ‘Human-in-the-Loop’ safeguards utilized in this study are not merely quality assurance mechanisms but fundamental ethical requirements. When delegating thematic synthesis to AI, the final accountability for the validity of findings must remain solely with the human research team. Researchers must actively audit AI outputs to ensure that efficiency gains do not come at the cost of erasing minority perspectives or propagating hallucinated clinical associations that could impact patient safety.

This workflow necessitates a re-conceptualization of reflexivity. In AI-assisted qualitative research, the researcher’s role shifts from primary analyst to expert auditor. Bias does not disappear; rather, it shifts upstream to the prompt engineering phase and downstream to the validation phase. Reflexivity therefore requires the researcher to critically examine not just their interpretation of the data, but how their prompts may have constrained or directed the model’s outputs.

Additionally, we acknowledge that the specific models evaluated (ChatGPT-o3 and GPT-4) are already becoming obsolete relative to emerging architectures. However, the methodological implication holds: as models evolve to generate richer outputs, the risk of plausible-sounding hallucinations may increase. Future qualitative methodology must therefore shift focus from mere coding efficiency to rigorous interpretive auditing, ensuring that as AI tools become more capable, human oversight evolves to detect increasingly sophisticated errors.

Finally, it is important to acknowledge that numerous forms of qualitative analysis and theoretical approaches are better suited for human-only analysis, particularly when the researcher’s embodied role and deep contextual understanding are central to the inquiry.

5. Conclusion

This study provides compelling evidence that integrating LLMs into qualitative data analysis significantly reduces labor while achieving comprehensive thematic coverage of nurse-identified recovery barriers. Specifically, LLM assistance captured virtually all concepts present in a synthesized codebook, far surpassing the thematic breadth of manual RQA in a fraction of the time. Yet, our findings also underscore that human reconciliation remains essential to validate LLM outputs, correct duplications and omissions, and ultimately uphold qualitative rigor. We propose a pragmatic hybrid approach, involving dual LLM analyses of human notes and transcripts, a human review session and, ideally, member-checking, as an optimal strategy. Beyond the specific context of EAS, this study highlights the transferability of AI-assisted RQA to broader qualitative inquiry. The structural components of our hybrid workflow, including the defined prompt architecture, the use of summary notes for velocity, and the ‘human-in-the-loop’ validation protocols, are domain-independent and readily adaptable to other applied fields such as public health, education, or organizational science. However, interpretive accuracy remains domain-dependent. As evidenced by the nuanced differences in model performance, the effectiveness of the human check relies on content expertise. Researchers applying this method elsewhere must therefore ensure that the human auditor possesses deep subject-matter knowledge to detect the specific, plausible-sounding hallucinations that generic models may generate within specialized domains. This protocol offers a powerful harmonization of analytical speed, ethical LLM utilization, and the generation of translatable insights crucial for advancing implementation science.

Footnotes

ORCID iDs

Jenna L. Marquard

Christie L. Martin

Amanda C. Trofholz

Elizabeth C. Wick

Genevieve B. Melton

Ethical Considerations

This study was approved by the University of Minnesota and University of California, San Francisco Institutional Review Boards.

Author Contributions

Conception and Design (JM, LW, GM), Data Collection (JM, AT), Data Analysis (JM, CM, ST, AF), Drafting Manuscript (JM, CM, ST, AF), Critical Revision of the Manuscript (JM, CM, ST, AF, AT, EW, GM)

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by an award from the Agency for Healthcare Research and Quality (AHRQ) R18 HS029616 and by the University of Minnesota Center for Learning Health System Sciences (CLHSS), a collaboration between the Medical School and School of Public Health. The content is solely the responsibility of the authors and does not represent the official views of the AHRQ or CLHSS.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Deidentified study data will be made available upon reasonable request.

References

Aggarwal, Peden C. J., Mohammed M. A., Pullyblank, Williams, Stephens, Kellett, Kirkby-Bott, Quiney, for the Emergency Laparotomy Collaborative (2019). Evaluation of the Collaborative Use of an Evidence-Based Care Bundle in Emergency Laparotomy. JAMA Surgery, 154(5), Article e190145. https://doi.org/10.1001/jamasurg.2019.0145

Bano, Zowghi, Whittle (2024). AI and Human Reasoning: Qualitative Research in the Age of Large Language Models. AI Ethics Journal, 4(2), 1. https://doi.org/10.47289/AIEJ20240122

Bergman A. J., McNabb K. C., Relf M. V., Dredze M. H. (2024). “Where No One Has Gone Before”: Questions to Ensure the Ethical, Rigorous, and Thoughtful Application of Artificial Intelligence in the Analysis of HIV Research. Journal of the Association of Nurses in AIDS Care, 35(5), 450–455. https://doi.org/10.1097/JNC.0000000000000483

Bijker, Merkouris S. S., Dowling N. A., Rodda S. N. (2024). ChatGPT for Automated Qualitative Research: Content Analysis. Journal of Medical Internet Research, 26, Article e59050. https://doi.org/10.2196/59050

Brondani, Alves, Ribeiro, Braga M. M., Garcia R. C. M., Ardenghi, Pattanaporn (2024). Artificial intelligence, ChatGPT, and dental education: Implications for reflective assignments and qualitative research. Journal of Dental Education, 88(12), 1671–1680. https://doi.org/10.1002/jdd.13663

Brown, Xhaja (2018). Nursing Perspectives on Enhanced Recovery After Surgery. Surgical Clinics of North America, 98(6), 1211–1221. https://doi.org/10.1016/j.suc.2018.07.008

Brown, Cigarroa Kennedy, Carranco Chávez, Dumeng-Rodriguez, Cullen (2025). Evaluation of a notes-based rapid qualitative analysis method to facilitate implementation. Implementation Science Communications, 6(1), 23. https://doi.org/10.1186/s43058-025-00709-w

Castellanos, Jiang, Gomes, Vander Meer, Castillo (2025). Large Language Models for Thematic Summarization in Qualitative Health Care Research: Comparative Analysis of Model and Human Performance. JMIR AI, 4, Article e64447. https://doi.org/10.2196/64447

Chubb L. A. (2023). Me and the Machines: Possibilities and Pitfalls of Using Artificial Intelligence for Qualitative Data Analysis. International Journal of Qualitative Methods, 22, Article 16094069231193593. https://doi.org/10.1177/16094069231193593

De Paoli (2024). Performing an Inductive Thematic Analysis of Semi-Structured Interviews With a Large Language Model: An Exploration and Provocation on the Limits of the Approach. Social Science Computer Review, 42(4), 997–1019. https://doi.org/10.1177/08944393231220483

Feuston J. L., Brubaker J. R. (2021). Putting Tools in Their Place: The Role of Time and Perspective in Human-AI Collaboration for Qualitative Analysis. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–25. https://doi.org/10.1145/3479856

Gale N. K., Heath, Cameron, Rashid, Redwood (2013). Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Medical Research Methodology, 13, 117. https://doi.org/10.1186/1471-2288-13-117

Guest, MacQueen K. M., Namey E. E. (2012). Applied thematic analysis. Sage Publications, Inc. https://doi.org/10.4135/9781483384436

Harris P. A., Taylor, Thielke, Payne, Gonzalez, Conde J. G. (2009). Research electronic data capture (REDCap) – A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42(2), 377–381. https://doi.org/10.1016/j.jbi.2008.08.010

Hitch (2024). Artificial Intelligence Augmented Qualitative Analysis: The Way of the Future? Qualitative Health Research, 34(7), 595–606. https://doi.org/10.1177/10497323231217392

Jiang, Tian, Hua, Wang, Guo (2024). A Survey on Large Language Model Hallucination via a Creativity Perspective (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2402.06647

Khanderia, Aggarwal, Bouras, Patel (2024). Quality of life after emergency laparotomy: A systematic review. BMC Surgery, 24(1), 73. https://doi.org/10.1186/s12893-024-02337-y

Kondo, Miyachi, Jönsson, Nishigori (2024). A mixed-methods study comparing human-led and ChatGPT-driven qualitative analysis in medical education research (No. 4). Nagoya University Graduate School of Medicine, School of Medicine. https://doi.org/10.18999/nagjms.86.4.620

Lau J. W. L., Baliga, Khan, Teo Y. X., Yeo J. M. J., Yeow V. Z., C. X., Teo, Goh T. J. H., Iau, Emergency Laparotomy Group (2024). Perioperative emergency laparotomy pathway for patients undergoing emergency laparotomy: A propensity score matched study. Annals of the Academy of Medicine, Singapore, 53(12), 713–723. https://doi.org/10.47102/annals-acadmedsg.2024311

Liu, Zambrano A. F., Baker R. S., Barany, Ocumpaugh, Zhang, Pankiewicz, Nasiar, Wei (2025). Qualitative Coding with GPT-4: Where it Works Better. Journal of Learning Analytics, 12(1), 169–185. https://doi.org/10.18608/jla.2025.8575

Marshall D. T., Naff D. B. (2024). The Ethics of Using Artificial Intelligence in Qualitative Research. Journal of Empirical Research on Human Research Ethics, 19(3), 92–102. https://doi.org/10.1177/15562646241262659

McLean K. A., Knight S. R., Diehl T. M., Varghese, Potter M. A., Zafar S. N., Bouamrane M.-M., Harrison E. M. (2023). Readiness for implementation of novel digital health interventions for postoperative monitoring: A systematic review and clinical innovation network analysis. The Lancet Digital Health, 5(5), e295–e315. https://doi.org/10.1016/S2589-7500(23)00026-2

Morgan D. L. (2023). Exploring the Use of Artificial Intelligence for Qualitative Data Analysis: The Case of ChatGPT. International Journal of Qualitative Methods, 22, Article 16094069231211248. https://doi.org/10.1177/16094069231211248

Morse, Albert, Harpel, Rameau (2025). Large Language Model Versus Human-Generated Thematic Analysis in Otolaryngology Qualitative Research. The Laryngoscope, 135(1), 1–2. https://doi.org/10.1002/lary.31894

Palinkas L. A., Zatzick (2019). Rapid Assessment Procedure Informed Clinical Ethnography (RAPICE) in Pragmatic Clinical Trials of Mental Health Services Implementation: Methods and Applied Case Study. Administration and Policy in Mental Health and Mental Health Services Research, 46(2), 255–270. https://doi.org/10.1007/s10488-018-0909-3

Perkins, Roe (2024). The use of Generative AI in qualitative analysis: Inductive thematic analysis with ChatGPT. Journal of Applied Learning & Teaching, 7(1), 390. https://doi.org/10.37074/jalt.2024.7.1.22

27. Riley, W. T.; Glasgow, R. E.; Etheredge; Abernethy, A. P. (2013). Rapid, responsive, relevant (R3) research: A call for a rapid learning health research enterprise. Clinical and Translational Medicine, 2(1), 10. https://doi.org/10.1186/2001-1326-2-10

28. Rodrigues Dos Anjos; De Souza, M. G.; Serrano De Andrade Neto; Campello De Souza (2024). An analysis of the generative AI use as analyst in qualitative research in science education. Revista Pesquisa Qualitativa, 12(30), 01–29. https://doi.org/10.33361/rpq.2024.v.12.n.30.724

29. Sakaguchi; Sakama; Watari (2025). Evaluating ChatGPT in Qualitative Thematic Analysis With Human Researchers in the Japanese Clinical Context and Its Cultural Interpretation Challenges: Comparative Qualitative Study. Journal of Medical Internet Research, 27, Article e71521. https://doi.org/10.2196/71521

30. Tengberg, L. T.; Bay-Nielsen; Bisgaard; Cihoric; Lauritsen, M. L.; Foss, N. B.; Orbæk; Veyhe; Jørgen Nielsen; Lindgaard; AHA study group (2017). Multidisciplinary perioperative protocol in patients undergoing acute high-risk abdominal surgery. British Journal of Surgery, 104(4), 463–471. https://doi.org/10.1002/bjs.10427

31. Vindrola-Padros; Johnson, G. A. (2020). Rapid Techniques in Qualitative Research: A Critical Review of the Literature. Qualitative Health Research, 30(10), 1596–1604. https://doi.org/10.1177/1049732320921835

32. Wachinger; Bärnighausen; Schäfer, L. N.; Scott; McMahon, S. A. (2025). Prompts, Pearls, Imperfections: Comparing ChatGPT and a Human Researcher in Qualitative Data Analysis. Qualitative Health Research, 35(9), 951–966. https://doi.org/10.1177/10497323241244669

33. Xiao; Yuan; Liao, Q. V.; Abdelghani; Oudeyer, P.-Y. (2023). Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding. In 28th International Conference on Intelligent User Interfaces (pp. 75–78). https://doi.org/10.1145/3581754.3584136

Manual vs. AI-Assisted Qualitative Analysis: A Methodological Comparison of Nurse Perspectives on Surgical Recovery