Abstract
Background:
Patient–provider interactions could inform care quality and communication but are rarely leveraged because collecting and analyzing them is both time-consuming and methodologically complex. The growing availability of large language models (LLMs) makes these analyses more feasible, though their accuracy remains uncertain.
Objectives:
Assess an LLM’s ability to analyze patient–provider interactions.
Design:
Compare a human coder's and an LLM's coding of clinical encounter transcripts.
Setting/Subjects:
Two hundred thirty-six potential symptom discussions from transcripts of clinical encounters with 92 patients living with cancer in the mid-Atlantic United States. Transcripts were analyzed with GPT4DFCI, an instance of GPT-4 (OpenAI) hosted on our hospital's Health Insurance Portability and Accountability Act-compliant infrastructure.
Measurements:
A human and an LLM coded transcripts to determine whether a patient's reported symptom(s) were discussed, who initiated the discussion, and any resulting recommendation. We calculated Cohen's κ to assess interrater agreement between the LLM and the human and qualitatively classified disagreements about recommendations.
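For reference, Cohen's κ corrects the observed agreement between two raters for agreement expected by chance: κ = (p_o − p_e)/(1 − p_e), where p_o is the proportion of items on which the raters agree and p_e is the agreement expected by chance given each rater's marginal rating frequencies.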
Results:
Interrater reliability indicated "strong" to "moderate" agreement across measures: agreement was strongest for whether the symptom was discussed (κ = 0.89), followed by who initiated the discussion (κ = 0.82) and the recommendation provided (κ = 0.78). The human and the LLM disagreed on the presence and/or content of the recommendation in 16% of potential discussions; we categorized these disagreements into nine types.
Conclusions:
Our results suggest that LLMs' abilities to analyze clinical encounters are equivalent to those of humans. Using LLMs as a research tool may therefore make analyzing patient–provider interactions more feasible, with broader implications for assessing care quality, addressing care inequities, and improving provider communication.
