
Abstract

Background:

Peripheral artery disease (PAD) is a global health challenge. Advances in artificial intelligence (AI), such as large language models (LLMs) and chain-of-thought (CoT) reasoning, offer novel approaches for clinical recommendations. This study compared the readability and guideline adherence of responses from physicians and AI for a standardized PAD case.

Methods:

This cross-sectional study gathered responses from 30 specialized physicians (11 cardiologists, 19 vascular surgeons) across seven Latin American countries and 13 LLM systems (10 standard, three CoT). Both groups addressed diagnosis, treatment, risks, and prognosis; LLMs responded as vascular specialists. Responses were blindly evaluated with five validated Spanish readability indices and compared to the 2024 ACC/AHA multisocietal PAD guideline. Three experts scored guideline adherence; nonparametric tests were applied.

Results:

Guideline adherence did not differ significantly between physicians (median 5.8 [3.4–7.6]) and LLMs (7.3 [4.7–9.7], p = 0.169), though CoT-LLMs achieved the highest scores (9.7 [8.5–11.0]). LLMs more often recommended supervised exercise (84.6% vs 30.0%, p = 0.002) and revascularization for quality of life (69.2% vs 20.0%, p = 0.004), whereas physicians favored cilostazol (60.0% vs 30.8%, p = 0.104). LLM responses had lower Readability μ values (46.9 vs 51.4, p = 0.012). Inter-rater reliability was highest for CoT-LLMs (intraclass correlation coefficient [ICC] = 0.98) versus physicians (ICC = 0.76).
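The proportion comparisons above can be sanity-checked from the group sizes given in the Methods (13 LLMs, 30 physicians). The sketch below rebuilds the underlying counts from the reported percentages and runs a two-sided Fisher exact test on each 2×2 table. The abstract states only that nonparametric tests were applied, so the choice of Fisher's exact test is an assumption made here for illustration, as are the function and variable names.

```python
from math import comb

N_LLM, N_PHYS = 13, 30  # group sizes reported in the Methods


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(k):
        # Hypergeometric probability of k "yes" answers in the first row
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is at least as unlikely as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def compare(pct_llm, pct_phys):
    """Rebuild counts from reported percentages (assumed rounding) and test them."""
    yes_llm = round(pct_llm / 100 * N_LLM)
    yes_phys = round(pct_phys / 100 * N_PHYS)
    p = fisher_exact_two_sided(yes_llm, N_LLM - yes_llm, yes_phys, N_PHYS - yes_phys)
    return yes_llm, yes_phys, p


# Percentages as reported in the Results: (LLMs, physicians)
reported = {
    "supervised exercise": (84.6, 30.0),
    "revascularization for quality of life": (69.2, 20.0),
    "cilostazol": (30.8, 60.0),
}

for label, (pct_llm, pct_phys) in reported.items():
    yes_llm, yes_phys, p = compare(pct_llm, pct_phys)
    print(f"{label}: LLM {yes_llm}/{N_LLM}, physicians {yes_phys}/{N_PHYS}, p = {p:.3f}")
```

The printed p-values can be set beside the reported ones (0.002, 0.004, and 0.104). Agreement would be consistent with an exact test on these counts, but would not confirm which test the authors used.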

Conclusion:

LLMs showed guideline adherence comparable to that of physicians, with CoT models achieving the highest scores. The differences between physician and AI treatment preferences suggest the potential of AI as an adjunct clinical tool and warrant further study.

Keywords

chain-of-thought (CoT), large language models (LLM), peripheral artery disease (PAD), practice guidelines




Comparing guideline adherence and readability: Artificial intelligence with deep learning versus specialized physicians in peripheral artery disease management