Artificial intelligence in the management of chronic pain and lipedema: A comparative analysis of ChatGPT-5o, Gemini-3, and Perplexity AI in terms of readability and academic reliability

Abstract

Objectives

Lipedema is a chronic disorder characterized by pain and disproportionate fat distribution, and its diagnosis is frequently overlooked. The aim of this study was to evaluate and compare the responses generated by contemporary artificial intelligence models—ChatGPT-5o, Gemini-3, and Perplexity AI—to structured clinical questions developed in accordance with the 2024 S2k Lipedema Guideline. The models were analyzed in terms of clinical accuracy, readability, and reference reliability to assess their performance in delivering guideline-based medical information.

Methods

This cross-sectional, comparative study submitted 30 structured clinical questions, prepared on the basis of the relevant guideline, to three large language models. Responses, collected on 10 February 2026, were evaluated using a seven-point Likert scale for reliability and a five-point scale for accuracy. Text readability was assessed using six established indices, including the Flesch Reading Ease Score (FRES), Flesch–Kincaid Grade Level (FKGL), and Gunning Fog Index (GFOG). Reference reliability was examined by analyzing hallucination tendencies as defined in the literature.
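As an illustration of how three of the named indices are defined, the sketch below applies the standard published formulas for FRES, FKGL, and GFOG to pre-counted text statistics. The abstract does not state which tool the authors used or how words, sentences, and syllables were counted, so the function names and the example counts are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRES: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: approximate US school grade needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """GFOG: complex_words counts words with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Hypothetical counts for one response: 100 words, 5 sentences,
# 150 syllables, 10 complex words (not data from the study).
fres = flesch_reading_ease(100, 5, 150)
fkgl = flesch_kincaid_grade(100, 5, 150)
gfog = gunning_fog(100, 5, 10)
print(round(fres, 2), round(fkgl, 2), round(gfog, 2))
```

Taking counts as inputs keeps the formulas separate from syllable counting, which is heuristic and varies between readability tools.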

Results

A statistically significant difference in reliability was observed among the models (p = .041); Perplexity (4.95 ± 1.20) scored significantly higher than ChatGPT-5o (4.38 ± 1.05) (p = .038). In the readability analyses, Perplexity's responses (FKGL 12.80 ± 2.10) required a significantly higher educational level than those of both ChatGPT-5o (p = .041) and Gemini-3 (p = .036). Regarding reference reliability, ChatGPT-5o outperformed Perplexity in source verifiability (p = .031), bibliographic precision (p = .044), and total Reference Hallucination Score (RHS) (p = .027), emerging as the most robust model in this domain. No statistically significant differences were found among the models in clinical accuracy or usefulness (p > .05). Inter-rater agreement was excellent (kappa: 0.92–0.97).
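To show what the reported agreement statistic measures, the following is a minimal sketch of unweighted Cohen's kappa for two raters. The abstract does not specify the kappa variant or the number of raters, so both the choice of Cohen's kappa and the example ratings are assumptions made for illustration only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical category throughout.
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for four responses (not data from the study).
print(cohens_kappa([1, 1, 2, 2], [1, 1, 2, 1]))
```

For ordinal Likert scores a weighted kappa is often preferred, since it penalizes near-misses less than distant disagreements; the unweighted form is used here only because it is the simplest to read.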

Conclusion

In this study, ChatGPT-5o distinguished itself in reference quality, whereas Perplexity demonstrated superior reliability. However, the complex linguistic structures accompanying efforts to maintain high medical accuracy may constitute a significant barrier for individuals with limited e-health literacy. Although these systems show strong potential as medical information resources, they cannot yet replace expert physician oversight in terms of patient safety. A balanced approach between technical reliability and patient-centered simplification remains necessary.

Keywords

artificial intelligence, ChatGPT, lipedema, online medical information, patient education, readability
