Abstract
Background:
Carpal tunnel syndrome (CTS) is a prevalent neuropathy in hand surgery that significantly affects patients' quality of life. Patients frequently research their condition online before seeking medical care. Large language models (LLMs) such as ChatGPT are increasingly used for health information, yet concerns remain about the accuracy, readability, and complexity of their responses. Previous studies have assessed older ChatGPT models but have not comprehensively compared newer versions. The purpose of this study was to compare answers to common CTS-related patient questions generated by ChatGPT-4, ChatGPT-4o, and ChatGPT-o1.
Methods:
Six frequently asked questions about CTS were posed to each LLM. Responses were independently graded by 2 board-certified hand surgeons using evidence-based guidelines. Lexical diversity was assessed with the Measure of Textual Lexical Diversity, and readability was evaluated with the Flesch-Kincaid Grade Level, Flesch Reading Ease Score, and Simple Measure of Gobbledygook. Analysis of variance or Kruskal-Wallis tests with post hoc comparisons were used to compare models and questions.
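To illustrate how readability scoring and a between-model comparison of this kind can be carried out, the sketch below uses the open-source textstat and scipy packages. This is a minimal illustration under assumed inputs, not the study's actual pipeline; the response texts are hypothetical placeholders.

```python
# Minimal sketch of readability scoring and model comparison, assuming the
# open-source `textstat` and `scipy` packages. Response texts are
# hypothetical; the study's actual pipeline is not specified in the abstract.
import textstat
from scipy.stats import kruskal

# One list of answer strings per model (illustrative only).
responses = {
    "ChatGPT-4": [
        "Carpal tunnel syndrome occurs when the median nerve is compressed.",
        "Symptoms often include numbness and tingling in the fingers.",
    ],
    "ChatGPT-4o": [
        "The carpal tunnel is a narrow passage in your wrist.",
        "Resting the hand and wearing a splint can ease mild symptoms.",
    ],
    "ChatGPT-o1": [
        "Compression of the median nerve within the carpal tunnel produces paresthesias.",
        "Electrodiagnostic studies can corroborate the clinical diagnosis.",
    ],
}

# Score each response on the three readability metrics named in the Methods.
grade_levels = {
    model: [textstat.flesch_kincaid_grade(t) for t in answers]
    for model, answers in responses.items()
}
for model, answers in responses.items():
    for t in answers:
        print(model,
              textstat.flesch_kincaid_grade(t),   # Flesch-Kincaid Grade Level
              textstat.flesch_reading_ease(t),    # Flesch Reading Ease Score
              textstat.smog_index(t))             # Simple Measure of Gobbledygook

# Nonparametric comparison of grade-level scores across the three models.
stat, p = kruskal(*grade_levels.values())
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.4f}")
```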
Results:
All 3 ChatGPT models averaged 93% accuracy, with no significant differences between them, although accuracy differed significantly between questions 3 and 5. Readability scores differed significantly across models: ChatGPT-4o generated the most readable responses, and ChatGPT-o1 produced the most complex answers.
Conclusions:
While the LLMs achieved similar accuracy, ChatGPT-4o offered the most patient-friendly content. Nevertheless, the readability of all models' responses remained above the recommended reading level for the general population. Future work should explore whether fine-tuning or advances in model design can enhance accessibility for a broader audience.
