Abstract
Study Design
Cross-Sectional.
Objectives
Adult spinal deformity (ASD) affects up to 68% of the elderly, and its surgical management carries complication rates as high as 50%. Effective patient education is essential for managing expectations, yet high patient volumes can limit preoperative counseling. Large language models (LLMs), such as ChatGPT, may supplement patient education. This study evaluates the accuracy and readability of ChatGPT-3.5 in answering common patient questions regarding ASD surgery.
Methods
Structured interviews with ASD surgery patients identified 40 common preoperative questions, of which 19 were selected. Each question was posed to ChatGPT-3.5 in separate chat sessions to ensure independent responses. Three spine surgeons assessed response accuracy using a validated 4-point scale (1 = excellent, 4 = unsatisfactory). Readability was analyzed using the Flesch-Kincaid Grade Level formula.
Results
Patient inquiries fell into four themes: (1) preoperative preparation, (2) recovery (pain expectations, physical therapy), (3) lifestyle modifications, and (4) postoperative course. Accuracy scores varied: preoperative responses averaged 1.67, recovery and lifestyle responses each averaged 1.33, and postoperative responses averaged 2.0. Overall, 59.7% of responses were excellent (no clarification needed), 26.3% were satisfactory (minimal clarification needed), 12.3% required moderate clarification, and 1.8% were unsatisfactory, with one response (“Will my pain return or worsen?”) rated inaccurate by all reviewers. Readability analysis showed that all 19 responses exceeded the eighth-grade reading level, by an average of 5.91 grade levels.
Conclusion
ChatGPT-3.5 demonstrates potential as a supplemental patient education tool but provides varying accuracy and complex readability. While it may support patient understanding, the complexity of its responses may limit usefulness for individuals with lower health literacy.
Introduction
Adult spinal deformity (ASD) comprises a spectrum of disorders characterized by abnormal spinal alignment. Deformity progression may cause pain and can lead to significant disability and decreased quality of life, necessitating surgical intervention.1 However, the surgical management of ASD is associated with high complication rates and protracted recovery timelines. Thus, effective patient education is essential to help align patient expectations with realistic outcomes.2,3
Preoperative patient education has also been shown to reduce perioperative anxiety and pain across a variety of orthopaedic subspecialties.4-6 Unfortunately, nearly 50% of patients undergoing spine surgery are dissatisfied with the preoperative education they receive,7 which may ultimately have a negative effect on their overall satisfaction with surgery.8 The causes of dissatisfaction with preoperative education are likely multifactorial but include (1) the limited time patients have to ask questions during clinic visits and (2) their difficulty in understanding available educational materials. Notably, the average American adult reads at an eighth-grade level, and the American Medical Association (AMA) and National Institutes of Health (NIH) recommend that patient materials be written at a sixth-grade level, yet most patient education materials are written at a ninth- to eleventh-grade level (suitable for individuals aged 14-17).9,10 This mismatch creates a substantial barrier to comprehension for many patients.
Patients now increasingly turn to online resources to fill knowledge gaps regarding their diagnosis and planned surgery, with up to 64% using sources such as Facebook, WebMD, and YouTube.11 In the past few years, online artificial intelligence (AI)-driven large language models (LLMs) like ChatGPT (OpenAI) have seen a rapid increase in popularity, representing a new resource for patients to obtain medical information. However, like other online resources, ChatGPT may generate inaccurate responses and should be thoroughly vetted by physicians before its use is recommended. Initial research in orthopaedic surgery has shown promise: a series of studies in total joint arthroplasty found that ChatGPT can consistently and accurately address frequently asked questions related to these procedures.12,13
Despite these developments, there remains a gap in the literature regarding ChatGPT’s potential role in educating patients about ASD surgery. Given the prevalence of ASD and the lack of adequate patient education resources, it is important to explore how LLMs can serve as resources for patient education and investigate their accuracy within ASD correction surgery specifically. Therefore, the objective of this study is to assess whether ChatGPT can accurately answer frequently asked questions related to ASD surgery. A secondary aim is to evaluate whether ChatGPT’s responses are written at a readability level appropriate for the average American adult.
Methods
Question Curation
We identified patients who had undergone spine deformity correction surgery between 2020 and 2022 from our institutional database. This included patients with a diagnosis of ASD who had undergone at least a seven-level thoracolumbar spine fusion. Ten patients were randomly selected to be contacted and interviewed, of whom seven were willing to participate in the study. Eighteen open-ended questions, developed by fellowship-trained spine surgeons, were posed to these patients between July 2022 and March 2023 to assess preoperative expectations and postoperative outcomes. Analysis of the unstructured interviews revealed four common themes: preoperative preparation, recovery, lifestyle modifications, and general postoperative course. In addition, we reviewed the “Frequently Asked Questions” pages of online health resources, including OrthoInfo, Norton Healthcare, and the Journal of Neurosurgery: Spine. Combining these sources, we generated 40 potential questions for ChatGPT. Questions with repeated themes or similar phrasing were excluded to reduce redundancy, yielding a final list of 19 questions (Supplemental Table 1). A board-certified spine surgeon (R.K.A.) screened and approved the final selection. This study is classified as Institutional Review Board exempt, as no patient-identifying data were collected or stored during the survey process.
ChatGPT-3.5 Response Generation
On March 3, 2024, we entered the selected questions into the free online AI chatbot ChatGPT-3.5. Each question was asked in a separate “New Chat” to generate independent responses, as the model retains context from previous inputs within the same chat. We recorded the initial answers and removed any statements deferring to the patient’s surgeon. Complete and adjusted responses produced by ChatGPT can be found in Supplemental Table 1.
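As an aside for readers who wish to replicate this protocol at scale, the sketch below shows a hypothetical programmatic equivalent using the OpenAI Python client. The model identifier and example questions are assumptions (the study used the web interface); the essential detail is that each request carries no prior conversation history, mirroring a fresh “New Chat.”

```python
# Hypothetical reproduction of the response-generation protocol. Each
# question is sent as a single, stateless API request (no prior messages),
# the programmatic analogue of starting a "New Chat" per question.
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Illustrative questions only; the study's 19 curated questions are listed
# in Supplemental Table 1.
questions = [
    "How should I prepare for adult spinal deformity surgery?",
    "Is there a chance my pain will return and get worse after surgery?",
]

responses = {}
for question in questions:
    # Sending only the current question means no conversational context
    # carries over between questions, keeping each response independent.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed identifier for ChatGPT-3.5
        messages=[{"role": "user", "content": question}],
    )
    responses[question] = completion.choices[0].message.content
```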
ChatGPT Response Grading
Three board-certified orthopaedic spine surgeons evaluated the ChatGPT-generated responses to the 19 queries. The reviewers were instructed to assess the accuracy of the responses using a 4-point grading scale adapted from a previously published study evaluating AI-generated responses in total hip arthroplasty12:

1. Excellent response not requiring clarification
2. Satisfactory response requiring minimal clarification
3. Satisfactory response requiring moderate clarification
4. Unsatisfactory response requiring substantial clarification
This scale was chosen because it captures both the clinical appropriateness and the completeness of information, making it well suited to evaluating the accuracy of ChatGPT’s responses.
Readability Grading
We evaluated the readability of complete ChatGPT responses using the Flesch-Kincaid Grade Level (FKGL) formula, a widely validated measure of readability expressed in terms of academic grade levels. Although other readability scales exist, the FKGL formula stands as the most extensively validated and accepted method.14 FKGL was computed using Microsoft Word’s built-in calculator, which adheres to the formula15:

FKGL = 0.39 × (total words / total sentences) + 11.8 × (total syllables / total words) − 15.59
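For illustration, a minimal Python sketch of the FKGL computation is shown below. The syllable counter is a crude vowel-run heuristic and will not exactly reproduce Microsoft Word’s proprietary counter, so scores are approximate.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, dropping a silent final 'e'.
    Word's counter differs, so scores will not match it exactly."""
    runs = re.findall(r"[aeiouy]+", word.lower())
    count = len(runs)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: simple prose scores low; dense clinical prose scores far higher.
print(round(fkgl("The spine is the backbone. It holds you up."), 1))
```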
The resulting FKGL readability scores were then compared to the average US adult reading level (eighth grade) and the American Medical Association/National Institutes of Health recommended reading level (sixth grade).16,17
Statistical Analysis
The proportion of responses receiving each grade was calculated and reported as a percentage. A two-way random-effects intraclass correlation coefficient (ICC) was calculated to evaluate interrater reliability of grading among the three board-certified orthopaedic spine surgeons. Unpaired t-tests were performed to compare the FKGLs of ChatGPT responses with the AMA/NIH-recommended and mean US FKGLs. All analyses were conducted using Microsoft Excel (version 16.84). A P value <0.05 was defined as significant.
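For readers wishing to reproduce this analysis outside of Excel, the sketch below shows one way to compute the ICC and the readability comparison in Python, using the third-party pingouin library for the two-way random-effects ICC. The grades and FKGL values are random placeholders, not the study’s data, and the benchmark comparison is framed here as a one-sample t-test against the fixed eighth-grade level.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from scipy import stats

rng = np.random.default_rng(seed=0)

# Placeholder grades (1-4) for 19 questions rated by three surgeons; the
# study's actual grades appear in Supplemental Table 1.
grades = rng.integers(low=1, high=5, size=(19, 3))
long_form = pd.DataFrame({
    "question": np.repeat(np.arange(19), 3),
    "rater": np.tile(["A", "B", "C"], 19),
    "grade": grades.ravel(),
})

# Two-way random-effects, single-rater ICC (the paper reports ICC = 0.255).
icc = pg.intraclass_corr(data=long_form, targets="question",
                         raters="rater", ratings="grade")
print(icc[icc["Type"] == "ICC2"])

# Comparing the 19 response FKGLs against a fixed benchmark (e.g., the
# eighth-grade average) reduces to a one-sample t-test when the benchmark
# is treated as a constant; placeholder scores stand in for real values.
fkgl_scores = rng.normal(loc=13.9, scale=1.5, size=19)
print(stats.ttest_1samp(fkgl_scores, popmean=8.0))
```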
Results
Seven patients who underwent ASD surgery between 2020 and 2022 were identified and agreed to participate in in-person interviews to assist with question generation.
Accuracy of Identified Themes
Our analysis identified four major themes from patient interviews: preoperative preparation, recovery, lifestyle modifications, and postoperative course. Three board-certified spine surgeons reviewed and graded 19 responses generated by ChatGPT (Supplemental Table 1).
Grading of Responses Generated by ChatGPT to Common Queries Posed by Adult Spinal Deformity Surgery Patients. The Grading Scale Is as Follows: (1) Excellent Response Not Requiring Clarification; (2) Satisfactory Response Requiring Minimal Clarification; (3) Satisfactory Response Requiring Moderate Clarification; (4) Unsatisfactory Response Requiring Substantial Clarification. Mean Grade per Question Was Determined. Interrater Agreement Was Poor (ICC = 0.255)
ChatGPT, Chat Generative Pre-trained Transformer; ICC, Intraclass Correlation.
Mean Grade Assigned to ChatGPT-Generated Responses by Question Theme
ChatGPT, Chat Generative Pre-Trained Transformer.
Distribution of Grades Assigned by Three Board-Certified Spine Surgeons to Responses Generated by ChatGPT. Mean Distribution of Grades and Standard Deviations Were Recorded
ChatGPT, Chat Generative Pre-Trained Transformer.
Readability
Flesch-Kincaid Grade Level (FKGL) Readability Grading of ChatGPT-3.5 Responses to Common Questions Asked by Adult Spinal Deformity Surgery Patients, Demonstrating Significant Differences From Both the AMA/NIH-Recommended FKGL and the Mean U.S. FKGL
ChatGPT, Chat Generative Pre-Trained Transformer; AMA, American Medical Association; NIH, National Institutes of Health.
*All values are significant (P < 0.05).

The Distribution of ChatGPT-Generated Responses to Common Questions Regarding Adult Spinal Deformity Surgery by Flesch-Kincaid Grade Level Compared to the Average U.S. Adult Reading Level and AMA/NIH Recommended Reading Level. ChatGPT, Chat Generative Pre-Trained Transformer; AMA, American Medical Association; NIH, National Institutes of Health
Mean Flesch-Kincaid Grade Level (FKGL) of ChatGPT-Generated Responses by Question Theme
Discussion
AI-driven chatbots like ChatGPT represent a promising new technology with numerous potential applications in healthcare, including patient education. The primary aim of this study was to evaluate the accuracy and accessibility of ChatGPT in responding to common patient questions regarding ASD correction surgery. The most important finding of our study was that ChatGPT can accurately answer questions related to ASD surgery: 86% of generated responses required minimal to no clarification based on the consensus opinions of three board-certified spine surgeons. However, accuracy varied by question type, with ChatGPT performing better on questions related to preoperative preparation and worse on questions related to the expected postoperative course. Moreover, the responses were written at a reading level nearly six grades higher than the national average, which may limit their accessibility. The relevance of this study lies in its implications for patient education, the advancement of AI in healthcare, the need for accessible medical communication, and the potential of AI to transform patient engagement and healthcare delivery.
Recent literature has evaluated the accuracy of ChatGPT-generated responses to medical inquiries within the fields of orthopaedic and bariatric surgery. In a study conducted by Wright et al, the ability of ChatGPT to deliver accurate and comprehensive information regarding total hip and knee arthroplasty was assessed.12,13 The researchers posed the 20 most frequently Google-searched questions on this topic to ChatGPT, and the responses were subsequently graded by five orthopaedic surgery residents. The findings revealed that ChatGPT’s responses achieved an accuracy rate of 85.2% and a comprehensiveness rate of 75.8%. Similarly, Samaan et al examined ChatGPT’s performance in addressing questions related to bariatric surgery.18 Utilizing a Likert scale that integrated both accuracy and comprehensiveness, the study found that 86.8% of ChatGPT’s responses were rated as accurate and comprehensive by two board-certified, fellowship-trained bariatric surgeons.
Consistent with the aforementioned studies, we found that ChatGPT can generate highly accurate responses regarding ASD surgery. While interrater reliability was poor, this likely reflects the nuanced and complex nature of the clinical scenarios under evaluation, in addition to the small number of reviewers. The high accuracy rate of 86%, combined with the minimal need for clarification, highlights ChatGPT’s strong potential to provide more accurate and reliable medical information to patients than currently available online resources. For example, prior reviews of websites addressing orthopaedic disorders, such as the Mayo Clinic and WebMD, found that only 29-44% of search results contained suitable information.19,20 These results collectively suggest that ChatGPT can serve as a valuable tool in clinical settings, not only for delivering precise information but also for enhancing patient education and engagement. ChatGPT has the potential to support informed decision-making and foster more effective communication between patients and healthcare providers.
The use of ChatGPT may be more limited in addressing patient questions about the postoperative course, which had an average accuracy score of 2.0. Previous research found that ChatGPT’s responses to postoperative questions are harder to understand and less actionable compared with a Google search.21 Mika et al found that ChatGPT responses to questions pertaining to the general postoperative course after total hip arthroplasty required moderate clarification.12 We believe this decline in accuracy may be due to the subjective nature of these questions, which often require individualized answers. For example, the question “Is there a chance my pain will return and get worse after undergoing spinal deformity surgery?” generated both the most disagreement among graders and the lowest grade, receiving an average score of 3.0. Answering this question depends on many variables (ie, type of procedure, type of deformity, preoperative pain severity, associated health issues) and would require the patient to enter specific personal and procedural information. Additionally, the chatbot would need to exercise higher-order thinking to adequately answer such a question, an area where ChatGPT has shown limitations.22-24 Furthermore, some ChatGPT responses contained misconceptions. For example, recommendations regarding the use of specific furniture or braces appeared to be influenced more by market forces and anecdotal sources than by evidence-based guidelines. These suggestions may lead patients to make unnecessary financial investments without clear benefit to postoperative outcomes.
In addition to limitations with specific question types, we also found that ChatGPT responses may be challenging to comprehend. On average, ChatGPT responses surpassed the average U.S. reading level (eighth grade) by 5.9 grade levels. Although ChatGPT has the advantage of being easy to access, its baseline responses may not be as easy to understand as prior literature might indicate.18 Previous studies have investigated ChatGPT’s ability to provide ‘easier’ responses in the setting of total hip and total knee arthroplasty.13 Wright et al found that ChatGPT can significantly reduce the reading level of its responses; however, this coincided with a significant decrease in comprehensiveness, possibly because the chatbot removed key information to achieve a more accessible reading level. These findings reinforce the idea, found within the literature, that ChatGPT-3.5 in its current version is most appropriate as an adjunct to, not a replacement for, discussions with a physician.25 Unlike AI models, physicians can dynamically adjust the complexity and depth of their explanations to match each patient’s educational background and specific inquiries, a capability not yet replicated by current AI technologies. Nonetheless, ChatGPT may serve as a valuable supplemental tool; physicians could consider using AI-generated responses to frequently asked questions as the basis for patient education materials such as brochures.
This study has several limitations. First, the sample size was relatively small, with only seven patients; this was partially mitigated by referencing frequently asked questions from health institutions when developing the question set. Additionally, two of the reviewers (RJH, RKA) were not blinded to the study owing to their involvement in its design, which could have biased their assessments. Another limitation was the absence of a comparison group; for example, we did not directly compare web results for internet-searched questions with ChatGPT’s responses. External validity is also limited by continuous updates from OpenAI, such as the launch of GPT-4; identical questions posed to the chatbot in the future may yield significantly different results. Furthermore, the substantial variability in reviewer scores may have skewed the response ratings, limiting the strength and generalizability of conclusions drawn from these aggregated scores. Lastly, surgeon-related factors (eg, skill and experience) and approach-specific variables (eg, open vs minimally invasive techniques) may have contributed to variability in the accuracy grading of ChatGPT’s responses.
Future research should explore two key areas: first, whether prompting ChatGPT to adjust its language complexity to specific reading levels can maintain accuracy while providing more tailored, accessible information; and second, if incorporating patient health data and procedural details into ChatGPT can enhance the accuracy of its postoperative care guidance. These investigations will be crucial in optimizing ChatGPT’s integration into clinical settings and its potential to improve patient education and satisfaction.
Conclusion
ChatGPT demonstrates significant potential as a technological tool in healthcare, particularly in addressing common preoperative education questions. However, its responses to postoperative care queries were less reliable, and its high readability level may limit accessibility for patients with lower health literacy. Unlike physicians, ChatGPT cannot adapt responses to individual comprehension levels. Thus, while it may serve as a useful adjunct, it should not replace direct physician-patient communication. Future research should focus on developing LLMs that can adjust response complexity based on patient health literacy.
Supplemental Material
Supplemental material for “Evaluating the Accuracy and Readability of ChatGPT in Addressing Patient Queries on Adult Spinal Deformity Surgery” by Fergui Hernandez, BS, Rafael Guizar III, BS, Henry Avetisian, MS, Marc A. Abdou, MS, William J. Karakash, BS, Andy Ton, MD, Matthew C. Gallo, MD, Jacob R. Ball, MD, Jeffrey C. Wang, MD, Ram K. Alluri, MD, Raymond J. Hah, MD, and Michael Safaee, MD, in Global Spine Journal, is available online.
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
Fergui Hernandez, Rafael Guizar III, Henry Avetisian, Marc A Abdou, William J. Karakash, Andy Ton, Matthew C Gallo, Jacob Ball, and Michael M. Safaee have nothing to disclose. Jeffrey C. Wang has received intellectual property royalties from Zimmer Biomet, NovApproach, SeaSpine, and DePuy Synthes. Raymond J. Hah has received grant funding from SI bone, consulting fees from NuVasive, and support from the North American Spine Society to attend meetings. Ram K. Alluri has received grant funding from NIH, consulting fees from HIA Technologies, and payment from Eccential Robotics for lectures and presentations.
Data Availability Statement
Data are not publicly available but can be made available upon request.
IRB Statement
This study is classified as Institutional Review Board exempt as no patient-identifying data was collected or stored during the survey process.
