Abstract
Study Design
Survey-based study.
Objective
To identify how to effectively tailor spine care information to different educational backgrounds, beyond general readability.
Methods
An internet-based survey (Connect™, CloudResearch) recruited 600 U.S. adults evenly distributed across age brackets (18-80).
Results
Of the 600 participants (mean age 45 ± 15.7 years; 50.8% female), the most frequently selected reading levels were postgraduate (50.6%) and 8th-12th grade (22.1%). Bachelor's degree holders significantly favored postgraduate-level content (55.3%), while postgraduate respondents predominantly preferred content at the 8th-12th grade level or simpler (56.2%). Participants with less than a bachelor's degree similarly preferred the 8th-12th grade level or lower (53.5%). Standardized residual analysis revealed that bachelor's degree participants selected postgraduate content more frequently than expected, whereas postgraduate and below-bachelor's participants selected it less often than expected.
Conclusions
While postgraduate-level responses were the most frequently selected overall, preferences varied substantially by education level. Both postgraduate and below-bachelor participants tended to favor simpler content, while bachelor’s-level respondents consistently preferred more complex language. These findings underscore that simplifying medical information may improve accessibility across diverse educational backgrounds.
Introduction
Healthcare literacy is a critical skill for navigating the healthcare system effectively. However, a recent study found that approximately one-third (33%) of patients in a multi-surgeon spine practice demonstrated limited health literacy, highlighting a significant barrier to informed decision-making and care outcomes.1 It is well established that patients presenting with spine pathology who demonstrate lower levels of health literacy experience worse patient-reported outcomes postoperatively compared to those with higher health literacy levels.
ChatGPT is an artificial intelligence chatbot based on large language models, meaning that, given a specific input, it can generate a coherent and appropriate answer using machine learning algorithms. Recently, patients have turned to ChatGPT as a source for questions regarding diverse spine pathologies and treatment recommendations; the answers generated were consistently at the collegiate reading level.6,7 However, users can request that generated responses be presented at the reading level most suitable for each individual. The ability of ChatGPT to successfully simplify complex medical information has previously been studied in otolaryngology.8 However, the optimal reading level for patient education materials in spine care has yet to be clearly established.
To our knowledge, this is the first study to stratify patients by educational attainment and systematically assess their readability preferences for medical information. By linking preference to education level, we aim to provide more granular insights into how patient education materials can be better tailored. Specifically, this study evaluates preferred reading levels for spine surgery educational materials based on an individual's level of formal education. By requesting that ChatGPT generate standardized answer choices at varying reading levels, we explore how health literacy preferences align with educational background. This approach may improve tailored, patient-centered delivery of spine care information, potentially reducing barriers to equitable healthcare access. We hypothesize that while preferred reading levels will vary by education level, a general preference for simpler, more accessible language will emerge, consistent with the American Medical Association's emphasis on "clear" patient communication.
Methods
Data Collection and Study Design
This is a survey-based study wherein individuals from the general American population completed surveys administered through the CloudResearch platform. Data collection was conducted in January 2025 using a combination of the Connect and Prime Panels platforms by CloudResearch. CloudResearch's Connect platform is known for its two-way reputation system, which allows participants to rate the projects and the researchers who post them. This rating system encourages good behavior from both parties, thus enhancing data reliability. Researchers can also set quotas for age, sex, race, and other demographics to match the US census, thereby improving capture of a representative sample. Participants are recruited by Connect via partnerships with established online panels that maintain large pools of pre-registered participants and through targeted recruitment (eg, social media). The CloudResearch team continuously monitors data quality and handles feedback to address poor-quality survey responses. It implements technical safeguards to ensure each participant uses only one account and does not submit multiple responses from the same IP address, while also verifying that participants are based in the U.S.9 To ensure data quality, an attention check was included to detect and remove low-quality responses, defined as a failure to respond appropriately to the attention check question. In our survey, the attention check consisted of a question asking participants to select the third answer choice, which contained the phrase "For this question, please select this answer choice to indicate you are paying attention." Following survey completion and data approval by the authors, participants received a small financial incentive.
We included participants aged 18-79 across evenly distributed 10-year age brackets, with efforts to match U.S. Census demographics for age, sex, and race. This approach ensured a sample closely reflecting the national population.10
A webpage on the Clinical Guidelines for Diagnosis and Treatment of Lumbar Disc Herniation with Radiculopathy from the North American Spine Society (NASS) (https://www.spine.org/Portals/0/assets/downloads/researchclinicalcare/guidelines/lumbardischerniation.pdf) was accessed in November 2024. Guideline information from this page was input into ChatGPT (OpenAI, San Francisco, CA; version 4.0) to generate answer choices for six different clinical scenarios (Appendix A). For each question, five answer choices were generated at different reading levels (Pre-Kindergarten-3rd grade, 4th-7th grade, 8th-12th grade, undergraduate, and postgraduate), based on prior work demonstrating that large language models can effectively produce content across a comparable range of Lexile scores (300L-1200L+), spanning early elementary to college-level readability.11
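For illustration, a minimal sketch of how reading-level-specific answer choices like these could be produced programmatically, assuming the openai Python SDK. The model name, prompt wording, and function name are illustrative assumptions, not the exact prompts used in this study.

```python
# Illustrative sketch: generate one answer choice per target reading level.
# Model name and prompt wording are assumptions, not the study's exact prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

READING_LEVELS = [
    "Pre-Kindergarten to 3rd grade",
    "4th to 7th grade",
    "8th to 12th grade",
    "undergraduate",
    "postgraduate",
]

def generate_answer_choices(guideline_text: str, scenario: str) -> dict[str, str]:
    """Return one guideline-based answer choice per target reading level."""
    choices = {}
    for level in READING_LEVELS:
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative; the study used ChatGPT version 4.0
            messages=[
                {"role": "system",
                 "content": f"Answer using only the guideline text provided, "
                            f"written at a {level} reading level."},
                {"role": "user",
                 "content": f"Guideline: {guideline_text}\n\nScenario: {scenario}"},
            ],
        )
        choices[level] = response.choices[0].message.content
    return choices
```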
Demographic details were also collected in this study and included age, sex, race, highest level of education completed, and occupation. The survey had to be completed in full to be included in data analysis.
Response preferences were analyzed using a chi-squared test of independence to compare the distribution of selected reading levels across education groups. Post hoc pairwise chi-squared comparisons with Bonferroni correction were performed to assess between-group differences. Standardized residuals were also calculated to identify which specific education-reading level combinations deviated most from expected values under the null hypothesis.
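As a rough sketch of this analysis pipeline, assuming the responses have been tabulated into an education-by-reading-level contingency table (all counts below are placeholders, not study data), scipy supplies the omnibus test and expected counts, pairwise tests are Bonferroni-corrected, and standardized residuals are computed in the simple Pearson form.

```python
# Sketch of the analysis described above; counts are placeholders.
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = education group,
# columns = selected reading level.
table = pd.DataFrame(
    {
        "PreK-3rd": [40, 30, 25],
        "4th-7th": [90, 70, 60],
        "8th-12th": [250, 260, 270],
        "Undergraduate": [180, 150, 170],
        "Postgraduate": [640, 690, 475],
    },
    index=["Below bachelor's", "Bachelor's", "Postgraduate"],
)

# Omnibus chi-squared test of independence.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4g}")

# Post hoc pairwise chi-squared comparisons with Bonferroni correction.
pairs = list(combinations(table.index, 2))
alpha_adj = 0.05 / len(pairs)
for a, b in pairs:
    chi2_ab, p_ab, *_ = chi2_contingency(table.loc[[a, b]])
    print(f"{a} vs {b}: p={p_ab:.4g} (significant: {p_ab < alpha_adj})")

# Standardized (Pearson) residuals: (observed - expected) / sqrt(expected).
residuals = (table - expected) / np.sqrt(expected)
print(residuals.round(2))
```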
Results
Participant Demographics and Sample Size
A total of 600 participants completed the survey (mean age 45 ± 15.7 years; 50.8% female). Demographic characteristics of survey participants are presented in Table 1.
ChatGPT’s Readability Scores at Different Levels
ChatGPT-generated answer choices for each clinical case, with associated Flesch-Kincaid grade level, Flesch Reading Ease scores, and frequency of selection across all participants, are presented in Table 2.
Preferred Reading Levels Among Respondents
Across all respondents, the most commonly selected answer choices were at the Postgraduate/Doctorate (50.6%) and 8th-12th grade (22.1%) reading levels as generated by ChatGPT (Figure 1; Table 2). When stratified by education level, bachelor's degree holders demonstrated the strongest preference for postgraduate-level responses (55.3%) (Figure 2; Table 3). Participants with less than a bachelor's degree also frequently selected postgraduate-level content (46.5%) (Table 3). However, respondents with postgraduate degrees showed a different trend: they chose postgraduate-level content only 43.7% of the time and more often favored simpler alternatives. In fact, 57.1% of postgraduate-educated participants preferred responses written at or below the 8th-12th grade level, as did 53.5% of those with less than a bachelor's degree (Figure 3; Table 3). Standardized residual analysis indicated that bachelor's degree participants selected postgraduate-level content more frequently than expected, whereas postgraduate and below-bachelor's participants selected it less often than expected.

Figure 1. Preferred reading level responses for all participants. Bar plot illustrating the distribution of preferred reading levels across six spine-related clinical scenarios (Q1-Q6). While "Postgraduate/Doctorate" responses were most frequently selected overall, a consistent portion of participants preferred simpler reading levels, particularly the "8th-12th" and "4th-7th" grade options, across all questions.

Figure 2. Preferred reading level responses for participants with a bachelor's degree. Bar chart depicting the distribution of preferred reading levels across six clinical scenarios among participants with a bachelor's degree. The majority consistently favored responses at the "Postgraduate/Doctorate" level, with notably lower selection of the "PreK-3rd" and "4th-7th" reading levels.

Table 3. Frequencies of preferred reading level by education level. Distribution of reading level preferences stratified by participant education level.

Figure 3. Preferred reading level responses for participants with a postgraduate degree. Among individuals with postgraduate degrees, including those with master's, doctoral (PhD, EdD), or professional (MD, DDS, DVM, JD) credentials, preferences for simpler reading levels remained prevalent across all clinical scenarios. While "Postgraduate/Doctorate" responses were frequently chosen, a substantial proportion of participants selected "8th-12th" or even "4th-7th" grade options, particularly for Cases 1, 3, and 6.

Standardized residuals indicating which reading level choices were over- or under-selected by each education group relative to expectation.
Discussion
Several studies have highlighted the persistent gap between the intended readability of medical content and its actual accessibility for diverse patient populations.15-17 However, few have explored how patients across educational backgrounds prefer to engage with content at different reading levels. Lumbar MRI findings obtained to assess for pathology contributing to low back and leg symptoms can be puzzling and alarming to patients, a concern that prompts a variety of internet searches for more information about these symptoms and MRI terminology.18 Our findings address this gap, showing that a preference for simpler language may transcend educational attainment, suggesting that accessibility, rather than complexity, drives engagement.
Similar to previous work, we found that ChatGPT often generates content above the intended reading level.15-17,19 In our study, responses prompted at the "PreK-3rd grade" level averaged a Flesch-Kincaid (FK) grade of 7.3, while "8th-12th grade" prompts averaged 13.4, paralleling findings by Hung et al, who reported that ChatGPT content intended for 7th-grade audiences instead yielded FK scores above 10.16 Likewise, Covington et al17 found that ChatGPT-generated medication guides averaged an FK grade of 8.6, overshooting recommended levels despite clear prompt directives. Nasra et al15 reported similar limitations across AI models, emphasizing the need for improved prompt engineering and output validation. However, none of these studies assessed individual preference across education levels, which is where this study offers new insights.
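For reference, the FK grade level is a deterministic function of sentence and word length, so it can be verified independently of any language model. A minimal sketch using the textstat package follows; textstat is a common implementation of these formulas, not necessarily the tool used in the cited studies.

```python
# Deterministic readability scoring, independent of any language model.
# textstat implements the standard formulas:
#   FK grade     = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
#   Reading ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
import textstat

answer = (
    "A herniated disc happens when the soft cushion between the bones "
    "of your spine pushes out and presses on a nerve."
)
print(textstat.flesch_kincaid_grade(answer))  # grade-level estimate
print(textstat.flesch_reading_ease(answer))   # 0-100; higher is easier
```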
Importantly, we found that participants across all education groups tended to prefer simpler content than what ChatGPT typically produced. Among respondents with postgraduate degrees, only 43.7% selected responses aligned with their own reading level, while the majority opted for content written at the 8th-12th grade level or below. This challenges the assumption that more advanced education equates to a desire for more complex language, and instead points toward a broader, cross-demographic preference for clarity, usability, and cognitive ease. Our findings are reinforced by those of Nasra et al (2024), who reviewed 60 studies evaluating AI-generated patient education materials in academic writing, health care education, and health care practice/research.15 They found that users consistently expressed greater satisfaction with simplified outputs, regardless of their medical or educational background. In studies where AI-generated content was revised to meet lower reading levels, user engagement and preference ratings improved markedly. The authors emphasized that legibility, not technical depth, was the primary driver of usability. These conclusions align with our data, which show that even highly educated users often favor mid-level readability when given a choice. These findings may reflect the cognitive load patients experience when processing complex health information, especially in high-stress or unfamiliar clinical settings.
Pairwise Chi-Squared Comparisons Between Education Levels
Chi-squared test results comparing reading level preferences between education groups, with Bonferroni-adjusted p-values.
Taken together, these findings indicate that aiming for lower reading levels may increase comprehension, usability, and engagement across a diverse audience, regardless of formal education. In situations where tailoring is feasible, AI systems could be designed to offer real-time customization of readability based on user preference. Until such systems are widely implemented, targeting an 8th-12th grade reading level—or lower—may strike the best balance between accessibility and informational depth. Future studies should explore how aligning readability with preference impacts real-world outcomes such as knowledge retention, shared decision-making, and patient satisfaction.
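One way such real-time customization could work in practice is a measure-and-regenerate loop that combines the generation and scoring sketches above. The function below is hypothetical and assumes the openai SDK and textstat; it is a design sketch, not a validated system.

```python
# Hypothetical sketch of real-time readability tailoring: regenerate a
# response until its measured FK grade meets the user's preferred level.
import textstat
from openai import OpenAI

def tailor_to_preference(prompt: str, max_grade: float = 12.0,
                         max_attempts: int = 3) -> str:
    """Return a response at or below the user's preferred FK grade level."""
    client = OpenAI()
    text = ""
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[
                {"role": "system",
                 "content": (f"Reply at or below a US grade-{max_grade:.0f} "
                             "reading level, using short sentences and plain words.")},
                {"role": "user", "content": prompt},
            ],
        )
        text = response.choices[0].message.content
        if textstat.flesch_kincaid_grade(text) <= max_grade:
            break  # measured readability meets the user's preference
    return text  # best attempt; callers may re-check the score
```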
Limitations
As with many survey-based studies, there are limitations to this study. First, the survey was distributed on a platform that allows participants to choose which surveys to participate in; we may therefore have captured respondents with a special interest in this topic, creating a potential selection bias.
Second, the evaluation of AI and its applicability in this study has several limitations, many of which relate to the dynamic functionality of ChatGPT. While the program's dynamism is what allows it to learn, it also hampers reproducibility of results. For example, if ChatGPT is prompted to translate a text multiple times, the output varies slightly with each request; therefore, only the first translation was utilized for analysis purposes. While there are multiple large language models (LLMs) publicly available, including Claude, Gemini, and LLaMA, we elected to use ChatGPT. ChatGPT has become one of the most recognized and utilized generative AI tools in both consumer and academic settings, making it a relevant choice for studying user preferences and content readability in a real-world context. However, this decision may limit the generalizability of our findings across other LLM platforms, as different models may generate varying outputs even when prompted similarly.10,25 Additionally, ChatGPT often calculates different readability scores each time it is prompted to evaluate a given text; this limitation might be overcome by instructing the program to calculate scores with a fixed formula.8 Finally, our ChatGPT-generated grade-level-specific answer choices did not align with Flesch-Kincaid grade level scores, indicating that our answer choices were above the expected reading level yet followed the correct pattern of increasing reading difficulty.
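A partial mitigation for the output variability noted above, assuming the openai SDK, is to pin the sampling parameters. Setting temperature to zero and supplying a seed reduces, but does not eliminate, run-to-run variation; the seed parameter is documented as best-effort only.

```python
# Reducing (not eliminating) run-to-run variation in ChatGPT outputs.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user",
               "content": "Rewrite this guideline text at an 8th-grade level."}],
    temperature=0,   # greedy decoding removes most sampling variability
    seed=42,         # best-effort determinism; not guaranteed across model updates
)
print(response.choices[0].message.content)
```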
Conclusion
While postgraduate-level responses were the most frequently selected overall, preferences varied substantially by education level. Both postgraduate and below-bachelor participants tended to favor simpler content, while only bachelor’s-level respondents consistently preferred more complex language. These findings underscore that simplifying medical information may improve accessibility across diverse educational backgrounds.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data are available upon request.
Supplemental Material
Supplemental material for this article is available online.
