
Abstract

Background

ChatGPT and other AI-driven language models are increasingly used in healthcare for disseminating medical information. However, their performance in providing accurate and empathetic responses to patients with specific diseases remains uncertain.

Objective

This study aimed to evaluate the effectiveness and reliability of ChatGPT in providing esophageal cancer-related information using the SERVQUAL framework, focusing on five dimensions: Tangibles, Reliability, Responsiveness, Assurance, and Empathy.

Methods

Ten representative questions on esophageal cancer were selected through search engine analysis and specialist consultation. ChatGPT generated responses, which were evaluated by 48 gastroenterologists using a 5-point Likert scale aligned with the SERVQUAL model. Statistical analysis was conducted using R 4.4.0 to compare responses between specialist and non-specialist physicians.

Results

ChatGPT performed well in providing structured, logical, and generally informative responses, particularly in the prevention domain. However, limitations were identified in its responsiveness and empathy. Significant differences were observed between specialists and non-specialists in evaluating certain answers, especially regarding reliability and cutting-edge knowledge. ChatGPT often failed to reflect the latest clinical guidelines or regional risk-specific recommendations.

Conclusion

While ChatGPT shows potential in patient education for esophageal cancer, its current outputs lack clinical specificity and up-to-date medical insight. AI tools should be continuously improved with dynamic data integration and specialist supervision to ensure reliability and relevance in real-world healthcare scenarios.

Keywords

ChatGPT esophageal cancer artificial intelligence SERVQUAL model medical information patient education

Introduction

The integration of artificial intelligence (AI) into healthcare has presented exciting opportunities for improving patient care and medical communication. One such advancement is the Chat Generative Pre-trained Transformer (ChatGPT), an AI-based conversational chatbot developed by OpenAI.^1,2 With its ability to generate responses based on large language models, it has greatly facilitated public access to medical knowledge.^3–6 Consequently, numerous studies have assessed the availability and accuracy of the medical information it provides to the public.^7–11 Despite the subsequent introduction of other large language models, including Google Gemini, Bard, and DeepSeek, ChatGPT, as the pioneering system, remains the most important.^12–14

However, the reliability of AI-generated content is particularly critical for diseases with high mortality and complex management pathways, such as esophageal cancer. Ranked as the sixth leading cause of global cancer deaths, this malignancy carries a 5-year survival rate of only 15–25%,^15–18 and exhibits distinct risk factor profiles between histological subtypes that influence global incidence patterns.^19–21 The urgent need for accurate, accessible information on prevention, early detection, and treatment underscores the importance of evaluating ChatGPT's performance in this high-stakes context. The SERVQUAL model is used to assess the quality of healthcare services,^22–25 with a particular emphasis on patient perceptions, and encompasses five dimensions: Tangibles, Reliability, Responsiveness, Assurance, and Empathy.^26–28 In particular, Eun Kyoung Yun et al. have highlighted the important role of the SERVQUAL model in the management of telemedicine services.²⁹ Joongwon Choi et al. have used the SERVQUAL model to assess the availability of ChatGPT in providing medical information related to kidney cancer.³⁰

Currently, there is no research evaluating the use of ChatGPT for providing esophageal cancer information. Through a multidimensional approach encompassing literature review, Q&A comparison and specialist evaluation, our research aims to evaluate the efficacy and reliability of ChatGPT-3.5 generated outputs in addressing the informational needs of the public and esophageal cancer patients.

Study objective

This study evaluates the effectiveness and reliability of ChatGPT in providing medical information on esophageal cancer using the SERVQUAL framework. By leveraging AI technology, we seek to enhance patient communication, support informed decision-making, and ultimately improve the quality of care for individuals affected by this disease.

Method

Questions selection and answers generation

This study used ChatGPT-3.5, a large language model developed by OpenAI (San Francisco, California, USA; last updated in September 2021). The search was conducted between 1 March and 10 March 2024 from Nanchang, China, using a desktop device (Windows 10, Chrome browser, logged-out mode) to avoid personalization bias. First, based on global search engine market share data from StatCounter,³¹ we selected five widely used platforms (Google, Bing, Yahoo!, Yandex, and Baidu) as data sources. For each engine, the query term “esophageal cancer” was entered, and the top 20 patient-facing questions were extracted from the “People also ask” and “related searches” sections. Extraction was performed manually by two independent researchers, with discrepancies resolved by consensus. In parallel, we conducted structured interviews with five frontline gastroenterologists (three from academic tertiary hospitals, two from regional medical centers). Each clinician provided the 10 questions that patients most frequently ask about esophageal cancer in routine outpatient consultations. The search-derived and clinician-reported lists were merged, deduplicated, and categorized into seven domains (prevention, diagnosis, early symptoms, progression, prognosis, treatment, heredity). Two senior gastroenterologists independently reviewed the pool and selected 10 representative questions that were both clinically meaningful and frequently encountered. The questions selected as ChatGPT input are shown in Table 1. The finalized 10 questions were submitted to ChatGPT-3.5 in separate, independent sessions (new chat window each time, identical prompt format). Model outputs were not edited except for formatting. The complete set of generated answers is provided in Supplemental materials.

Table 1.

Question list.

Question number	Asked question
1	Are there any measures to prevent and avoid esophageal cancer?
2	Are there any methods to detect esophageal cancer early?
3	In which part does esophageal cancer often affect?
4	What are the early symptoms of esophageal cancer?
5	What causes esophageal cancer?
6	How should I manage my esophageal cancer?
7	Can esophageal cancer be inherited?
8	How long does esophageal cancer develop from early stage to late stage?
9	Is there a definite treatment for esophageal cancer?
10	How long can esophageal cancer live?

Questionnaire design

We constructed a structured evaluation instrument based on the SERVQUAL framework. Each ChatGPT answer was rated by clinical experts using a five-point Likert scale (1 = strongly disagree, 5 = strongly agree) across the following adapted dimensions, shown in Table 2.

Table 2.

The questions designed according to SERVQUAL model.

Characteristics of SERVQUAL model	Corresponding evaluation
Tangibles	I think this answer has good structure and strong logic
Reliability	I think this answer has good accuracy and reliability
Responsiveness	I think this answer is cutting-edge and reflects the latest guidelines
Assurance	I think the public can trust and use that answer
Empathy	I think the answer is very appropriate, can accurately understand the problem and to consider the meaning of the public

To minimize bias, raters were blinded to the identity of the AI system and the study hypothesis. The order of questions was randomized for each rater using a computer-generated sequence. Detailed rating instructions and anchors were provided, and the full evaluation instrument is included in Supplemental materials. The mapping of SERVQUAL dimensions to textual answer properties was guided by prior applications of SERVQUAL in digital health information quality assessment³⁰ and refined through pilot testing with two gastroenterologists, who confirmed clarity and domain relevance. Minor revisions were made based on their feedback before full deployment.

Participant and recruitment

In May 2024, a total of 50 gastroenterologists were invited to participate in a questionnaire survey through the online platform Sojump. Participants were eligible if they held a medical practitioner qualification, were currently employed in the field of gastroenterology, and agreed to take part in the study by completing the questionnaire. The specialist group was defined as physicians who manage more than 10 newly diagnosed esophageal cancer patients per month (both squamous cell carcinoma and adenocarcinoma); those with ≤10 cases per month formed the non-specialist group. Non-gastroenterological practitioners were excluded, and any questionnaire completed in less than 60 s was deemed invalid. Participants could withdraw voluntarily from the study at any time and were excluded if they were unable to complete the questionnaire for any reason.

Statistical analysis

All statistical analyses were performed using R version 4.4.0. The Likert-scale data derived from SERVQUAL evaluations were treated as ordinal variables. As the assumptions for parametric tests were not considered appropriate for this data type, the non-parametric Wilcoxon rank-sum test (Mann–Whitney U test) was employed to compare responses between specialist and non-specialist groups. Effect sizes were quantified using Cliff's Delta (δ) along with their 95% confidence intervals (CIs). The magnitude of effect sizes was interpreted as follows: |δ| < 0.147 (negligible), 0.147–0.33 (small), 0.33–0.474 (medium), and >0.474 (large).
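The effect-size step can be illustrated with a minimal Python sketch. The study itself used R 4.4.0, so this is an illustration under stated assumptions rather than the authors' code. It computes Cliff's delta from pairwise comparisons, shows its algebraic link to the Mann–Whitney U statistic, δ = 2U/(n₁n₂) − 1, and applies the magnitude thresholds quoted above. The function names and example ratings are hypothetical, and the handling of values falling exactly on 0.147, 0.33, or 0.474 is an assumption because the text does not specify it.

```python
from itertools import product


def cliffs_delta(x, y):
    """Cliff's delta: share of pairs with x > y minus share with x < y."""
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


def mann_whitney_u(x, y):
    """U statistic for group x: a win counts 1, a tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(x, y))


def delta_from_u(u, n_x, n_y):
    """Convert a U statistic into Cliff's delta."""
    return 2.0 * u / (n_x * n_y) - 1.0


def magnitude(delta):
    """Label |delta| with the thresholds quoted in the text.

    Boundary handling (e.g. exactly 0.33 -> "medium") is an assumption.
    """
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical 5-point Likert ratings, not study data.
group_a = [5, 4, 4, 5, 3, 4, 5]
group_b = [4, 3, 4, 2, 3, 5, 3, 4, 3, 4]

delta = cliffs_delta(group_a, group_b)
u = mann_whitney_u(group_a, group_b)
print(round(delta, 3), magnitude(delta))
# The pairwise definition and the U-based conversion agree.
print(abs(delta - delta_from_u(u, len(group_a), len(group_b))) < 1e-12)
```

If the W values in Table 4 are taken to be U statistics for a group of 7 raters compared against 41 (the group sizes implied by Table 3, which is an assumption), `delta_from_u(162.5, 7, 41)` gives about 0.13 and `delta_from_u(234.5, 7, 41)` about 0.63, in line with the Cliff's delta values reported for Q1 Tangibles and Q4 Reliability.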

Positive evaluation rate (PER) was calculated as the percentage of responses rated 4 (Agree) or 5 (Strongly Agree) on the 5-point Likert scale. This metric was used to assess the proportion of gastroenterologists who provided favorable evaluations of ChatGPT's responses.
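As a minimal sketch of the PER definition above, the following Python function counts ratings of 4 or 5 and divides by the number of ratings. The function name and the example ratings are hypothetical and are not taken from the study data.

```python
def positive_evaluation_rate(ratings):
    """Percentage of ratings that are 4 (Agree) or 5 (Strongly Agree)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    favorable = sum(1 for r in ratings if r >= 4)
    return 100.0 * favorable / len(ratings)


# Hypothetical ratings from 48 raters for one question and one dimension.
example = [5] * 20 + [4] * 14 + [3] * 10 + [2] * 4
print(f"{positive_evaluation_rate(example):.2f}%")  # prints 70.83%
```

With 48 raters, each rater contributes about 2.08 percentage points, so 34 favorable ratings correspond to 70.83% and 45 favorable ratings to 93.75%.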

Inter-rater reliability among the 48 gastroenterologists was assessed using the intraclass correlation coefficient (ICC) based on a two-way random-effects model for absolute agreement. ICC values were interpreted according to conventional guidelines: <0.50 (poor), 0.50–0.75 (moderate), 0.75–0.90 (good), and >0.90 (excellent). The overall study process is summarized in Figure 1.
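The ICC for a two-way random-effects, absolute-agreement model can be sketched from the ANOVA mean squares as follows. The text does not state whether the single-rater or the average-rater form was reported, so this sketch returns both; the function name, the layout of the matrix (rated items in rows, raters in columns), and the small example matrix are assumptions for illustration and are not study data.

```python
def icc_two_way_random(matrix):
    """Return (single-rater ICC, average-rater ICC) for an items x raters matrix.

    Two-way random-effects model, absolute agreement.
    """
    n = len(matrix)       # rated items (rows)
    k = len(matrix[0])    # raters (columns)
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]

    # Sums of squares for items, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in matrix for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    average = (ms_rows - ms_error) / (
        ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Illustrative matrix: 6 rated items (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc_two_way_random(ratings)
print(round(single, 2), round(average, 2))  # prints 0.29 0.62
```

The average-rater form rises as the number of raters grows, so a value close to 1 with 48 raters does not by itself imply that any two individual raters agree closely.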

Figure 1.

Flowchart of the whole process.

Results

Of the 50 questionnaires distributed, one invalid response, completed in less than 1 min, was discarded, leading to a final collection of 48 questionnaires. The recovery rate of the questionnaires was thus 96%. The general information distribution is given in Table 3.

Table 3.

General information distribution.

Variables	Amount	Proportion
Gender
Male	35	72.92%
Female	13	27.08%
Age
20–40	19	39.59%
41–50	19	39.58%
Above 50	10	20.83%
Types of medical institutions
University or tertiary hospital	44	91.67%
Secondary hospital	4	8.33%
Others	0	0.00%
Average esophageal cancer patients per month
<10	41	85.42%
10–20	6	12.5%
>20	1	2.08%

Among participants, 72.92% were male and 27.08% were female. The largest age groups were 20–40 years (39.59%) and 41–50 years (39.58%), while 20.83% were over 50 years old. Most participants worked in university or tertiary hospitals (91.67%), while secondary hospitals accounted for 8.33%. Most doctors (85.42%) saw fewer than 10 esophageal cancer patients per month on average, 12.5% saw 10–20 per month, and 2.08% saw more than 20. To determine whether there were significant differences in responses, we compared physicians seeing more than 10 esophageal cancer patients per month with those seeing fewer than 10. Physicians seeing more than 10 patients per month were considered more experienced specialists, and those seeing fewer than 10 were considered generalists. The results indicated that 40% of the questions (questions 1, 2, 3, and 4) showed significant differences in reliability and comprehensiveness. Consequently, we concluded that physicians seeing more than 10 esophageal cancer patients per month demonstrated greater clinical knowledge of esophageal cancer. Cronbach's alpha reliability analysis of the questionnaire indicated an overall alpha value of 0.99, suggesting high reliability and strong consistency among the question items.
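For readers unfamiliar with the statistic, Cronbach's alpha can be sketched in a few lines of Python using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the tiny example matrix are hypothetical; how the study arranged its questionnaire items for this calculation is not described, so the respondents-by-items layout is an assumption.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of ratings."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 respondents rating 3 items on a 5-point scale.
scores = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 5],
    [2, 3, 2],
]
print(round(cronbach_alpha(scores), 2))  # prints 0.98
```

Alpha approaches 1 when respondents who rate one item highly also rate the other items highly, which is the pattern in this toy matrix.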

Detailed responses and scores for all questions can be found in the Supplemental material. The corresponding average scores and PER for each question across the five SERVQUAL dimensions are summarized in Figure 2. The overall PER for the 10 questions ranged from 70.83% to 93.75%, with average scores ranging from 3.9 to 4.43. Question 7, regarding the hereditary nature of esophageal cancer, received the highest overall PER (93.75%) and score (4.43) among all questions, with particularly strong performance in Tangibles, Reliability, Assurance, and Empathy dimensions (all 95.83%). In contrast, question 4, concerning the early symptoms of esophageal cancer, had the lowest overall PER (70.83%) and score (3.9), with particularly low ratings in Responsiveness (66.67%) and Reliability (70.83%). Notably, questions 5, 7, and 10 demonstrated consistently high PER across all dimensions (>85%), while question 4 showed the weakest performance. The Responsiveness dimension generally received lower PER compared to other dimensions across most questions, suggesting limitations in ChatGPT's ability to provide cutting-edge information. Seven out of ten questions received an overall PER exceeding 85%, indicating generally favorable evaluations by gastroenterologists.

Figure 2.

Evaluation of ChatGPT-generated esophageal cancer information using SERVQUAL framework (n = 48). (A) Heatmap of average scores across five dimensions. (B) Positive evaluation rates (percentage of ratings ≥4) by question and dimension.

Additionally, Table 4 summarizes whether there were significant differences between the responses of gastroenterologists with more than 10 monthly patients and those with fewer than 10. The inter-rater reliability for the overall evaluations, as measured by ICC, was 0.988 (95% CI, 0.982, 0.992), indicating excellent agreement among the gastroenterologists.

Table 4.

Wilcoxon rank-sum test results comparing SERVQUAL dimension ratings between specialists and non-specialists.

Question	Dimension	W	P-value	Cliff delta	95% CI
Q1	Tangibles	162.5	0.53	0.13	(−0.23, 0.47)
Q1	Reliability	184.5	0.17	0.29	(−0.03, 0.55)
Q1	Responsiveness	174	0.33	0.21	(−0.19, 0.55)
Q1	Assurance	173	0.34	0.21	(−0.17, 0.53)
Q1	Empathy	180	0.24	0.25	(−0.13, 0.57)
Q2	Tangibles	213.5	0.02	0.49	(0.31, 0.63)
Q2	Reliability	217	0.02	0.51	(0.34, 0.65)
Q2	Responsiveness	210.5	0.04	0.47	(0.18, 0.68)
Q2	Assurance	187.5	0.17	0.31	(−0.06, 0.60)
Q2	Empathy	188.5	0.16	0.31	(−0.04, 0.60)
Q3	Tangibles	202.5	0.06	0.41	(0.08, 0.66)
Q3	Reliability	206.5	0.04	0.44	(0.11, 0.68)
Q3	Responsiveness	208	0.04	0.45	(0.14, 0.68)
Q3	Assurance	179	0.26	0.25	(−0.20, 0.61)
Q3	Empathy	183	0.22	0.28	(−0.18, 0.63)
Q4	Tangibles	213	0.03	0.48	(0.19, 0.70)
Q4	Reliability	234.5	0.00	0.63	(0.46, 0.76)
Q4	Responsiveness	200.5	0.08	0.40	(0.00, 0.68)
Q4	Assurance	197	0.10	0.37	(−0.05, 0.68)
Q4	Empathy	183.5	0.23	0.28	(−0.13, 0.61)
Q5	Tangibles	187	0.14	0.30	(−0.03, 0.58)
Q5	Reliability	190.5	0.12	0.33	(−0.01, 0.60)
Q5	Responsiveness	187	0.16	0.30	(−0.11, 0.63)
Q5	Assurance	174.5	0.31	0.22	(−0.18, 0.55)
Q5	Empathy	163.5	0.52	0.14	(−0.30, 0.53)
Q6	Tangibles	189	0.13	0.32	(0.01, 0.57)
Q6	Reliability	197	0.08	0.37	(0.07, 0.62)
Q6	Responsiveness	198	0.08	0.38	(0.08, 0.62)
Q6	Assurance	179.5	0.25	0.25	(−0.11, 0.56)
Q6	Empathy	179.5	0.25	0.25	(−0.11, 0.56)
Q7	Tangibles	183.5	0.17	0.28	(−0.05, 0.55)
Q7	Reliability	171	0.37	0.19	(−0.20, 0.53)
Q7	Responsiveness	179.5	0.25	0.25	(−0.11, 0.56)
Q7	Assurance	171	0.37	0.19	(−0.20, 0.53)
Q7	Empathy	171	0.37	0.19	(−0.20, 0.53)
Q8	Tangibles	181	0.23	0.26	(−0.12, 0.57)
Q8	Reliability	184.5	0.19	0.29	(−0.10, 0.59)
Q8	Responsiveness	193	0.12	0.34	(−0.01, 0.62)
Q8	Assurance	188	0.16	0.31	(−0.08, 0.61)
Q8	Empathy	188	0.16	0.31	(−0.08, 0.61)
Q9	Tangibles	176.5	0.29	0.23	(−0.15, 0.55)
Q9	Reliability	184.5	0.19	0.29	(−0.10, 0.59)
Q9	Responsiveness	194	0.12	0.35	(0.00, 0.62)
Q9	Assurance	186	0.17	0.30	(−0.10, 0.61)
Q9	Empathy	186	0.17	0.30	(−0.10, 0.61)
Q10	Tangibles	179	0.25	0.25	(−0.14, 0.57)
Q10	Reliability	186	0.17	0.30	(−0.10, 0.61)
Q10	Responsiveness	196	0.10	0.37	(−0.02, 0.66)
Q10	Assurance	170.5	0.38	0.19	(−0.22, 0.54)
Q10	Empathy	167	0.45	0.16	(−0.24, 0.52)

Discussion

Strengths, gaps, and risks in ChatGPT's medical responses

In this study, two specialists separately evaluated the comprehensiveness of each ChatGPT answer. They recognized the comprehensiveness of the responses to the 10 questions, which provide valuable information for patients to consult. However, the answers were also notably incomplete, particularly in the absence of cutting-edge therapeutic strategies, updated scientific results, and prognostic data. Furthermore, the specialists acknowledged the structured and logical nature of the answers but expressed skepticism regarding their specificity. They noted that ChatGPT did not provide examples or detailed explanations to help the public assess the applicability of the opinions.

We also surveyed gastroenterologists, who evaluated ChatGPT responses to questions about esophageal cancer and gave overall positive ratings for questions such as “Can esophageal cancer be inherited?” However, the ChatGPT responses scored relatively low in terms of reliability and responsiveness to the latest insights, which indicates that they did not incorporate emerging therapeutic modalities. The omission of such advancements may mislead patients regarding treatment options. This gap underscores ChatGPT's dependency on pre-2021 data, limiting its utility for real-time clinical decision support.^32–34 Future AI iterations must integrate dynamic medical database updates to address this critical shortfall. Interestingly, a deeper analysis revealed statistically significant differences between specialists and non-specialists. Specialists consistently assigned lower ratings in the dimensions of reliability and responsiveness compared with non-specialists. These differences underscore the value of domain expertise in critically evaluating AI-generated content, especially in high-stakes contexts such as oncology.

In addition, Q4 (“What are the early symptoms of esophageal cancer?”) received relatively low scores because ChatGPT emphasized only general symptoms and overlooked the fact that early manifestations are often subtle and nonspecific. Such omissions may delay timely health-seeking, especially among populations with low health literacy. Enhancing symptom prioritization and linking vague complaints to actionable next steps will be crucial for future improvements.

ChatGPT is considered a useful source of information for both patients and healthcare professionals.³⁵ However, ChatGPT relies on large-scale data collection and subsequent training. Due to the limitations of its training data, ChatGPT may face challenges in providing accurate and in-depth specialized medical knowledge. One study showed that ChatGPT performed poorly in the diagnosis and treatment categories, with treatment being the only category where it was entirely incorrect.³⁶ Consequently, ChatGPT cannot replace the comprehensive diagnosis and treatment provided by medical professionals.

It also cannot be held accountable for answers generated through ongoing training, which face a number of data security and privacy risks.^37,38 There is a potential risk of medical disputes, particularly when incorrect information is provided, as it can be challenging to determine accountability for erroneous responses.^39,40 As an auxiliary diagnostic and treatment tool, there is concern that discrepancies between ChatGPT's information and that of medical specialists could exacerbate doctor–patient conflicts.^41,42 Furthermore, ChatGPT is not capable of offering adaptive advice tailored to individual patients from different backgrounds. Harry Collin noted that ChatGPT may not fully capture the breadth or depth of understanding from patients with varying educational backgrounds and mental states.⁴³ Answers based on a college education level may present comprehension challenges for individuals with lower educational attainment.^44–47

Study limitations and methodological considerations

Our study has several limitations that should be acknowledged. First, regarding sample composition, only 12.5% of respondents were specialists who saw more than 10 esophageal cancer patients per month, and 72.92% of participants were male. According to the Age-Standardized Incidence Rate per World Standard Population (ASIRW), the incidence of esophageal cancer is 11.13 per 100,000, reflecting its relatively low prevalence in the general population.⁴⁸ Within the spectrum of digestive system tumors, esophageal cancer occurs less frequently than colorectal, gastric, and liver cancers.⁴⁹ As a result, many physicians may encounter fewer than ten esophageal cancer patients monthly, potentially limiting the representativeness of our findings. Future studies should therefore include a higher proportion of subspecialists with extensive case exposure. Second, the study population was heavily skewed toward Chinese physicians, most of whom were based in universities or tertiary hospitals. This concentration may limit the applicability of results to other regions, community hospitals, and international settings. Because esophageal cancer incidence and clinical practices differ across regions, future research should strive to include clinicians from high-incidence countries and diverse healthcare contexts to improve external validity. Third, the evaluators in our study were physicians rather than patients. While physicians provide valuable expert perspectives on accuracy and reliability, this design may not fully capture the patient's perspective regarding clarity, empathy, and practical applicability. Future investigations should integrate patient feedback to more comprehensively assess the usability of AI-generated content in real-world contexts. Finally, although the adapted SERVQUAL scale showed excellent internal consistency, it has not undergone formal psychometric validation. Future studies should conduct confirmatory factor analysis and reliability testing to establish its validity within the context of AI-generated medical content. Moreover, certain SERVQUAL dimensions may not fully align with digital health evaluations. For instance, the “Tangibles” dimension, originally intended to assess physical facilities, was reinterpreted in this study as the logic and structure of AI-generated responses. While this adaptation enabled consistent scoring, it may not fully capture user experience with AI content, thereby constraining the generalizability of our results.

Challenges in applying AI to medical contexts and future directions

ChatGPT is still in its early stages in the medical field. Because it lacks the most recent medical advances, its diagnostic and therapeutic advice requires further refinement before it can be recommended as a widely used language model for diagnosis and treatment. Low empathy scores also reflect ChatGPT's inability to tailor its responses to the patient's specific context. While its responses are structurally coherent, they fail to accommodate factors such as geographic risk. This highlights that AI lacks human-like contextual interpretation, which remains crucial for patient-centered communication. Future developments should incorporate patient-specific adaptive frameworks, possibly through integrated demographic filters. In addition, to improve reliability, AI tools like ChatGPT should be integrated with updated medical databases and supervised by healthcare professionals. Future models must address shortcomings in responsiveness and contextual accuracy before clinical implementation. It is also crucial to emphasize the significant role of physicians and specialists in patient diagnosis and prognosis.

For the SERVQUAL model, future studies should perform psychometric evaluations, such as Cronbach's alpha for internal consistency and factor analyses to confirm the structure of the adapted instrument in the context of AI-based medical service evaluation. Moreover, the model does not address ethical and legal liability gaps, such as AI accountability mechanisms and data privacy risks. Future studies should integrate AI-specific metrics with SERVQUAL and introduce patient perspective assessment.

In future work, we plan to track the stability of answers to the same question across GPT versions and update cycles, and to integrate a continuous learning framework that enables the model to update medical guidelines in real time and label knowledge currency. More sensitive models could deliver precise recommendations, such as reducing the intake of hot food and mold-contaminated food, to users in areas with a high prevalence of esophageal cancer. At the patient level, we plan to develop a trustworthiness score visualization component on the patient interaction page, which will indicate the reliability of each answer on a 5-star scale to avoid over-reliance by patients.

Conclusion

This study evaluated ChatGPT's ability to deliver esophageal cancer information using specialist assessments within the SERVQUAL framework. While ChatGPT generated coherent and logically structured responses, it showed clear shortcomings in accuracy, medical specificity, and contextual adaptability, particularly regarding up-to-date clinical knowledge and individualized guidance. Specialists with extensive clinical experience rated its performance more critically, underscoring the limitations of static AI models in complex medical contexts. Future improvements should focus on enhancing reliability through integration with updated clinical guidelines, incorporating geo-sensitive and demographically adaptive modules, and developing real-time learning mechanisms. Attention must also be given to ethical and legal safeguards, including transparent accountability systems, to address liability in medical decision-making. Ultimately, AI tools like ChatGPT should complement rather than replace physician expertise. Positioned as supervised educational adjuncts, they may help improve patient communication and health literacy. Further research comparing multiple large language models and including patient perspectives will be crucial for ensuring their safe and effective deployment in healthcare.

Supplemental Material

sj-doc-2-dhj-10.1177_20552076251393291 - Supplemental material for Effectiveness of ChatGPT to provide esophageal cancer information: A SERVQUAL-based analysis

Supplemental material, sj-doc-2-dhj-10.1177_20552076251393291 for Effectiveness of ChatGPT to provide esophageal cancer information: A SERVQUAL-based analysis by Wangxinjun Cheng, Yichen Liu, Chufan Zhou and Chuan Xie in DIGITAL HEALTH

Supplemental Material

sj-docx-3-dhj-10.1177_20552076251393291 - Supplemental material for Effectiveness of ChatGPT to provide esophageal cancer information: A SERVQUAL-based analysis

Supplemental material, sj-docx-3-dhj-10.1177_20552076251393291 for Effectiveness of ChatGPT to provide esophageal cancer information: A SERVQUAL-based analysis by Wangxinjun Cheng, Yichen Liu, Chufan Zhou and Chuan Xie in DIGITAL HEALTH

Footnotes

Ethics approval and consent to participate

This study was reviewed and approved by the Institutional Ethics Committee of the First Affiliated Hospital of Nanchang University (Approval No. (2025) CDYFYYLK (07-020)). All procedures performed in this study involving human participants were conducted in accordance with the ethical standards of the Institutional Ethics Committee and with the 1964 Helsinki Declaration and its later amendments. All participants were presented with a detailed information sheet on the first page of the online survey, which outlined the study's purpose, the voluntary nature of participation, data anonymity, and their right to withdraw at any time. Proceeding to complete and submit the questionnaire was considered implied consent.

Author contributions

Wangxinjun Cheng and Yichen Liu were responsible for the writing of the paper. Chufan Zhou was responsible for the data analysis. Chuan Xie was responsible for the guidance and proofreading.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by Talent Team Plan-Ganpo Talent Support Plan-Major Academic and Technical Leader Training Project-Leading Talents (Academic) (20243BCE51001), Ganpo Talent Program (gpyc20240212), Natural Science Foundation of Jiangxi Province (20242BAB26122).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The data that support the findings of this study are not publicly available for privacy reasons but are available from the corresponding author upon request.

Generative artificial intelligence (AI)

The AI Usage Statement for this study clarifies that AI technologies were utilized to assist in data analysis and interpretation, with human oversight ensuring the integrity and accuracy of the research findings.

Supplemental material

Supplemental material for this article is available online.

References

ChatGPT . Available: https://openai.com/chatgpt/overview/

OpenAI . [cited 2 Apr 2025]. Available: https://openai.com/

Clusmann

Kolbinger

Muti

, et al. The future landscape of large language models in medicine. Commun Med (Lond) 2023; 3: 41.

Hopkins

Logan

Kichenadasse

, et al. Artificial intelligence chatbots will revolutionize how cancer patients access information: chatGPT represents a paradigm-shift. JNCI Cancer Spectr 2023; 7: pkad010.

Matulis

McCoy

. Relief in sight? Chatbots, in-baskets, and the overwhelmed primary care clinician. J Gen Intern Med 2023; 38: 2808–2815.

Wang

Sanders

Liu

, et al. ChatGPT: promise and challenges for deployment in low- and middle-income countries. Lancet Reg Health West Pac 2023; 41: 100905.

Khromchenko

Shaikh

Singh

, et al.

ChatGPT-3.5 versus google bard: which large language model responds best to commonly asked pregnancy questions?

Cureus 2024; 16: e65543.

Ventresca

Davis

Gauthier

, et al. ChatGPT-4 effectively responds to common patient questions on total ankle arthroplasty: a surgeon-based assessment of AI in patient education. Foot Ankle Orthop 2025; 10: 24730114251322784.

Oliveira

Coelho

Guedes

, et al. Performance of ChatGPT 3.5 and 4 as a tool for patient support before and after DBS surgery for Parkinson’s disease. Neurol Sci 2024; 45: 5757–5764.

10.

Taymour

Fouda

Abdelrahaman

, et al. Performance of the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models in responding to dental implantology inquiries. J Prosthet Dent 2025; S0022-3913: 00833–3.

11.

Ichhpujani

Parmar

UPS

Kumar

. Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma. Rom J Ophthalmol 2024; 68: 243–248.

12. Moulaei, Yadegari, Baharestani, et al. Generative artificial intelligence in healthcare: a scoping review on benefits, challenges and applications. Int J Med Inform 2024; 188: 105474.

13. Tian, Ayers, et al. Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review. BMC Med Inform Decis Mak 2024; 24: 57.

14. Taratkin M S, Shchelkunova, Azilgareeva, et al. Artificial intelligence and large language models: challenges and prospects in research and medicine. Urologiia 2024; (2): 122–127.

15. Kato, Nakajima. Treatments for esophageal cancer: a review. Gen Thorac Cardiovasc Surg 2013; 61: 330–335.

16. Watanabe, Otake, Kozuki, et al. Recent progress in multidisciplinary treatment for patients with esophageal cancer. Surg Today 2020; 50: 12–20.

17. Domper Arnal, Ferrández Arenas, Lanas Arbeloa. Esophageal cancer: risk factors, screening and endoscopic treatment in Western and Eastern countries. World J Gastroenterol 2015; 21: 7933–7943.

18. Huang F-L, S-J. Esophageal cancer: risk factors, genetic association, and treatment. Asian J Surg 2018; 41: 210–215.

19. Zhu, et al. Esophageal cancer in China: practice and research in the new era. Int J Cancer 2023; 152: 1741–1751.

20. Jajosky, Elliott DRF. Esophageal cancer genetics and clinical translation. Thorac Surg Clin 2022; 32: 425–435.

21. Uhlenhopp, Then, Sunkara, et al. Epidemiology of esophageal cancer: update in global trends, etiology and risk factors. Clin J Gastroenterol 2020; 13: 1010–1021.

22. The Most Important Telemedicine Patient Satisfaction Dimension Patient-Centered Care.

23. Mason, Brown, Mason. Telemedicine patient satisfaction dimensions moderated by patient demographics. Healthcare 2022; 10: 1029.

24. Jin, Yuan, Chang, et al. Telemedicine in China: effective indicators of telemedicine platforms for promoting health and well-being among healthcare consumers. Digit Health 2025; 11: 20552076251341163.

25. Althumairi, AlHabib, Alumran, et al. Healthcare providers’ satisfaction with implementation of telemedicine in ambulatory care during COVID-19. Healthcare 2022; 10: 1169.

26. Jonkisz, Karniej, Krasowska. The SERVQUAL method as an assessment tool of the quality of medical services in selected Asian countries. Int J Environ Res Public Health 2022; 19: 7831.

27. Bobocea, Gheorghe, Spiridon, et al. The management of health care service quality. A physician perspective. J Med Life 2016; 9: 149–152.

28. Fatima, Humayun, Iqbal, et al. Dimensions of service quality in healthcare: a systematic review of literature. Int J Qual Health Care 2019; 31: 11–29.

29. Yun, Chun. Critical to quality in telemedicine service management: application of DFSS (design for six sigma) and SERVQUAL. Nurs Econ 2008; 26: 384–388.

30. Choi, Kim, Lee, et al. Availability of ChatGPT to provide medical information for patients with kidney cancer. Sci Rep 2024; 14(1): 1542. DOI: 10.1038/s41598-024-51531-8

31. Statcounter Global Stats - Browser, OS, Search Engine including Mobile Usage Share. In: StatCounter Global Stats [Internet]. [cited 31 July 2025]. Available: https://gs.statcounter.com/

32. Haver, Ambinder, Bahl, et al. Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology 2023; 307: e230424.

33. Lai, Liao, Zhao, et al. Exploring the capacities of ChatGPT: a comprehensive evaluation of its accuracy and repeatability in addressing Helicobacter pylori-related queries. Helicobacter 2024; 29: e13078.

34. Johnson, King, Warner, et al. Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information. JNCI Cancer Spectr 2023; 7: pkad015.

35. Kuşcu, Pamuk, Sütay Süslü, et al. Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer? Front Oncol 2023; 13: 1256459.

36. Hermann, Patel, Boyd, et al. Let’s chat about cervical cancer: assessing the accuracy of ChatGPT responses to cervical cancer questions. Gynecol Oncol 2023; 179: 164–168.

37. Levartovsky, Ben-Horin, Kopylov, et al. Towards AI-augmented clinical decision-making: an examination of ChatGPT’s utility in acute ulcerative colitis presentations. Am J Gastroenterol 2023; 118: 2283–2289.

38. Dave, Athaluri, Singh. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell 2023; 6: 1169595.

39. Wang, Liu, Yang, et al. Ethical considerations of using ChatGPT in health care. J Med Internet Res 2023; 25: e48009.

40. Barnhart, JEM, Dierickx. Why ChatGPT means communication ethics problems for bioethics. Am J Bioeth 2023; 23: 80–82.

41. Zhang, Zhao, Zhang, et al. Application of large language models in healthcare: a bibliometric analysis. Digit Health 2025; 11: 20552076251324444.

42. Aydin, Karabacak, Vlachos, et al. Large language models in patient education: a scoping review of applications in medicine. Front Med (Lausanne) 2024; 11: 1477898.

43. Collin, Keogh, Basto, et al. ChatGPT can help guide and empower patients after prostate cancer diagnosis. Prostate Cancer Prostatic Dis 2025; 28(2): 513–515.

44. Pradhan, Fiedler, Samson, et al. Artificial intelligence compared with human-derived patient educational materials on cirrhosis. Hepatol Commun 2024; 8: e0367.

45. Kooraki, Hosseiny, Jalili, et al. Evaluation of ChatGPT-generated educational patient pamphlets for common interventional radiology procedures. Acad Radiol 2024; 31: 4548–4553.

46. Wei, Yao, Cui, et al. Evaluation of ChatGPT-generated medical responses: a systematic review and meta-analysis. J Biomed Inform 2024; 151: 104620.

47. Sridharan, Sivaramakrishnan. Investigating the capabilities of advanced large language models in generating patient instructions and patient educational material. Eur J Hosp Pharm 2025; 32(6): 501–507.

48. Smyth, Lagergren, Fitzgerald, et al. Oesophageal cancer. Nat Rev Dis Primers 2017; 3: 1–21.

49. Zhou, Song, Chen, et al. Burden of six major types of digestive system cancers globally and in China. Chin Med J 2024; 137: 1957.

