Abstract

Objective: This study aimed to investigate the accuracy, reliability, and readability of A-Eye Consult, ChatGPT-4.0, Google Gemini and Copilot AI large language models (LLMs) in responding to patient questions about endophthalmitis. Methods: The LLMs’ responses to 25 questions about endophthalmitis, frequently asked by patients, were evaluated by two ophthalmologists using a five-point Likert scale, with scores ranging from 1–5. The DISCERN scale assessed the reliability of the LLMs’ responses, whereas the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) indices assessed readability and text complexity, respectively. Results: A-Eye Consult and ChatGPT-4.0 outperformed Google Gemini and Copilot in providing comprehensive and precise responses. The Likert score significantly differed across all four LLMs (p < .001), with A-Eye Consult scoring significantly higher than Google Gemini and Copilot (p < .001). Conclusions: A-Eye Consult and ChatGPT-4.0 responses, while more complex than those of other LLMs, provided more reliable and accurate information.

Keywords

A-Eye Consult, artificial intelligence, ChatGPT-4.0, Copilot, endophthalmitis, Google Gemini

Introduction

Eye infections are among the most prevalent disorders in ophthalmology. These infections affect either the outer or inner components of the eye.¹ Endophthalmitis is the term used to describe inflammation of the vitreous and/or aqueous humor, usually caused by infections or other external factors. The sources of the condition include surgical procedures, injections, blisters, physical injury, and inflammation of the cornea.^2–5

Endophthalmitis is considered an ophthalmologic emergency. Hence, it should be diagnosed and treated immediately after any evidence of its emergence.⁶ The prevalence of this disorder has increased with the introduction of cataract surgery; therefore, acute postoperative endophthalmitis is currently the most frequent form.⁷ Although endophthalmitis has become common in the context of cataract extraction, gradual increases in its incidence rate have been reported in association with other variables, particularly the increased use of antivascular endothelial growth factor (anti-VEGF) injections.^4,8

Large language models (LLMs) are increasingly utilised in patient education and medical decision-making, with the potential to significantly impact ophthalmology by acting as a catalyst for medical artificial intelligence.^9–12 Numerous ophthalmology studies have assessed how well LLMs respond to patient inquiries and provide medically accurate information to eye specialists regarding a range of ocular conditions, including cataracts, retinal diseases, and glaucoma.^12–14 LLMs such as ChatGPT are increasingly being used by patients who want to learn about their eye health through the Internet.¹⁵

Ophthalmology will greatly benefit from integrating LLMs into current clinical treatment, considering the abundance of digital data from electronic records, imaging databases, and electronic communication with patients. Although LLM technology is in its early stages, its potential for major advancements and its numerous applications for patients and physicians could significantly change healthcare delivery.^16,17

Endophthalmitis is an ocular emergency, and its incidence is increasing with the growing frequency of cataract surgery and intraocular injections. It is a rapidly progressive, life-threatening infection for which early diagnosis and treatment are crucial for optimal care and the prevention of disease progression. As such, it differs from other eye infections. Artificial intelligence systems in the form of LLM chatbots therefore have the potential to help improve patient outcomes by reducing the number of missed or delayed diagnoses.¹⁸ Further research is thus required to determine whether patients receive accurate information from artificial intelligence chatbots such as A-Eye Consult, ChatGPT, Copilot, and Google Gemini. These four chatbots were selected for the study because of their popularity and their frequent comparison in research on eye diseases.^19,20

To the best of our knowledge, no study in the literature has investigated the effectiveness of chatbots for endophthalmitis. This study aimed to compare the effectiveness, accuracy, readability, and advantages and disadvantages of responses from the four chatbots to 25 frequently asked questions about endophthalmitis.

Materials and methods

The study included 25 questions about endophthalmitis asked by Internet users while using Google (Alphabet Inc.), comprising frequently asked questions doctors receive regarding their patients’ diagnoses and disease treatment. The 25 questions were selected by an ophthalmologist (SD), based on previous studies on the accuracy and readability of chatbots, without a specific rationale for choosing 25 questions.^14,19 These questions and the answers provided by the four chatbots are available in the electronic supplementary material. Irrelevant answers to the questions or answers with grammatical errors were excluded from the study. In June 2024, three LLMs, ChatGPT-4.0 (OpenAI, California), Copilot (Microsoft Corporation, Washington), and Google Gemini (Alphabet Inc., California, USA), and in September 2024, A-Eye Consult (https://www.aeyeconsult.com/, a chatbot developed with LangChain and Pinecone using the GPT-4 architecture), were instructed as follows: “I am going to ask a question; Can you please answer my questions about endophthalmitis correctly”. A separate chat window was launched for each question, and each question was posed to each of the LLMs separately. As LLMs are capable of providing several answers to a given question, only the initial response was recorded for examination. Since the responses were meant for endophthalmitis, reference article citation was unnecessary.

The recorded responses were evaluated separately by two attending ophthalmologists (SD, ICT) using a five-point Likert scale that was previously used to evaluate the accuracy of LLMs, with responses rated from 1 to 5.²¹ Discrepancies in the Likert ratings were resolved by discussion against the scoring criteria, and the ophthalmologists agreed that the consensus score represented the final assessment. The scoring was as follows: 1, strongly disagree; 2, disagree; 3, neither agree nor disagree; 4, agree; 5, strongly agree.

The DISCERN scale was used to assess the accuracy and reliability of the responses in more detail.²² The instrument assigns a score between 1 and 5 to each of its 15 questions, assessing the impartiality and comprehensiveness of medical treatment information. The total score ranges from 15 to 75, with the following categories: excellent (63–75), acceptable (51–62), fair (39–50), poor (27–38), and very poor (15–26).
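The banding of DISCERN totals can be sketched in a few lines. This is a minimal illustration, not the instrument's official scoring code: the function name `discern_band` is hypothetical, and the band limits are the score ranges listed above. It returns the score range of the band rather than a label.

```python
# Lower and upper limits of the five DISCERN bands, highest band first.
# The limits are the score ranges given in the text above.
DISCERN_BANDS = [(63, 75), (51, 62), (39, 50), (27, 38), (15, 26)]


def discern_band(total: int) -> tuple[int, int]:
    """Return the (lower, upper) limits of the band containing a DISCERN total."""
    if not 15 <= total <= 75:
        raise ValueError("a 15-item DISCERN total must lie between 15 and 75")
    for lower, upper in DISCERN_BANDS:
        if lower <= total <= upper:
            return (lower, upper)
    raise AssertionError("unreachable: the bands cover 15 to 75 without gaps")


if __name__ == "__main__":
    for total in (75, 63, 55, 53):
        print(total, discern_band(total))
```

Applied to the totals reported in this study (75, 63, 55, and 53), the first two fall in the 63–75 band and the last two in the 51–62 band.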

Finally, the well-known Flesch–Kincaid readability tests, encompassing the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) tests, were used to evaluate the responses’ comprehensibility and complexity, respectively. The FRE score ranges from 0 to 100, with higher scores indicating greater reading ease and lower scores indicating greater difficulty in understanding.²³ The FRE score quantifies the ease of understanding of the text, whereas the FKGL index evaluates the educational level needed to comprehend the text and its intricacy, with higher scores indicating greater complexity of content. To evaluate the chatbots’ performance, the responses from each chatbot were collectively assessed using the DISCERN score and the FRE and FKGL readability scales. Each model therefore received a single composite score on each scale.
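For orientation, the two readability indices can be computed from three counts: words, sentences, and syllables. The sketch below uses the standard published Flesch and Flesch–Kincaid formulas; the study does not state which tool produced its scores, so this is an illustration rather than the procedure actually used, and the counts in the example are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; higher values mean harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Both formulas depend on the same two ratios (words per sentence and syllables per word) with opposite signs, which is why a response with long sentences and polysyllabic medical terms scores low on FRE and high on FKGL at the same time.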

Statistical analysis

The Kolmogorov–Smirnov test was performed to assess the normality of the data. The dataset was summarized with descriptive statistics. Categorical variables were represented by numbers and percentages, and quantitative variables were presented as means ± standard deviations or medians (minimum-maximum). The mean values of the DISCERN, FRE, and FKGL scales were reported. The data were processed using IBM SPSS Statistics Version 25 (Armonk, NY, USA). The Friedman test was used to compare the LLMs; if a significant difference between the four LLMs was found, pairwise subgroup analyses were performed using the Wilcoxon rank-sum test. A p-value of <0.05 was considered statistically significant. In pairwise group comparisons, statistical significance was determined at p < .01 following Bonferroni correction.
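The pairwise threshold can be illustrated with a short calculation. This is a sketch under an assumption: the text states only the resulting level (p < .01), so dividing the overall alpha of 0.05 by the number of model pairs is an assumed reading of the Bonferroni correction, not a confirmed detail of the analysis.

```python
from itertools import combinations

MODELS = ["A-Eye Consult", "ChatGPT-4.0", "Google Gemini", "Copilot"]
ALPHA = 0.05  # overall significance level stated in the text

# Every unordered pair of models that a post hoc comparison could cover.
pairs = list(combinations(MODELS, 2))

# Assumed form of the correction: overall alpha divided by the number of pairs.
bonferroni_alpha = ALPHA / len(pairs)

if __name__ == "__main__":
    for first, second in pairs:
        print(f"{first} vs {second}")
    print(f"{len(pairs)} pairs, per-comparison alpha = {bonferroni_alpha:.4f}")
```

With four models there are six pairs, and 0.05 divided by six is about 0.0083, which rounds to the .01 level.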

Results

In this study, of the 25 questions about endophthalmitis, five focused on its etiology, 12 on its diagnosis, and eight on its treatment. A-Eye Consult, a specialized eye chatbot, obtained the highest score on the Likert scale, receiving five points for all 25 questions (100%). It was followed by ChatGPT-4.0, which scored five points on 18 questions (72%). The other chatbots, Google Gemini and Copilot, each scored five points on 12 questions (48%). A-Eye Consult did not score below five points on any question. ChatGPT-4.0 scored four points on 5 (20%) questions, Google Gemini on 7 (28%) questions, and Copilot on 5 (20%) questions. Similarly, ChatGPT-4.0 received three points for 2 (8%) questions, Google Gemini for 6 (24%) questions, and Copilot for 8 (32%) questions. No LLM obtained one or two points for any question. Table 1 displays the responses of all four LLMs using the Likert scale. Significant differences were observed between the four LLMs in total Likert score (p < .001). Pairwise comparisons showed that A-Eye Consult Likert scores were significantly higher than those of Google Gemini and Copilot (both: p < .001), while no statistical superiority was found against ChatGPT-4.0 (p = .038). Furthermore, pairwise comparisons showed that ChatGPT-4.0 scored significantly higher than Google Gemini and Copilot overall (both: p < .001), whereas Google Gemini was not statistically superior to Copilot (p = .579). Table 2 shows the statistical comparison of the mean Likert scores of the four LLMs for the 25 questions.

Table 1.

The Likert scores of the 25 answers given by the large language models for the questions asked.

Likert scale	A-Eye Consult (n)	ChatGPT-4.0 (n)	Google Gemini (n)	Copilot (n)
5	25	18	12	12
4	—	5	7	5
3	—	2	6	8
2	—	—	—	—
1	—	—	—	—
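The counts in Table 1 are enough to recompute the per-model summary statistics. The sketch below is illustrative rather than the original SPSS analysis; the helper name `summarize` is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```python
from math import sqrt

# Number of answers receiving each Likert score, copied from Table 1.
LIKERT_COUNTS = {
    "A-Eye Consult": {5: 25, 4: 0, 3: 0},
    "ChatGPT-4.0": {5: 18, 4: 5, 3: 2},
    "Google Gemini": {5: 12, 4: 7, 3: 6},
    "Copilot": {5: 12, 4: 5, 3: 8},
}


def summarize(counts: dict[int, int]) -> tuple[int, float, float, float]:
    """Return (n, mean, sample SD, percent of answers scored 4 or 5)."""
    n = sum(counts.values())
    mean = sum(score * k for score, k in counts.items()) / n
    variance = sum(k * (score - mean) ** 2 for score, k in counts.items()) / (n - 1)
    top_two = 100 * (counts.get(5, 0) + counts.get(4, 0)) / n
    return n, mean, sqrt(variance), top_two


if __name__ == "__main__":
    for model, counts in LIKERT_COUNTS.items():
        n, mean, sd, top_two = summarize(counts)
        print(f"{model}: n={n}, mean={mean:.2f}, SD={sd:.2f}, scored 4 or 5: {top_two:.0f}%")
```

The recomputed means (5.00, 4.64, 4.24, and 4.16) match Table 2, and the sample standard deviations come out within 0.01 of the reported values. The share of answers scored 4 or 5 is 100%, 92%, 76%, and 68% for A-Eye Consult, ChatGPT-4.0, Google Gemini, and Copilot, respectively.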

Table 2.

Mean ± Standard Deviation scores of the four large language models in the total question categories.

Category	A-Eye Consult (mean ± SD)	ChatGPT-4.0 (mean ± SD)	Google Gemini (mean ± SD)	Copilot (mean ± SD)	p*
Total	5.00	4.64 ± 0.63	4.24 ± 0.83	4.16 ± 0.89	<0.001

*Friedman Test.

In terms of readability, Gemini had the highest FRE score of 68.5 points and was found to be the most readable of the four models. The ChatGPT-4.0 FRE score was 38.3 points, while that of Copilot was 57.8 points. The A-Eye Consult chatbot scored the lowest in terms of readability with 23.7 points. In the analysis, A-Eye Consult received the highest FKGL score for text complexity with 16.7 points (post-graduate level). The FKGL scores for the other chatbots were 12.3, 8.9, and 9.2 points for ChatGPT (college level for FKGL), Google Gemini (middle school level for FKGL) and Copilot (high school level for FKGL), respectively. Table 3 shows the reliability and readability of the four LLMs’ responses.

Table 3.

Comparison of reliability and readability mean scores of the large language models.

Large language model	DISCERN score (reliability)	Flesch Reading Ease score (readability)	Flesch–Kincaid Grade Level (readability)
A-Eye Consult	75	23.7	16.7
ChatGPT-4.0	63	38.3	12.3
Google Gemini	55	68.5	8.9
Copilot	53	57.8	9.2

Discussion

In this study, A-Eye Consult had the highest DISCERN score of 75 points among the four LLMs, followed by ChatGPT-4.0, Google Gemini, and Copilot with 63, 55, and 53 points, respectively. This result shows that A-Eye Consult, a chatbot for eye diseases, provides more reliable information about endophthalmitis. Furthermore, the DISCERN score for ChatGPT-4.0 was excellent, while those of Google Gemini and Copilot were acceptable. In the analysis for the FRE readability score, the lowest FRE score was 23.7 points for A-Eye Consult, while the highest score was 68.5 points for Google Gemini. Furthermore, in the analysis of the FKGL score, Google Gemini received the lowest score (8.9; middle school level for FKGL), while A-Eye Consult received the highest score (16.7; postgraduate level for FKGL). These results indicate that Google Gemini performed the best among the four LLMs in terms of readability and comprehensibility, while A-Eye Consult performed the worst in terms of readability and comprehensibility.

The world is currently interested in artificial intelligence and its potential applications in medicine. When applied effectively, AI can enhance patient awareness of diseases and facilitate early diagnosis.²⁴ One such condition is endophthalmitis, wherein the intraocular fluid is inflamed; it can be a serious medical emergency causing permanent loss of vision. This inflammation typically develops after recent keratitis, intraocular injections, trauma, or eye surgery.⁴ Approximately 10% of endophthalmitis cases result in a visual acuity of 20/800 or worse, posing a serious risk to visual functioning, even though nearly normal vision is regained in the majority of cases.⁶ A meta-analysis of endophthalmitis, a well-recognized concern for patients and physicians, reported an incidence of 0.056% following intravitreal injections of anti-VEGF agents, translating to one case in every 1779 injections.²⁵ Early diagnosis of endophthalmitis is critical for optimal care and the prevention of disease progression.¹⁸ Consequently, it is believed that the use of artificial intelligence systems in the form of LLM chatbots can help to improve patient outcomes by decreasing the number of missed or delayed diagnoses.

Large language models (LLMs) are computer systems capable of understanding, synthesizing, and inferring from user problems.²⁶ They serve as a resource for patients and aid healthcare professionals in addressing electronic patient communications. They are user-friendly, efficient, and dependable.^27,28 Previous research has demonstrated that users are generally eager to look for health advice via chatbots, implying that LLMs can be viewed as an alternative source of knowledge for patients in specific scenarios where it is impossible to reach a physician.^29–31 Since endophthalmitis is a serious disease that can lead to blindness if it progresses, it is critical that patients who use LLMs obtain accurate, reliable, and clear information. To the best of our knowledge, this is the first study in the literature to evaluate the accuracy and readability of the responses given by LLMs regarding endophthalmitis.

LLMs have been used to build artificial intelligence applications such as ChatGPT-4.0, Google Gemini, and Copilot. OpenAI’s newest language model, ChatGPT-4.0, was built with an LLM and was publicly introduced in March 2023. Similarly, the Google Gemini and Copilot language models were announced in 2023. These are current AI language programs with characteristics similar to ChatGPT.^32,33

This study found that A-Eye Consult and ChatGPT-4.0 provided more detailed and accurate answers to patient questions about endophthalmitis than Google Gemini and Copilot, with 100% and 92% of A-Eye Consult’s and ChatGPT-4.0’s answers, respectively, falling into the “agree” or “strongly agree” categories. This rate was 76% for Google Gemini and 68% for Copilot. In a study of large language models assisting glaucoma surgery, Carlà et al. reported a 58% success rate for ChatGPT and 32% for Google Gemini (p < .001).³⁴ In another study, Lee et al. compared LLMs in bariatric and metabolic surgery and found a significant difference between the three LLMs: ChatGPT-4, 85.7%; Bard, 74.3%; and Bing, 25.7%.³⁵ The results of this study were consistent with the literature in terms of comprehensiveness and readability. For example, in a study comparing Google Gemini and ChatGPT on cataracts and cataract surgery, Cohen and colleagues found that, consistent with the findings of this study, ChatGPT provided more accurate and actionable information on eye health for patients with high health literacy. Additionally, the answers provided by ChatGPT were longer and written at a higher reading level.¹⁴

These results may be attributable to the fundamental differences in the aims and structures of the four LLMs. A-Eye Consult, an ophthalmology chatbot primarily developed by Dr. Singer, MD, was built using GPT-4 (the LLM that powers the public chatbot ChatGPT-4), LangChain, and Pinecone. A-Eye Consult’s database includes the 2021–2022 American Academy of Ophthalmology Basic and Clinical Science Course textbook and the seventh edition of the Wills Eye Manual. Although A-Eye Consult’s Likert scores in this study were not significantly higher than those of ChatGPT-4.0 (p = .038), they were significantly higher than those of Google Gemini and Copilot (both: p < .001). A-Eye Consult, which was developed as a chatbot specialising in ophthalmology, is at an advantage because it combines the GPT-4 architecture with ophthalmology textbooks in its database.³⁶

It is thought that the reliability, accuracy, and complexity of ChatGPT-4.0’s responses to the questions about endophthalmitis, compared with those of Google Gemini and Microsoft Copilot, are attributable to its advanced algorithms. Various data sources are used to train ChatGPT-4.0 to generate conversational experiences that resemble human interactions. To encourage in-depth conversations, ChatGPT-4.0 offers more comprehensive and educational answers. Unlike Google Gemini and Copilot, it has its own database. ChatGPT generates accurate and contextually relevant responses by utilising supervised and reinforcement learning approaches. Conversely, Google Gemini and Copilot are artificial intelligence systems designed to streamline information access and search, with an emphasis on conciseness and clarity. Furthermore, Copilot and Google Gemini can pull web search results and information from a variety of unreliable websites. Although Copilot uses the ChatGPT-4.0 architecture, it also accesses resources with the help of an Internet search engine, which likely contributes to the performance difference between these two chatbots.³⁷ As in many previous comparisons of language programs, this is also why the ChatGPT-4.0 algorithm is considered successful compared with other language programs in the field of ophthalmology today.

Limitations of the study

This study has some limitations. A major limitation is that only questions about a single eye infection, endophthalmitis, were included in the study. Furthermore, only four chatbots were investigated, and other frequently used chatbots, such as Claude or LLaMA, were not included. In addition, one of the most important limitations of the study is that all the LLMs were asked the questions only once and graded according to the answers provided. Consequently, the study did not address the question of response repeatability. Another issue that may have an impact on the study is that it is a cross-sectional analysis. Chatbots may receive constant updates, so it should be borne in mind that the answers received may change. Although this study was blinded, the scores presented may be subjective. This should be considered when evaluating the study.

Conclusion

In this study, we aimed to evaluate the accuracy, reliability, and readability of answers provided by four chatbots in response to questions about endophthalmitis. A-Eye Consult, an ophthalmology-specific chatbot that uses ChatGPT-4’s architecture, was found to be worse than the other chatbots in terms of readability and comprehensibility, although it gave highly reliable and accurate answers to the questions asked. Similarly, although ChatGPT-4.0 provided more accurate and reliable answers to the questions than Google Gemini and Copilot, its answers were more complex than theirs in terms of comprehensibility. The main difference between the LLMs is that each chatbot has its own algorithm and architecture.

To the best of our knowledge, this is the first study to assess the effectiveness, accuracy, and readability of responses to endophthalmitis-related questions provided by LLMs. The findings of this study suggest that artificial intelligence LLMs, particularly those utilizing their own database in addition to the GPT-4 architecture, such as A-Eye Consult and ChatGPT-4.0, have substantial potential as reliable tools for answering endophthalmitis-related inquiries. Artificial intelligence chatbots, which are projected to be widely utilized by patients in the next few years, are thought to be crucial for providing patients with accurate and reliable information on acute eye problems like endophthalmitis. AI language programs should be considered a useful resource for both physicians and patients, rather than an alternative to clinicians. However, further studies investigating the effectiveness of AI chatbots for endophthalmitis and eye emergencies are needed.

Supplemental Material

Supplemental Material for Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots by Suleyman Demir in Health Informatics Journal.

Footnotes

Acknowledgements

I would like to sincerely thank Dr İsmail Cem Türkeş for his help in this study. I would also like to thank Editage for English language editing.

Author contributions

Suleyman Demir: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Resources, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration, Funding acquisition

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethical statement

ORCID iD

Suleyman Demir

References

1. Petrillo, Sinoca, Fea, et al. Candida biofilm eye infection: main aspects and advance in novel agents as potential source of treatment. Antibiotics 2023; 12(8): 1277.

2. Durand. Endophthalmitis. Clin Microbiol Infect 2013; 19(3): 227–234. DOI: 10.1111/1469-0691.12118.

3. Taban, Behrens, Newcomb, et al. Acute endophthalmitis following cataract surgery: a systematic review of the literature. Arch Ophthalmol 2005; 123(5): 613–620. DOI: 10.1001/archopht.123.5.613.

4. Wade, Whitescarver, Ashcroft, et al. Endophthalmitis: a bibliographic review. Int Ophthalmol 2021; 41(12): 4151–4161. DOI: 10.1007/s10792-021-01967-y.

5. Okada, Johnson, Liles, et al. Endogenous bacterial endophthalmitis. Report of a ten-year retrospective study. Ophthalmology 1994; 101(5): 832–838.

6. Results of the Endophthalmitis Vitrectomy Study. A randomized trial of immediate vitrectomy and of intravenous antibiotics for the treatment of postoperative bacterial endophthalmitis. Endophthalmitis Vitrectomy Study Group. Arch Ophthalmol 1995; 113(12): 1479–1496.

7. Wadbudhe, Tidke, Tidake. Endophthalmitis after cataract surgery: a postoperative complication. Cureus 2022; 14(10): e30110. DOI: 10.7759/cureus.30110.

8. Merani, Johnson, McCannel, et al. Clinical practice update: management of infectious endophthalmitis after intravitreal anti-VEGF injection. Journal of Vitreoretinal Diseases 2022; 6(6): 443–451. DOI: 10.1177/24741264221116487.

9. Delsoz, Raja, Madadi, et al. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmology and Therapy 2023; 12(6): 3121–3132. DOI: 10.1007/s40123-023-00805-x.

10. Momenaei, Wakabayashi, Shahlaee, et al. Appropriateness and readability of ChatGPT-4-generated responses for surgical treatment of retinal diseases. Ophthalmology Retina 2023; 7(10): 862–868. DOI: 10.1016/j.oret.2023.05.022.

11. Bernstein, Zhang, Govil, et al. Comparison of ophthalmologist and large language model chatbot responses to online patient eye care questions. JAMA Netw Open 2023; 6(8): e2330320. DOI: 10.1001/jamanetworkopen.2023.30320.

12. Huang, Hirabayashi, Barna, et al. Assessment of a large language model's responses to questions and cases about glaucoma and retina management. JAMA Ophthalmology 2024; 142(4): 371–375. DOI: 10.1001/jamaophthalmol.2023.6917.

13. Potapenko, Boberg-Ans, Stormly Hansen, et al. Artificial intelligence-based chatbot patient information on common retinal diseases using ChatGPT. Acta Ophthalmol 2023; 101(7): 829–831. DOI: 10.1111/aos.15661.

14. Cohen, Brant, Fisher, et al. Dr. Google vs. Dr. ChatGPT: exploring the use of artificial intelligence in ophthalmology by comparing the accuracy, safety, and readability of responses to frequently asked patient questions regarding cataracts and cataract surgery. Semin Ophthalmol 2024; 39(6): 472–479. DOI: 10.1080/08820538.2024.2326058.

15. Borkowski, Jakey, Mastorides, et al. Applications of ChatGPT and large language models in medicine and health care: benefits and pitfalls. Fed Pract 2023; 40(6): 170–173. DOI: 10.12788/fp.0386.

16. Betzler, Chen, Cheng C-Y, et al. Large language models and their impact in ophthalmology. Lancet Digit Health 2023; 5(12): e917–e924.

17. Feng, Luo M-J, et al. Latest developments of generative artificial intelligence and applications in ophthalmology. Asia Pac J Ophthalmol 2024; 13(4): 100090.

18. Spadea, Giannico. Diagnostic and management strategies of Aspergillus endophthalmitis: current insights. Clin Ophthalmol 2019; 13: 2573–2582. DOI: 10.2147/opth.s219264.

19. Shukla, Mishra, Banerjee, et al. The comparison of ChatGPT 3.5, Microsoft Bing, and Google Gemini for diagnosing cases of neuro-ophthalmology. Cureus 2024; 16(4): e58232. DOI: 10.7759/cureus.58232.

20. Mandalos, Tsouris. Artificial versus human intelligence in the diagnostic approach of ophthalmic case scenarios: a qualitative evaluation of performance and consistency. Cureus 2024; 16(6): e62471. DOI: 10.7759/cureus.62471.

21. Ajmera, Nischal, Ariyaratne, et al. Validity of ChatGPT-generated musculoskeletal images. Skeletal Radiol 2024; 53(8): 1583–1593. DOI: 10.1007/s00256-024-04638-y.

22. Charnock, Shepperd, Needham, et al. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. Journal of Epidemiology and Community Health 1999; 53(2): 105–111. DOI: 10.1136/jech.53.2.105.

23. Lucy, Rakestraw, Stringer, et al. Readability of patient education materials for bariatric surgery. Surg Endosc 2023; 37(8): 6519–6525. DOI: 10.1007/s00464-023-10153-3.

24. Kerci, Sahan. An analysis of ChatGPT4 to respond to glaucoma-related questions. J Glaucoma 2024; 33(7): 486–489. DOI: 10.1097/ijg.0000000000002408.

25. Fileta, Scott, Flynn Jr. Meta-analysis of infectious endophthalmitis after intravitreal injection of anti-vascular endothelial growth factor agents. Ophthalmic Surgery, Lasers & Imaging Retina 2014; 45(2): 143–149. DOI: 10.3928/23258160-20140306-08.

26. Raiaan MAK, Mukta MSH, Fatema, et al. A review on large language models: architectures, applications, taxonomies, open issues and challenges. IEEE Access 2024; 12: 26839–26874. DOI: 10.1109/ACCESS.2024.3365742.

27. Tailor, Dalvin, Starr, et al. A comparative study of large language models, human experts, and expert-edited large language models to neuro-ophthalmology questions. J Neuro Ophthalmol 2024; 9900: 02145. DOI: 10.1097/wno.0000000000002145.

28. Tan Yip Ming, Rojas-Carabali, Cifuentes-González, et al. The potential role of large language models in uveitis care: perspectives after ChatGPT and Bard launch. Ocul Immunol Inflamm 2024; 32(7): 1435–1439. DOI: 10.1080/09273948.2023.2242462.

29. Kedia, Sanjeev, Ong, et al. ChatGPT and beyond: an overview of the growing field of large language models and their use in ophthalmology. Eye 2024; 38(7): 1252–1261. DOI: 10.1038/s41433-023-02915-z.

30. Haque MDR, Rubya. An overview of chatbot-based mobile mental health apps: insights from app description and user reviews. JMIR mHealth and uHealth 2023; 11: e44838. DOI: 10.2196/44838.

31. Chin, Song, Baek, et al. The potential of chatbots for emotional support and promoting mental well-being in different cultures: mixed methods study. J Med Internet Res 2023; 25: e51712. DOI: 10.2196/51712.

32. Wen, Wang. The future of ChatGPT in academic research and publishing: a commentary for clinical and translational medicine. Clin Transl Med 2023; 13(3): e1207. DOI: 10.1002/ctm2.1207.

33. Rudolph, Tan. War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. Journal of Applied Learning and Teaching 2023; 6(1): 364–389.

34. Carlà, Gambini, Baldascino, et al. Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison. Graefe's Archive for Clinical and Experimental Ophthalmology 2024; 262(9): 2945–2959. DOI: 10.1007/s00417-024-06470-5.

35. Lee, Shin, Tessier, et al. Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations. Surg Obes Relat Dis 2024; 20(7): 603–608. DOI: 10.1016/j.soard.2024.03.011.

36. Singer, Chow, et al. Development and evaluation of Aeyeconsult: a novel ophthalmology chatbot leveraging verified textbook knowledge and GPT-4. J Surg Educ 2024; 81(3): 438–443. DOI: 10.1016/j.jsurg.2023.11.019.

37. Giannakopoulos, Kavadella, Aaqel Salim, et al. Evaluation of the performance of generative AI large language models ChatGPT, Google Bard, and Microsoft Bing Chat in supporting evidence-based dentistry: comparative mixed methods study. J Med Internet Res 2023; 25: e51580. DOI: 10.2196/51580.
