Abstract
Background:
Artificial intelligence (AI) and machine-learning technologies, including large language models such as ChatGPT and Google’s Gemini (previously known as Bard), are on the rise. While recent studies have begun to explore the role of AI in medicine, no study to date has examined the application of large language models to facial feminization surgery.
Objective:
To establish the reliability of ChatGPT and Google’s Bard in providing patient education and describing surgical techniques specific to facial feminization surgery.
Methods:
This was a prospective observational study. Both ChatGPT and Bard were queried with the same 9 questions, and the same series of questions was asked again 3 months later. Five expert reviewers rated the responses in a blinded fashion using the modified AI-DISCERN criteria. One-way analysis of variance (ANOVA) and Tukey’s honestly significant difference (HSD) test were used for analysis.
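As a rough illustration of the analysis named above, the following minimal sketch computes a one-way ANOVA F statistic in plain Python. The group labels and ratings are hypothetical placeholders, not study data, and the function name `one_way_anova_f` is an assumption introduced here for illustration only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total number of ratings
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical reviewer ratings (illustrative only, NOT from the study).
hypothetical_ratings = {
    "chatgpt_baseline": [4, 5, 4, 4, 5],
    "bard_baseline": [3, 3, 2, 4, 3],
    "chatgpt_3_months": [4, 4, 5, 5, 4],
    "bard_3_months": [4, 3, 4, 4, 5],
}

f_stat, df_b, df_w = one_way_anova_f(list(hypothetical_ratings.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A significant F statistic only indicates that at least one group mean differs. A post hoc test such as Tukey's HSD is then used to identify which pairs differ; SciPy provides `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` for this, although the abstract does not state which software the authors used.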
Results:
At baseline, ChatGPT outperformed Google Bard in the domains of harmlessness, truthfulness, relevance, bias, sources, and aims. However, Bard often closed this gap by the 3-month reassessment.
Conclusions:
While large language models show promise in disseminating patient education and general information on facial feminization, caution must be exercised when reviewing their outputs, as inaccurate medical content may be encountered. However, repeated exposure seems to improve the quality of responses over time.
