Abstract
Purpose:
This study evaluates the performance of large language models (LLMs) in the context of the Chinese National Traditional Chinese Medicine Licensing Examination (TCMLE).
Materials and Methods:
We compared the performances of different versions of Generative Pre-trained Transformer (GPT) and Enhanced Representation through Knowledge Integration (ERNIE) using historical TCMLE questions.
Results:
ERNIE-4.0 outperformed all other models with an accuracy of 81.7%, followed by ERNIE-3.5 (75.2%), GPT-4o (74.8%), and GPT-4 Turbo (50.7%). On questions related to Western internal medicine, all models achieved accuracies above 86.7%.
Conclusion:
This study highlights the importance of cultural context in training data, which strongly influences the performance of LLMs on region-specific medical licensing examinations.
