Abstract

OBJECTIVES

Generative artificial intelligence (AI) models such as OpenAI's ChatGPT and Google's Bard have forced educators to consider how these tools will be efficiently utilized to improve medical education. This article investigates current literature on how generative AI is and could be used and implemented in undergraduate medical education (UME).

METHODS

A rapid review of the literature was performed utilizing a librarian-generated search strategy to identify articles published before June 30, 2023, in 6 databases (PubMed, EMBASE.com, Scopus, ERIC via EBSCO, Computer Science Database via EBSCO, and CINAHL via EBSCO). Inclusion criteria were (1) a focus on osteopathic and/or allopathic UME and (2) a defined use or implementation strategy for generative AI. Two reviewers screened all articles, and data extraction was performed by 1 reviewer and confirmed by the other reviewer.

RESULTS

A total of 521 relevant articles were screened during this review. Forty-one articles underwent full-text review and data extraction. The majority of the articles were opinion pieces (9), case reports (8), letters to the editor (5), editorials (5), and commentaries (3) about the use of generative AI while 7 articles used qualitative and/or quantitative methods. The literature is best divided into 5 categories of uses for generative AI in UME: nonclinical learning assistant, content developer, virtual patient interaction, clinical decision-making tutor, and medical writing. The literature indicates generative AI tools’ greatest potential is for use as a virtual patient and clinical decision-making tutor.

CONCLUSIONS

While the possibilities proliferate for generative AI in UME, there remains a dearth of quantitative evidence of its use for improving learner outcomes. The majority of the literature opines the potential for utilization, but only 7 studies formally evaluated the results of using generative AI. Future research should focus on the effectiveness of incorporating generative AI into preclinical and clinical curricula in UME.

Keywords

generative AI medical education large language model artificial intelligence ChatGPT

Introduction

When OpenAI released ChatGPT, the first widely available large language model (LLM), in November 2022, there was a shift in the world's perception of artificial intelligence (AI). Immediately, companies such as Microsoft and Google began to release their own LLMs into the market to compete. Since that time, there have been calls to increase our understanding of generative AI models in all fields to better harness the power of these tools.^1–3 Unlike other AI models, these tools have become popular with the general public because of their ease of access and user-friendly interfaces, but the question remains how they could be harnessed in academic settings such as medical education.⁴ ChatGPT's ability to pass USMLE (United States Medical Licensing Examination) Step 1, 2, and 3 examinations has provoked medical educators to grapple with how best to deploy this technology in their learning context.⁵ Academia, including medical education, is polarized regarding the use of generative AI, with ethical concerns arising over misuse, cheating, and privacy. There are even calls to develop detectors that can distinguish human-written content from content written by LLMs, although many believe this will spark an unwinnable arms race.⁶ Educators across the educational spectrum are beginning to study how best to teach and implement such a tool in a classroom setting.

Previous technological shifts have altered the way physicians practice. As one example, the electronic health record (EHR) changed how clinicians document and maintain a patient's health record, but medical education continues to struggle in preparing students for this part of medical practice. Since the EHR's rapid growth in the medical field, there have been calls to increase training on these systems earlier in medical education.⁷ However, primary care physicians regularly cite the burden of documentation, data entry, and poor interoperability as increasing their level of burnout.⁸ Additionally, the recognition of the technology's importance has not universally translated into increased training or educational experiences in EHR documentation in undergraduate medical education (UME).⁹ The EHR provides a cautionary tale for medical education: with the advent of generative AI, high-quality educational curricula that build the technological skills to engage with these tools will become essential to prepare future physicians for medical practice.

The potential to improve the efficiency of learners and educators derives, in part, from the ability to understand the idiosyncrasies of natural language and produce readable outputs.¹⁰ As the technology and algorithms behind generative AI improve, so will its ability to aid medical students throughout medical school and into medical practice. Learning from the introduction of electronic medical records, educators would benefit from understanding how best to incorporate these technologies into UME to harness the potential increase in efficiency to help students become good stewards of their resources during training and in practice. To better understand the current evidence and best practices behind how to implement generative AIs in teaching, this article performs a rapid review of the available literature about the potential uses, implementation strategies, and recommendations for this technology in UME. A rapid review methodology was chosen due to the emerging and changing nature of generative AI technology. Rapid reviews have been established as a method to investigate broad research questions in emerging fields to provide rapid guidance to stakeholders.¹¹ Through this review, we aim to provide clarity for educators and students about best practices for implementation of these tools in UME as well as identify gaps in the literature to inform future investigation into the inclusion of these tools into classrooms and curricula.

Methods

Search Strategy

A trained clinical health sciences librarian (Wright) performed our comprehensive electronic search of publications using the following databases: PubMed, EMBASE.com, Scopus, ERIC via EBSCO, Computer Science Database via EBSCO, and CINAHL via EBSCO. All database results were collected from the inception of the database through June 30, 2023. Search terms were used to retrieve articles addressing the 2 main concepts of the search strategy: (1) medical education and (2) artificial intelligence. Search strategies included a combination of both text words and controlled vocabulary, when available.

PubMed example search query: ("generative ai" OR chatbot OR chatgpt OR "chat gpt" OR "chat generative" OR "large language model*" OR llm[tiab] OR llms[tiab] OR bard[tiab]) AND ("medical education"[tiab] OR "medical school"[tiab] OR "medical Schools"[tiab] OR "medical student"[tiab] OR "medical students"[tiab] OR education, medical[mesh] OR schools, medical[mesh] OR students, medical[mesh] OR "medical education"[Title/Abstract:~4] OR (medical[ti] AND (student*[ti] OR educati*[ti] OR school*[ti])))
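The two-concept structure of the example query (generative AI terms combined with medical education terms) can be illustrated with a minimal sketch. The helper name `or_group` and the shortened term lists are assumptions made for illustration; this is not the librarian's actual workflow, and only a subset of the published terms is shown.

```python
def or_group(terms):
    """Join search terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Concept 1: generative AI terms (a subset copied from the example query).
ai_terms = [
    '"generative ai"',
    "chatbot",
    "chatgpt",
    '"large language model*"',
    "llm[tiab]",
]

# Concept 2: medical education terms (a subset copied from the example query).
education_terms = [
    '"medical education"[tiab]',
    '"medical students"[tiab]',
    "education, medical[mesh]",
]

# A record must match at least one term from each concept, so the two
# OR-groups are combined with AND.
query = or_group(ai_terms) + " AND " + or_group(education_terms)
print(query)
```

Keeping each concept as its own list makes it straightforward to add synonyms or controlled-vocabulary terms to one concept without disturbing the other.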

Results were downloaded to EndNote, and duplicates were removed. All references were uploaded to Covidence systematic review software, a web-based tool designed to facilitate and track each step of the abstraction and review process.¹²

Selection of Articles

Inclusion criteria were (1) a focus on osteopathic and/or allopathic UME and (2) a defined use or implementation strategy for generative AI tools. Exclusion criteria were (1) no focus on osteopathic or allopathic UME, (2) no specific mention of generative AI, and (3) no mention of potential uses or implementation strategies for generative AI. No article or study types were excluded. Generative AI was defined as a model that utilizes natural language processing (NLP) to analyze user inputs and generate novel outputs/responses without relying upon rule-based systems or retrieval-based models that select from preexisting responses or categories.

All articles were reviewed by 2 reviewers (Hale, Alexander) on the study team to ensure they met the inclusion criteria. Each member reviewed the abstract for each article independently, and any conflicts between the 2 reviewers were resolved by consensus. Any article that was included then underwent full-text screening by both reviewers, and conflicts were resolved via consensus.

Data Extraction

Data extraction was performed by 1 reviewer (Hale) with confirmation of data by the other reviewer (Alexander). Data were extracted between August 10 and September 18, 2023. Data extracted included article identifiers such as title, author(s), journal, publication date, university, and country of the primary author. In addition, the specified aims of the article, study type, educational setting (eg, clinical or nonclinical), the language used to describe generative AI, potential uses of generative AI in UME, study-specific outcomes, and recommendations for the implementation of generative AI into UME were extracted. Results were then reported narratively per Cochrane guidelines.¹¹

Results

Search Results

A total of 852 articles were identified, and 521 articles remained after duplicates were removed. Of these articles, 459 were deemed irrelevant based on abstract screening, and the remaining 62 underwent full-text screening. Twenty-one articles were excluded, leaving 41 articles for data extraction (Figure 1). The 41 articles were written in 19 different countries: 12 in the United States; 5 in Germany; 3 in India; 2 each in Canada, Qatar, Pakistan, the United Kingdom, and China; 1 was a multinational project; and the rest were written in Jordan, Iran, Mexico, Oman, Republic of Korea, Afghanistan, France, Singapore, Japan, and Greece (Table 1). Nine articles were opinion pieces, 8 case reports, 5 editorials, 5 letters to the editor, 3 commentaries, 3 quantitative studies, 2 mixed methods studies, and 2 comparison studies, as well as a technical evaluation, scoping review, book chapter, and cross-sectional study. A total of 13 different identifiers of generative AI were used, with ChatGPT being the most commonly used (32 articles), followed by large language model (18 articles) and chatbot (8 articles). The educational settings for the articles were primarily nonclinically focused (18 articles) or both clinically and nonclinically focused (14 articles).
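The screening counts above can be cross-checked with simple arithmetic. The sketch below uses only the figures reported in this section; the variable names are illustrative and not part of the review protocol.

```python
# Counts reported in the Search Results paragraph.
identified = 852
after_duplicates_removed = 521
excluded_at_abstract = 459
excluded_at_full_text = 21

# Derived counts at each screening stage.
duplicates_removed = identified - after_duplicates_removed
full_text_screened = after_duplicates_removed - excluded_at_abstract
included = full_text_screened - excluded_at_full_text

# Study types as reported; these should sum to the number of included articles.
study_types = {
    "opinion": 9,
    "case report": 8,
    "editorial": 5,
    "letter to the editor": 5,
    "commentary": 3,
    "quantitative": 3,
    "mixed methods": 2,
    "comparison": 2,
    "technical evaluation": 1,
    "scoping review": 1,
    "book chapter": 1,
    "cross-sectional": 1,
}

print(duplicates_removed, full_text_screened, included)
print(sum(study_types.values()) == included)
```

Both the screening flow and the study-type breakdown arrive at the same total of 41 included articles.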

Figure 1.

PRISMA diagram.

Table 1.

Characteristics of articles included.

Title	Author(s) (Year)	Journal	Primary Author Affiliation (Country)	Study Type	Educational Setting	Potential Uses for Generative AI in UME	Study Outcomes (If Applicable)	Recommendations for Implementation in UME
The next paradigm shift? ChatGPT, artificial intelligence, and medical education	Wang, et al (2023)	Medical Teacher	University of Texas Medical Branch (United States)	Letter to the Editor	Nonclinical	Provide a personalized learning experience	-	-
Finding the Place of ChatGPT in Medical Education	van de Ridder, et al (2023)	Academic Medicine	Michigan State University (United States)	Letter to the Editor	Nonclinical	Writing assistance; Case scenario development	-	-
Practical Applications of ChatGPT in Undergraduate Medical Education	Tsang (2023)	Journal of Medical Education and Curricular Development	University of British Columbia (Canada)	Commentary	Both clinical and nonclinical	Facilitate evidence-based decision making; Generate a differential diagnosis; Knowledge resource; Clinical reasoning tutor; Exam preparation; Medical writing assistance	-	-
Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study	Takagi, et al (2023)	JMIR Medical Education	Shimane University (Japan)	Comparison study	Nonclinical	Provide medical information; Generate differential diagnoses	Overall, GPT-4 significantly outperformed GPT-3.5 by 29.1% (P < .001).	-
Chatbot versus Medical Student Performance on Free-Response Clinical Reasoning Examinations	Strong & DiGiammarino, et al (2023)	JAMA Internal Medicine	Stanford University (United States)	Case report	Clinical	Clinical reasoning tutor	ChatGPT met or exceeded the predefined passing threshold of 70% on 12 out of the 28 (43%) runs (Table 1), with a mean score of 69% (95% CI: 65% to 73%).	Redesign assessments to continue to identify struggling students despite the use of a chatbot.
Revolutionizing Medical Education: Can ChatGPT Boost Subjective Learning and Expression?	Seetharaman (2023)	Journal of Medical Systems	Seth G.S. Medical College & KEM Hospital (India)	Opinion	Both clinical and nonclinical	Provide a personalized learning experience; Small-group assessment aid; Medical writing assistance; Clinical reasoning tutor; Virtual patient interaction	-	-
Early applications of ChatGPT in medical practice, education, and research	Sedaghat (2023)	Clinical Medicine	University Hospital of Heidelberg (Germany)	Opinion	Nonclinical	Medical writing assistance	-	-
ChatGPT applications in medical, dental, pharmacy, and public health education: A descriptive study highlighting the advantages and limitations	Sallam, et al (2023)	Narra J	The University of Jordan (Jordan)	Commentary	Both clinical and nonclinical	Improve personalized learning; Improve clinical reasoning; Assist in understanding complex medical concepts	-	-
ChatGPT in Medicine; a Disruptive Innovation or Just One Step Forward?	Parsa & Ebrahimzadeh (2023)	Archives of Bone and Joint Surgery	Mashhad University of Medical Sciences (Iran)	Editorial	Both clinical and nonclinical	Curriculum development; Personalize study plans and materials; Assessment and evaluation	-	-
Chatbots for future docs: exploring medical students’ attitudes and knowledge toward artificial intelligence and medical chatbots	Moldt, et al (2023)	Medical Education Online	University of Tuebingen (Germany)	Mixed Methods	Clinical	Aid in clinical decision-making	No change in attitudes about chatbots were identified after the course, but improvement in the understanding of how chatbots work was noted.	-
The Pros and Cons of Using ChatGPT in Medical Education: A Scoping Review	Mohammad, et al (2023)	Healthcare Transformation with Informatics and Artificial Intelligence	Hamad Bin Khalifa University (Qatar)	Scoping review	Both clinical and nonclinical	Medical writing assistance; Practice translating; Speed up information processing and analysis; Creation of educational content; Personalize learning; Automate scoring	-	-
ChatGPT and medical education: technological shooting star or disruptive change?	Sánchez (2023)	Investigacion en Educacion Medica	Universidad Nacional Autonoma de Mexico (Mexico)	Editorial	Both clinical and nonclinical	Provide a personalized learning experience; Quick access to medical information; Aid in clinical decision-making	-	-
Medical Teacher's first ChatGPT's referencing hallucinations: Lessons for editors, reviewers, and teachers	Masters (2023)	Medical Teacher	Sultan Qaboos University (Oman)	Case report	Nonclinical	Medical writing can be impacted by hallucinations from LLM models.	-	-
The rise of ChatGPT: Exploring its potential in medical education	Lee (2023)	Anatomical Sciences Education	Keimyung University (Republic of Korea)	Opinion	Nonclinical	Virtual teaching assistant; Summarize medical literature	-	-
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models	Kung, et al (2023)	PLOS Digital Health	AnsibleHealth Inc. (United States)	Quantitative research	Nonclinical	Answer explanation generation; Multiple-choice question creation	ChatGPT approaches passing performance, demonstrates high internal concordance, and can offer some insight to human learners.	-
The Potential of ChatGPT in Medical Education: Focusing on USMLE Preparation	Koga (2023)	Annals of Biomedical Engineering	Mayo Clinic (United States)	Letter to the Editor	Nonclinical	Interactive study aid; Create multiple-choice questions	-	-
ChatGPT—Reshaping medical education and clinical management	Khan, et al (2023)	Pakistan Journal of Medical Sciences	Riphah International University (Pakistan)	Opinion	Nonclinical	Automate scoring; Teaching assistance; Personalize learning; Research assistance; Quick access to information; Generating case scenarios; Creating content to facilitate learning; Language translation	-	Implement as a supplemental tool, not a replacement for human sources.
The Advent of Generative Language Models in Medical Education	Karabacak, et al (2023)	JMIR Medical Education	Mount Sinai Health System (United States)	Opinion	Both clinical and nonclinical	Generation of novel content; Development of simulations; Creation of digital patients; Assessment and evaluation	-	Develop guidelines for best practices. Implementation of AI content detectors or AI classifiers. Shift toward diverse assessment methods.
ChatGPT: Friend or foe in medical writing? An example of how ChatGPT can be utilized in writing case reports	Ho, et al (2023)	Surgery in Practice and Science	USF Health Morsani College of Medicine (United States)	Case report	Clinical	Medical case report writing assistance	Improved performance with priming ChatGPT with relevant information and asking a series of related questions to maintain contextual coherence.	Ensure responses are checked for potential sources of bias.
GPT-4: the future of artificial intelligence in medical school assessments	Cooper & Rashid (2023)	Journal of the Royal Society of Medicine	University College London (UK)	Opinion	Both clinical and nonclinical	Multiple-choice question development; Performance assessment and evaluation; Virtual patient interaction; Student feedback	-	-
An Explorative Assessment of ChatGPT as an Aid in Medical Education: Use it with Caution	Han, et al (2023)	Medical Teacher	Hackensack Meridian School of Medicine (United States)	Case report	Nonclinical	Learning session development; Multiple-choice question development	-	Provide education to educators and students about the accuracy and detail of information from ChatGPT.
Harnessing the power of ChatGPT in medical education	Guo & Li (2023)	Medical Teacher	University of South Carolina (United States)	Letter to the Editor	Both clinical and nonclinical	Provide a personalized learning experience; Improve patient communication skills	-	-
How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment	Gilson, et al (2023)	JMIR Medical Education	Yale University (United States)	Case report	Nonclinical	Adjunct or surrogate for small (peer) group education	ChatGPT is capable of correctly answering up to over 60% of questions representing topics covered in the USMLE Step 1 and Step 2 licensing exams.	Teaching students to engage in the follow-up dialogue with the LLM, rather than simply obtaining a simple answer.
ChatGPT: roles and boundaries of the new artificial intelligence tool in medical education and health research – correspondence	Periaysamy, et al (2023)	Annals of Medicine & Surgery	ESIC Medical College and Hospital (Afghanistan)	Opinion	Nonclinical	Provide a personalized learning experience	-	Require citation of references in medical writing assignments.
ChatGPT in medical school: how successful is AI in progress testing?	Friederichs, et al (2023)	Medical Education Online	Bielefeld University (Germany)	Cross-sectional study	Both clinical and nonclinical	Provide medical information and assistance; Provide a comparative reference point against which to challenge preconceived thoughts	ChatGPT performed better or as well as medical students of all levels on multiple-choice questions.	Monitor for hallucinations and accuracy.
ChatGPT and the Future of Medical Education	Feng & Shen (2023)	Academic Medicine	Southeast University (China)	Letter to the Editor	Both clinical and nonclinical	Provide medical information and assistance; Provide personalized learning experience; Clinical reasoning tutor	-	Monitor outputs of AI systems for bias.
The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers	Eysenbach (2023)	JMIR Medical Education	(Canada)	Editorial	Nonclinical	Generate realistic patient case scenarios; Provide personalized learning experiences; Enhance medical textbooks; Generate summaries of medical research; Multiple-choice question development; Curriculum development	-	-
Assessing the Capability of ChatGPT in Answering First- and Second-Order Knowledge Questions on Microbiology as per Competency-Based Medical Education Curriculum	Das, et al (2023)	Cureus	All India Institute of Medical Sciences (India)	Case report	Nonclinical	Provide medical information and assistance	-	-
ChatGPT versus human in generating medical graduate exam multiple-choice questions—A multinational prospective study (Hong Kong S.A.R., Singapore, Ireland, and the United Kingdom)	Cheung, et al (2023)	PLOS One	(International)	Quantitative research	Nonclinical	Multiple-choice question development	ChatGPT developed similar quality questions significantly faster than human examiners. ChatGPT questions could not be differentiated from human-written questions.	Provide reliable references for generative AI to generate questions.
Biomedical Ethical Aspects Toward the Implementation of Artificial Intelligence in Medical Education	Busch, et al (2023)	Medical Science Educator	Universitätsmedizin Berlin (Germany)	Commentary	Both clinical and nonclinical	Virtual patient interaction; Personalized study plans and materials	-	Promote transparency among generative AI applications. Obtain informed consent from all users. Promote professional responsibility and accountability for the use of generative AI applications. Emphasize equitable access. Collaborate in generative AI development to promote diversity in training data. Regularly audit tools to assess for bias or inequity. Provide AI-independent resources for verification. Secure the privacy and confidentiality of sensitive data. Provide comprehensive training before implementing generative AI applications. Utilize a multidisciplinary approach. Regularly assess and optimize generative AI applications.
The Utilization of ChatGPT in Reshaping Future Medical Education and Learning Perspectives: A Curse or a Blessing?	Breeding, et al (2023)	The American Surgeon	NOVA Southeastern University (United States)	Comparison study	Clinical	Provide medical information; Medical writing assistance	For medical students, ChatGPT displays more clarity but is less comprehensive than evidence-based sources on surgical topics. For laypeople, ChatGPT displays more clarity on some topics.	Improve surveillance systems for use of generative AI.
Artificial intelligence and ChatGPT between worst enemy and best friend: The 2 faces of a revolution and its impact on science and medical schools	Aubignat & Diab (2023)	Revue Neurologique	Amiens University Hospital (France)	Editorial	Nonclinical	Personalize study plans and materials	-	-
The future of medical education and research: Is ChatGPT a blessing or blight in disguise?	Arif, et al (2023)	Medical Education Online	Dow University of Health Sciences (Pakistan)	Editorial	Nonclinical	Medical writing assistance	-	Surveillance systems for use of generative AI.
Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions	Abd-alrazaq, et al (2023)	JMIR Medical Education	Weill Cornell Medicine-Qatar (Qatar)	Opinion	Both clinical and nonclinical	Curriculum development; Teaching methodologies; Personalize study plans and materials; Assessment and evaluation; Medical writing assistance; Program monitoring and review	-	Develop guidelines for best practices. Increase student and educator competency with generative AI tools. Adapt student assignments to use or avoid generative AI. Discuss limitations with students. Incorporate more non-English training into generative AI models. Integrate generative AI models into learning management systems. Increase in empirical and evidence-based research about uses and outcomes of generative AI in medical education.
Generative adversarial networks and synthetic patient data: current challenges and future perspectives	Arora & Arora (2022)	Digital Technology	University of Cambridge (UK)	Opinion	Clinical	Create training material and simulations	-	Use to introduce virtually crafted and customizable materials into curricula.
Using a Machine Learning Architecture to Create an AI-Powered Chatbot for Anatomy Education	Li, et al (2021)	Medical Science Educator	University of Hong Kong (China)	Case report	Nonclinical	Interactive study aid	Participants responding to a pre–post survey noted increased self-reported confidence in anatomy knowledge from 2.10 to 3.84 on a Likert scale of 5 following this session.	-
Virtual Integrated Patient: An AI supplementary tool for second-year medical students	Kong, et al (2021)	The Asia Pacific Scholar	National University of Singapore (Singapore)	Quantitative research	Clinical	Virtual patient interaction	87% of participants strongly agreed or agreed that using VIP helped in remembering the content while 69% of them felt that VIP increased their confidence and competence in history-taking.	Provide open and easy access. Focus on process improvement rather than correct diagnosis.
AI, autonomous machines and human awareness: toward shared machine-human contexts in medicine	Miller & Wood (2020)	Human-Machine Shared Contexts	Augusta University (United States)	Book chapter	Both clinical and nonclinical	Generate virtual clinical cases; Curriculum development; EMR training; Personalized tutor	-	-
Chatbots in Healthcare Curricula: The Case of a Conversational Virtual Patient	Dolianiti, et al (2020)	Brain Function Assessment in Learning	Aristotle University of Thessaloniki (Greece)	Case report	Clinical	Virtual patient interaction; Provide medical information and assistance; Personalized tutor; Foster reflection and feedback on learner model	-	Collaborate in the design and development of generative AI. Create a framework for continuous training of generative AI.
AI Medical School Tutor: Modelling and Implementation	Afzal, et al (2020)	Artificial Intelligence in Medicine	IBM Research (India)	Mixed methods	Clinical	Clinical reasoning tutor	Survey results showed all students rated the MST over 0.65 on a normalized Likert scale for usability and experience, usability, and learning.	Improve ability of AI driven tutors to personalize the learning experience.
Evaluation of Chatbot Prototypes for Taking the Virtual Patient's History	Reiswich & Haag (2019)	Studies in Health Technology and Informatics	Heilbronn University of Applied Sciences (Germany)	Technical evaluation	Clinical	Virtual patient interaction	-	-

Potential Uses of Generative AI in UME

Articles varied significantly in their aims and focus with a plurality written in 2023 focusing on the ways that generative AI, and ChatGPT more specifically (29 articles), can be utilized for UME learning (Table 2). The most commonly cited uses for generative AI were personalized learning assistant/tutor/information source (25 articles), aid in content development for educators (12 articles), medical writing (10 articles), clinical decision-making aid/tutor (8 articles), and virtual patient interaction (8 articles).

Table 2.

Articles associated with the 5 categories of uses for generative AI.

Category	Articles
Nonclinical learning assistant—generative AI can function as a personalized assistant, tutor, and/or information source for students during their UME.	Arif, et al (2023); Aubignat & Diab (2023); Breeding, et al (2023); Busch, et al (2023); Das, et al (2023); Dolianiti, et al (2020); Eysenbach (2023); Feng & Shen (2023); Gilson, et al (2023); Guo & Li (2023); Khan, et al (2023); Koga (2023); Kung, et al (2023); Lee (2023); Li, et al (2021); Miller & Wood (2020); Mohammad, et al (2023); Parsa & Ebrahimzadeh (2023); Periaysamy, et al (2023); Sallam, et al (2023); Sánchez (2023); Seetharaman (2023); Takagi, et al (2023); Tsang (2023); Wang, et al (2023)
Content developer—generative AI can be used as a tool for the development of educational content and scenarios for educators.	Abd-alrazaq, et al (2023); Arora & Arora (2022); Cheung, et al (2023); Cooper & Rashid (2023); Eysenbach (2023); Han, et al (2023); Karabacak, et al (2023); Khan, et al (2023); Koga (2023); Miller & Wood (2020); Mohammad, et al (2023); Parsa & Ebrahimzadeh (2023)
Virtual patient—generative AI technology has the capacity to improve the virtual patient interaction experience.	Afzal, et al (2020); Busch, et al (2023); Cooper & Rashid (2023); Dolianiti, et al (2020); Guo & Li (2023); Kong, et al (2021); Reiswich & Haag (2019); Seetharaman (2023)
Clinical decision-making tutor—generative AI can provide feedback to students during their UME that improves their clinical reasoning and decision-making.	Afzal, et al (2020); Miller & Wood (2020); Mohammad, et al (2023); Reiswich & Haag (2019); Sallam, et al (2023); Sánchez (2023); Seetharaman (2023); Tsang (2023)
Medical writing—generative AI can increase the capability and efficiency of students in UME to create written medical-related materials in clinical and research settings.	Abd-alrazaq, et al (2023); Arif, et al (2023); Breeding, et al (2023); Ho, et al (2023); Masters (2023); Mohammad, et al (2023); Periaysamy, et al (2023); Sedaghat (2023); Seetharaman (2023); Tsang (2023)

Nonclinical Learning Assistant

Five articles focused on the performance of ChatGPT on medical examination questions.^5,13–16 Three articles demonstrated the ability of ChatGPT to pass standardized multiple-choice examinations.^5,13,15 Takagi et al showed an improvement in the performance of GPT-4 compared to GPT-3.5 on the Japanese Medical Licensing Examination,¹⁵ while Strong et al showed that ChatGPT 3.5 could answer clinical free-response questions above a passing threshold 43% of the time.¹⁶

Many articles suggested the use of generative AI as a learning assistant or personalized tutor for nonclinical content.^4,17–33 Koga suggested that generative AI can rapidly provide an additional source of feedback on multiple-choice questions.²⁶ Li et al demonstrated that learners who engage with conversational models can improve their comfort with anatomy concepts.³¹ Other articles also suggested that generative AI can be a reference for students to search for and find medical information.^{10,14,34–36}

Content Developer

Articles discussing content development focused on the creation of multiple-choice questions.^4,26,37–39 These articles primarily discussed the ability of ChatGPT to write multiple-choice questions and provide explanations. Cheung et al showed the ability of ChatGPT to write multiple-choice questions at a similar level of proficiency to that of human examiners in significantly less time. They went on to suggest that in the development of multiple-choice questions, examiners should provide generative AI with reliable reference material to generate questions with fewer errors.³⁹

Other articles suggested utilizing generative AI for curriculum content development.^{4,19,21,24,26,32,37–42} Karabacak et al acknowledged the ability of generative AI to develop cases and simulations and suggested the implementation of guidelines and detectors to prevent misuse by students. Additionally, they recommended shifting assessments toward more diverse methodologies.⁴⁰ Han et al also suggested that generative AI can be utilized as a source of content for educators, especially in a small-group learning session, but they suggested educating both educators and students on the accuracy and detail that ChatGPT provides.³⁸ Arora & Arora discussed the possibility of utilizing generative AI models to create synthetic patient data for training purposes.⁴²

Virtual Patient and Clinical Decision-Making Tutor

In 2020, Miller and Wood wrote a chapter about the emerging ability of generative AI to impact medical education. In their chapter, they provided insight into how AI can serve as a personalized tutor that works alongside medical students to help them through clinical cases.³² Four articles successfully created generative AI models for this type of virtual patient interaction and tutoring.^33,43–45 Afzal et al suggested that such a model is functional for students (rated over 0.65 on a Likert scale normalized from 0–1 for usability and experience), but the model at the time was limited in its ability to provide personalized feedback.⁴³ Seetharaman, Cooper and Rashid, and Busch et al all indicated the ability of ChatGPT to act as a virtual patient and provide feedback to students on their clinical decision-making.^23,29,37 Three articles suggested that ChatGPT could act as a clinical reasoning tutor,^18,20,22 and another suggested that it could improve comfort with patient interactions.²⁷

Medical Writing

Medical writing was frequently cited as both an area of use and an area of concern. Uses of the technology in this domain included creative writing assistance, research manuscripts, and grant writing as well as support in writing medical notes.^{10,22–24,28,30,41,46–48} Breeding et al and Ho et al demonstrated the value of utilizing ChatGPT to perform medical writing.^10,48 Masters clearly defined the ways that ChatGPT-3.5 can hallucinate when providing references.⁴⁷ Periaysamy et al recommended requiring references for all written materials in UME, using this drawback of ChatGPT to prevent misuse,²⁸ while Arif et al recommended the implementation of surveillance systems for generative AI use.³⁰ Abd-Alrazaq et al focused on providing education to students and educators on the use of generative AI as a tool in medical writing and learning.⁴¹

Discussion

The creation, utilization, and ethical dilemmas surrounding the implementation of generative AI in UME are areas of growing interest as AI tools continue to improve efficiency in other fields and expand their footprint in medical practice.⁴⁹ We identified 5 domains in the existing literature for using generative AI in UME: nonclinical learning assistant, content developer, clinical decision-making tutor, virtual patient, and medical writing. Much of the literature was an immediate response to the wide availability of generative AI tools, focused on 1 or more of these domains, and lacked specific, quantitative outcomes of the implementation of these tools.^21,23,27 However, the ideas put forth in these articles can provide a starting point for educators looking to innovate in the classroom and prepare students for a medical field that increasingly embraces AI tools.⁵⁰

The strongest evidence at this time points toward utilizing LLMs customized specifically for UME,^{31,33,43–45} rather than adapting models developed for general public use. These resource-intensive undertakings are often outpaced by privately developed tools from OpenAI, Microsoft, and Google. This suggests that educators and learners need effective ways to teach about and utilize these more general tools for their purposes. Establishing curricula based on evidence-based teaching practices for improving student engagement and understanding of these tools and their limitations in a medical context will become essential.

For students, descriptions of generative AI tools have shown that the tools can provide immediate feedback and insight into learning as students create a complete differential diagnosis, identify knowledge gaps, or simulate different clinical situations.^{31,33,43–45} Many authors make the case that generative AI can provide personalized, immediate feedback on the cognitive or practical mistakes that a learner makes during an activity.^{16,22,23,32,33,36,43} This has particular advantages because the tools can potentially operate independently of educators, making this form of tutoring feasible at scale without significantly increasing personnel requirements. However, it is unclear how eager learners are to engage with new AI-based technologies.⁵¹ These findings indicate that clear, evidence-based curricula are needed to provide a scaffolded means for students to interact with LLMs and accomplish these goals.^33,44,45

Utilizing generative AI has potential benefits for educators as well as students. Breeding et al provided an example of how case reports can be clearly and effectively written for medical students and laypeople, which educators can use to increase resources for classroom or independent learning.¹⁰ Additionally, Cheung et al showed that multiple-choice questions can be generated quickly and accurately, decreasing the time educators need to spend on creating examinations, which could allow them more time to focus on students and innovation.³⁹ Educators looking to incorporate generative AI into their curriculum also need to consider future uses of generative AI in the healthcare field. Having students engage with these tools could enable them to understand patient experiences as well as help them recognize the tools' limitations.^29,52 This basic understanding of generative AI, and AI in general, will allow future physicians to engage with patients and regulators about effective ways to use these tools to improve the healthcare system and keep patients safe.^50,53

For medical educators, the use of generative AI does not come without risk of misuse. These risks led Van de Ridder et al to suggest the creation of consensus guidelines on the use of generative AI in medical education.⁵⁴ Other suggestions for mitigating misuse include shifting toward more diverse assessment styles^16,40,41 or tailoring assignments toward the known limitations of LLMs so that students are forced to address, and understand, the shortfalls of these tools.^28,41 As with the evolution of the EHR, these tools are likely to become ubiquitous in clinical practice, so improving learners' understanding of generative AI and how to utilize its capabilities will better prepare them for the physician workforce.

This review was limited by the evidence available in the literature and the rapid pace at which literature on this topic is being published. Inclusion of opinion articles, letters to the editor, and commentaries allowed for only a limited discussion of the effectiveness of adopting generative AI tools. Only 7 articles investigated generative AI using quantitative and/or qualitative methods, and no articles at the point of data extraction assessed the implementation of these tools in a classroom setting using randomized or prospective designs. At present, this serves as an initial review from the authors who address this topic, and further reviews will be necessary as the literature evolves to better characterize and refine the themes described here.

Conclusion

As generative AI tools evolve and proliferate, users and/or institutions will likely have to pay for their use, exacerbating disparities in learning environments. Ensuring equitable access to these tools will become imperative for medical schools to provide adequate learning opportunities. Despite the widespread availability of these tools, there remains a dearth of evidence assessing the effectiveness of generative AI for use in UME. Future investigations should utilize quantitative methods to assess learners’ understanding of the material and identify areas of improvement using generative AI tools. Researchers and educators should further evaluate different teaching approaches to improve learner utilization and implementation of generative AI tools for nonclinical and clinical medical training and practice.

Supplemental Material

Supplemental material, sj-docx-1-mde-10.1177_23821205241266697 for Generative AI in Undergraduate Medical Education: A Rapid Review by Joshua Hale, Seth Alexander, Sarah Towner Wright and Kurt Gilliland in Journal of Medical Education and Curricular Development

Footnotes

Acknowledgments

We would like to thank the University of North Carolina School of Medicine for its support of this project.

Author Contributions

Josh Hale was involved in the conceptualization, methodology, investigation, formal analysis, software, visualization, writing—original draft, and writing—review and editing of this manuscript based on CRediT taxonomy.

Seth Alexander was involved in the conceptualization, methodology, investigation, formal analysis, writing—original draft, and writing—review and editing of this manuscript based on CRediT taxonomy.

Sarah Towner Wright was involved in the methodology, investigation, resources, software, writing—original draft, writing—reviewing and editing of this manuscript based on CRediT taxonomy.

Kurt Gilliland was involved in the conceptualization, methodology, and writing—review and editing of this manuscript based on CRediT taxonomy.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

1. IEEE special issue on education in the world of ChatGPT and other Generative AI. IEEE Education Society. Published 2023. Accessed October 25, 2023. https://ieee-edusociety.org/ieee-special-issue-education-world-chatgpt-and-other-generative-ai

2. Special issue: Generative AI, ChatGPT, and the future of human decision making. Accessed October 25, 2023. https://www.callforpapers.co.uk/generative-ai-chatgpt

3. Call for papers for the special focus issue on ChatGPT and Large Language Models (LLMs) in biomedicine and health | Journal of the American Medical Informatics Association. Oxford Academic. Published 2023. Accessed October 25, 2023. https://academic.oup.com/jamia/pages/call-for-papers-for-special-focus-issue

4. Eysenbach. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers. JMIR Med Educ. 2023;9:e46885. doi:https://doi.org/10.2196/46885

5. Kung, Cheatham, Medenilla, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digital Health. 2023;2:e0000198. doi:https://doi.org/10.1371/journal.pdig.0000198

6. Thompson, Hsu. How easy is it to fool A.I.-detection tools? The New York Times. Published 2023. Accessed October 25, 2023. https://www.nytimes.com/interactive/2023/06/28/technology/ai-detection-midjourney-stable-diffusion-dalle.html

7. Wald, George, Reis, Taylor. Electronic health record training in undergraduate medical education: bridging theory to practice with curricula for empowering patient-and relationship-centered care in the computerized setting. Acad Med. 2014;89(3):380-386. doi:https://doi.org/10.1097/ACM.0000000000000131

8. Kroth, Morioka-Douglas, Veres, et al. The electronic elephant in the room: physicians and the electronic health record. JAMIA Open. 2018;1(1):49-56. doi:https://doi.org/10.1093/JAMIAOPEN/OOY016

9. Foster, Cuddy, Swanson, Holtzman, Hammoud, Wallach. Medical student use of electronic and paper health records during inpatient clinical clerkships: results of a national longitudinal study. Acad Med. 2018;93(11 S):S14-S20. doi:https://doi.org/10.1097/ACM.0000000000002376

10. Breeding, Martinez, Patel, et al. The utilization of ChatGPT in reshaping future medical education and learning perspectives: a curse or a blessing? Am Surg. 2023;0(0):1-7. doi:https://doi.org/10.1177/00031348231180950

11. Garritty, Gartlehner, Nussbaumer-Streit, et al. Cochrane rapid reviews methods group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13-22. doi:https://doi.org/10.1016/J.JCLINEPI.2020.10.007

12. Covidence. https://www.covidence.org

13. Gilson, Safranek, Huang, et al. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. doi:https://doi.org/10.2196/45312

14. Friederichs, März. ChatGPT in medical school: how successful is AI in progress testing? Med Educ Online. 2023;28. doi:https://doi.org/10.1080/10872981.2023.2220920

15. Takagi, Watari, Erabi, Sakaguchi. Performance of GPT-3.5 and GPT-4 on the Japanese medical licensing examination: comparison study. JMIR Med Educ. 2023;9:e48002. doi:https://doi.org/10.2196/48002

16. Strong, Digiammarino, Weng, et al. Chatbot vs medical student performance on free-response clinical reasoning examinations. JAMA Intern Med. 2023;183(9):1028-1030. doi:https://doi.org/10.1001/JAMAINTERNMED.2023.2909

17. Wang, Paidisetty, Cano. The next paradigm shift? ChatGPT, artificial intelligence, and medical education. Med Teach. 2023;45:925-925. doi:https://doi.org/10.1080/0142159X.2023.2198663

18. Sallam, Salim, Barakat, Al-Tammemi. ChatGPT applications in medical, dental, pharmacy, and public health education: a descriptive study highlighting the advantages and limitations. Narra J. 2023;3:e103. doi:https://doi.org/10.52225/narra.v3i1.103

19. Ebrahimzadeh, Parsa. Corresponding author: chatGPT in medicine; a disruptive innovation or just one step forward? Arch Bone Jt Surg. 2023;11(4):225-226. doi:https://doi.org/10.22038/abjs.2023.22042

20. Sánchez. ChatGPT and medical education: technological shooting star or disruptive change? Investigación en Educación Médica. 2023;12(46):5-10. doi:https://doi.org/10.22201/fm.20075057e.2023.46.23511

21. Khan, Jawaid, Khan, Sajjad. ChatGPT - reshaping medical education and clinical management. Pak J Med Sci. 2023;39(2):605. doi:https://doi.org/10.12669/PJMS.39.2.7653

22. Tsang. Practical applications of ChatGPT in undergraduate medical education. J Med Educ Curric Dev. 2023;10:1-3. doi:https://doi.org/10.1177/23821205231178449

23. Seetharaman. Revolutionizing medical education: can ChatGPT boost subjective learning and expression? J Med Syst. 2023;47(1):1-4. doi:https://doi.org/10.1007/S10916-023-01957-W

24. Mohammad, Supti, Alzubaidi, Shah, Househ. The Pros and Cons of Using ChatGPT in Medical Education: A Scoping Review. Healthcare Transform Inf Article Intell. 2023. doi:https://doi.org/10.3233/SHTI230580

25. Lee. The rise of ChatGPT: Exploring its potential in medical education. Anat Sci Educ. 2023. doi:https://doi.org/10.1002/ASE.2270

26. Koga. The potential of ChatGPT in medical education: focusing on USMLE preparation. Ann Biomed Eng. 2023;51:2123-2124. doi:https://doi.org/10.1007/s10439-023-03253-7

27. Guo. Harnessing the power of ChatGPT in medical education. Med Teach. 2023;45:1063-1063. doi:https://doi.org/10.1080/0142159X.2023.2198094

28. Periaysamy, Satapathy, Neyazi, Padhi. ChatGPT: roles and boundaries of the new artificial intelligence tool in medical education and health research-correspondence. Ann Med Surg. 2023;85:1317-1318. doi:https://doi.org/10.1097/MS9.0000000000000371

29. Busch, Adams, Bressem. Biomedical ethical aspects towards the implementation of artificial intelligence in medical education. Med Sci Educ. 2023;33(4):1007-1012. doi:https://doi.org/10.1007/s40670-023-01815-x

30. Bin Arif, Munaf, Ul-Haque. The future of medical education and research: Is ChatGPT a blessing or blight in disguise? Med Educ Online. 2023;28. doi:https://doi.org/10.1080/10872981.2023.2181052

31. Sum Li, Sin Nga Lam, See. Using a machine learning architecture to create an AI-powered chatbot for anatomy education. Med Sci Educ. 2021;31:1729-1730. doi:https://doi.org/10.1007/s40670-021-01405-9

32. Miller, Wood. AI, autonomous machines and human awareness: towards shared machine-human contexts in medicine human-machine shared contexts. Hum-Mach Shared Contexts. 2020:205-220. doi:https://doi.org/10.1016/B978-0-12-820543-3.00010-9

33. Dolianiti, Tsoupouroglou, Antoniou, Konstantinidis, Anastasiades, Bamidis. Chatbots in healthcare curricula: the case of a conversational virtual patient. Brain Funct Assess Learn. 2020;12462:137-147. doi:https://doi.org/10.1007/978-3-030-60735-7_15

34. Aubignat. Artificial intelligence and ChatGPT between worst enemy and best friend: the two faces of a revolution and its impact on science and medical schools. Rev Neurol (Paris). 2023;179(6):520-522. doi:https://doi.org/10.1016/j.neurol.2023.03.004

35. Das, Kumar, Angom Longjam, et al. Assessing the capability of ChatGPT in answering first-and second-order knowledge questions on microbiology as per competency-based medical education curriculum. Cureus. 2023;15(3). doi:https://doi.org/10.7759/cureus.36034

36. Feng, Shen. ChatGPT and the future of medical education. Acad Med. 2023;98:867-868. doi:https://doi.org/10.1097/ACM.0000000000005242

37. Haruna-Cooper, Rashid. GPT-4: the future of artificial intelligence in medical school assessments. J R Soc Med. 2023;116(6):218-219. doi:https://doi.org/10.1177/01410768231181251

38. Han, Battaglia, Udaiyar, Fooks, Terlecky. An explorative assessment of ChatGPT as an aid in medical education: use it with caution. Med Teach. 2024;46:657-664. doi:https://doi.org/10.1080/0142159X.2023.2271159

39. Cheung BHH, Lau GKK, Wong GTC, et al. ChatGPT versus human in generating medical graduate exam multiple choice questions: A multinational prospective study (Hong Kong S.A.R., Singapore, Ireland, and the United Kingdom). PLoS One. 2023;18(8):e0290691. doi:https://doi.org/10.1371/JOURNAL.PONE.0290691

40. Karabacak, Ozkara, Margetis, Wintermark, Bisdas. The advent of generative language models in medical education. JMIR Med Educ. 2023;9:e48163. doi:https://doi.org/10.2196/48163

41. Abd-Alrazaq, Alsaad, Alhuwail, et al. Large language models in medical education: opportunities, challenges, and future directions. JMIR Med Educ. 2023;9:e48291. doi:https://doi.org/10.2196/48291

42. Arora. Generative adversarial networks and synthetic patient data: current challenges and future perspectives. Future Healthc J. 2022;9(2):190-193. doi:https://doi.org/10.7861/fhj.2022-0013

43. Afzal, Dhamecha, Gagnon, et al. AI Medical school tutor: modelling and implementation. Artif Intell Med. 2020;12299:133-145. doi:https://doi.org/10.1007/978-3-030-59137-3_13

44. Reiswich, Haag. Evaluation of chatbot prototypes for taking the virtual patient’s history. Stud Health Technol Inf. 2019;260:73-80. doi:https://doi.org/10.3233/978-1-61499-971-3-73

45. Kong JSM, Lee, Bharath Pabba, Lee EJD, Sng JCG. Virtual integrated patient: an AI supplementary tool for second-year medical students. Asia Pac Scholar. 2021;6(3):87-90. doi:https://doi.org/10.29060/TAPS.2021-6-3/SC2394

46. Sedaghat. Early applications of ChatGPT in medical practice, education and research. Clin Med. 2023;23:278-279. doi:https://doi.org/10.7861/clinmed.2023-0078

47. Masters. Medical teacher’s first ChatGPT’s referencing hallucinations: lessons for editors, reviewers, and teachers. Med Teach. 2023;45(7):673-675. doi:https://doi.org/10.1080/0142159X.2023.2208731

48. Lone, Koussayer, Sujka. ChatGPT: friend or foe in medical writing? An example of how ChatGPT can be utilized in writing case reports. Surg Pract Sci. 2023;14:100185. doi:https://doi.org/10.1016/j.sipas.2023.100185

49. Rajpurkar, Chen, Banerjee, Topol. AI In health and medicine. Nat Med. 2022;28(1):31-38. doi:https://doi.org/10.1038/s41591-021-01614-0

50. Lee, Bubeck, Petro. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388(13):1233-1239. doi:https://doi.org/10.1056/NEJMSR2214184

51. Moldt, Festl-Wietek, Madany Mamlouk, Nieselt, Fuhl, Herrmann-Werner. Chatbots for future docs: exploring medical students’ attitudes and knowledge towards artificial intelligence and medical chatbots. Med Educ Online. 2023;28. doi:https://doi.org/10.1080/10872981.2023.2182659

52. Pratt, Madhavan, Weleff. Digital dialogue: how youth are interacting with chatbots. JAMA Pediatr. 2024;178–429. doi:https://doi.org/10.1001/JAMAPEDIATRICS.2024.0084

53. Meskó, Topol. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med. 2023;6(1). doi:https://doi.org/10.1038/S41746-023-00873-0

54. van de Ridder JMM, Shoja, Rajput. Finding the place of ChatGPT in medical education. Acad Med. 2023;98:867. doi:https://doi.org/10.1097/ACM.0000000000005254
