Abstract
Introduction
Large language models (LLMs) such as OpenAI's GPT-4o are increasingly used in medical education to summarize information and report trends in available data. For integrated plastic surgery, the utility of LLMs in recommending whether to take a research year has not been established. We aimed to establish the reproducibility of ChatGPT's research-year recommendations for medical students applying to integrated plastic surgery.
Methods
De-identified, self-reported integrated plastics applicant profiles in publicly available Google Sheets from 2022–2025 were assembled. Inputs provided to GPT-4o (three runs per profile) included Step 2 CK (Clinical Knowledge) score, AOA designation, and research productivity. Research-year status and match outcome were withheld. The model returned a binary recommendation to pursue a research year. Reproducibility was summarized as cross-run concordance. We compared model recommendations with applicants’ actual research-year decisions.
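The cross-run concordance metric described above can be sketched in a few lines. This is a minimal illustration with invented toy data, not the study's dataset or code; the function name and input format are assumptions.

```python
def cross_run_concordance(runs_per_profile):
    """Fraction of profiles whose repeated model runs returned
    identical binary recommendations (all 'yes' or all 'no')."""
    concordant = sum(1 for runs in runs_per_profile if len(set(runs)) == 1)
    return concordant / len(runs_per_profile)

# Hypothetical example: each inner list holds one applicant
# profile's three GPT-4o recommendations.
runs = [
    ["yes", "yes", "yes"],
    ["no", "no", "no"],
    ["yes", "no", "yes"],  # discordant profile
    ["no", "no", "no"],
]
print(cross_run_concordance(runs))  # → 0.75
```

A profile counts as concordant only if all three runs agree, so a single deviating run marks the whole profile discordant.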
Results
Of 98 entries, 55 complete profiles were retained. Mean Step 2 CK score was 258.3 (SD = 10.4). Applicants reported a mean of 20.1 (SD = 19.9) research presentations, 3.84 (SD = 3.6) first-author publications, and 9.18 (SD = 6.4) total publications. Twenty-one eligible applicants (51.2%) reported AOA designation. Overall, 98.2% (54/55) of profiles received identical recommendations across all three runs.
Conclusion
ChatGPT demonstrated strong internal consistency, but its recommendations did not predict which students actually took a research year en route to a successful residency match.