Abstract
Objective
This study aimed to evaluate the performance of large language models—ChatGPT-4o and Gemini 1.5 Pro—in assessing suicide risk and guiding treatment in adolescents presenting to the emergency department with suicidal ideation and/or attempts.
Materials and Methods
A retrospective review was conducted on child psychiatry consultation notes from 36 adolescents evaluated between February and March 2024. Structured clinical data were entered into ChatGPT and Gemini, and the resulting decisions were compared to those made by clinicians regarding hospitalization, sedation need, medication initiation, follow-up timing, and notification of social services or law enforcement.
Results
ChatGPT showed higher concordance with clinicians than Gemini, particularly for hospitalization decisions (41.6% agreement) and sedation decisions (100% agreement). ChatGPT recommended hospitalization in 58.3% of cases, compared with 33.3% by clinicians and 36.1% by Gemini. For outpatient cases, ChatGPT demonstrated partial alignment with clinical decisions on medication and follow-up, while Gemini's responses were often uncertain or incomplete.
Conclusion
Large language models show promise as decision-support tools in adolescent psychiatric emergencies. ChatGPT was more consistent with clinical judgments than Gemini. However, limitations remain, and further studies involving broader populations are needed before routine clinical integration.
Plain Language Summary
Suicide attempts in adolescents are serious and complex situations that require careful evaluation by clinicians. In this study, we compared how two artificial intelligence (AI) systems, ChatGPT and Gemini, perform in supporting clinical decisions for adolescents presenting to the emergency department after a suicide attempt. We used real clinical cases and asked both AI systems to make decisions about hospitalization, need for sedation, medication use, and follow-up timing. We then compared these decisions with those made by experienced clinicians. Our findings showed that ChatGPT generally performed closer to clinicians, especially in decisions such as sedation and follow-up planning. However, it also tended to recommend hospitalization more often, suggesting a more cautious approach. Gemini, on the other hand, showed more uncertainty and lower agreement with clinicians. Although AI systems showed some strengths in structured decision-making, they were not consistent across all areas and relied entirely on the information provided by clinicians. This means that they cannot replace human judgment. Overall, AI tools may be helpful as support systems, but final decisions should always be made by trained healthcare professionals, especially in sensitive situations such as adolescent mental health emergencies.
