Sage Journals: Discover world-class research

Abstract

Suicidal ideation in adolescents is a critical public health issue requiring early detection. This study examined whether machine learning (ML) and large language models (LLMs) can detect ideation in 1,197 students (ages 10–15) using self-reported Strengths and Difficulties Questionnaire (SDQ) data. Clinically relevant ideation was defined using Suicidal Ideation Questionnaire—Junior (SIQ-JR) cut-offs. Gemini 1.5 Pro and GPT-4o were prompted to estimate SIQ-JR scores from SDQ responses and demographics; Logistic Regression, Naive Bayes, and Random Forest models were trained on either SDQ data or LLM predictions. LLM predictions correlated with SIQ-JR (ρ = .61) and showed good discrimination across thresholds (area under the curve (AUC) ≥ .83), with item-level associations paralleling self-reports, revealing strong associations with emotional symptoms and peer problems. In cross-validated analyses, the best SDQ-based ML model reached sensitivity = .85 and specificity = .72; the best LLM-based model achieved .80 and .74. Notably, ML models trained directly on SDQ responses consistently outperformed those incorporating LLM predictions across all SIQ-JR thresholds. Nonetheless, LLMs demonstrated promising accuracy in identifying suicidal ideation based on SDQ and demographic data. Further refinement and validation are required before these approaches can be considered viable for clinical implementation.
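The discrimination metrics reported above (sensitivity, specificity, and AUC) can be illustrated with a minimal sketch. It is not the authors' analysis code: the data are made-up toy values, the threshold of 20 is arbitrary, and the names `above_cutoff`, `predicted_score`, and `flagged` are hypothetical. The AUC is computed from its rank interpretation, the probability that a randomly chosen case above the cut-off receives a higher predicted score than a randomly chosen case below it.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data, NOT taken from the study.
above_cutoff = [1, 1, 1, 0, 0, 0, 0, 1]            # 1 = reference score at or above a cut-off
predicted_score = [30, 22, 12, 10, 25, 5, 8, 28]   # hypothetical model-estimated scores
flagged = [1 if s >= 20 else 0 for s in predicted_score]  # arbitrary illustrative threshold

sens, spec = sensitivity_specificity(above_cutoff, flagged)
area = auc(above_cutoff, predicted_score)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of figure.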

Keywords

suicidal ideation, mental health, machine learning, adolescents, internalizing symptoms, externalizing symptoms



Detecting Suicidal Ideation in Adolescence Using Self-Reported Emotional and Behavioral Patterns: Comparing Machine Learning and Large Language Model Predictions
