Abstract
Background/aim
This study aimed to investigate whether machine learning (ML) algorithms could accurately differentiate adolescents with Major Depressive Disorder (MDD) from healthy controls (HC) based on neurocognitive data.
Method
Adolescents diagnosed with MDD and HC participants were assessed using structured interviews. Neurocognitive functions were measured with tests of verbal and visual memory, working memory, executive functions, processing speed, inhibition, verbal fluency, and social cognition/Theory of Mind skills. Features were selected using a tree-based approach, and classification was implemented with multiple ML algorithms. To address class imbalance, ML models were trained with the Synthetic Minority Over-sampling Technique (SMOTE), and model performance was optimized using stratified 10-fold cross-validation (CV). Shapley Additive Explanations (SHAP) values were computed to interpret feature contributions.
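As a rough illustration of how stratified folds and the Synthetic Minority Over-sampling Technique can fit together, the sketch below uses only the Python standard library. It is not the authors' implementation: the abstract does not name the software used, the number of nearest neighbours, or whether oversampling was applied inside each fold. The sketch assumes oversampling is applied to the training folds only (a common way to keep synthetic samples out of the held-out fold), uses 5 neighbours, and fills the feature values with random placeholders rather than study data. The function names `stratified_folds` and `smote` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class ratios in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Placeholder data: group sizes follow the abstract, feature values are random
# stand-ins (assumption), not real neurocognitive scores.
data_rng = random.Random(42)
labels = ["MDD"] * 117 + ["HC"] * 67
features = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in labels]

folds = stratified_folds(labels, k=10)
held_out = set(folds[0])
train_idx = [i for i in range(len(labels)) if i not in held_out]

# Oversample the minority class (HC) in the training portion only.
train_counts = Counter(labels[i] for i in train_idx)
train_hc = [features[i] for i in train_idx if labels[i] == "HC"]
n_new = train_counts["MDD"] - train_counts["HC"]
synthetic_hc = smote(train_hc, n_new)
balanced_hc = train_hc + synthetic_hc

print(train_counts, len(balanced_hc))
```

In a full analysis the same steps would be repeated with each fold held out in turn, with a classifier fitted to the balanced training portion and scored on the untouched held-out fold.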
Results
A total of 117 adolescents with MDD and 67 HC adolescents were included in the study. The Support Vector Classifier (SVC) achieved the highest performance, with a mean accuracy = 76.0% (range [min–max] = 71.1%–80.9%) and a mean Area Under Curve (AUC) = 79.0% (range [min–max] = 74.7%–82.4%), followed by Ridge Classifier (accuracy = 71.8% [65.6%–78.0%]), Linear Discriminant Analysis (accuracy = 71.8% [67.2%–76.4%]), Bagging Classifier (accuracy = 71.2% [63.7%–78.7%]), Random Forest (accuracy = 69.6% [61.8%–77.4%]), Gaussian Naive Bayes (accuracy = 69.6% [63.5%–75.7%]), Ridge Classifier CV (accuracy = 69.1% [62.5%–75.7%]), and Multilayer Perceptron (accuracy = 65.3% [57.5%–73.2%]). SHAP values identified symbol coding, categorical fluency, and Stroop Test parameters as the most influential features.
Conclusions
ML techniques showed good performance in distinguishing adolescents with MDD from HC, with SVC achieving the highest accuracy. Cognitive domains related to processing speed and executive functions appear to be clinically relevant, suggesting that future studies should explore their role in first-episode, medication-naive adolescents and assess whether ML-based cognitive profiling can support early recognition.
Plain Language Summary
This study examined whether machine learning methods could distinguish adolescents with major depressive disorder from healthy peers using thinking and memory test results. Data from 117 adolescents with depression and 67 healthy adolescents were analyzed. The best-performing model correctly classified participants with an average accuracy of 76%. Tasks related to processing speed, verbal fluency, and cognitive control were the most important in differentiating depressed adolescents from healthy controls. These findings suggest that combining cognitive testing with machine learning may help support the identification of depression in adolescents.
