Abstract
Introduction
Suicide, or completed suicide, is defined as the act of intentionally killing oneself.1,2 When the act is not fatal, the term ‘attempted suicide’ is used. According to the World Health Organization (WHO), suicide is one of the leading causes of death, accounting for more than 700,000 deaths yearly. 3 It is the fourth leading cause of death among young people globally. 4 Early diagnosis and care are essential, as many mental disorders appear before adulthood and may decrease quality of life if untreated. 5 Moreover, for every completed suicide there are many more attempts, which generate medical expenses and affect communities. 6 Analysing the circumstances of suicides and suicide attempts may reveal some of the risk factors involved, 7 including ethnicity 8 and gender. 9 Suicide is also more prevalent in low- or middle-income countries, 3 and among people who live alone or are unable to work. 10 There is also a clear relationship between suicide risk and previous suicide attempts. 11
It has been recorded that most people who die by suicide consulted a specialist shortly beforehand. 12 This is why this type of death is considered preventable, yet difficult to detect. Among the methods for detecting suicidal ideation, machine learning methods have been applied to questionnaires, electronic health records and suicide notes. 13 In previous work, data extracted from verbal questionnaires answered by adolescents were analysed using a Support Vector Machine (SVM) to classify suicidal and non-suicidal patients. 14 In related work, tabular features resulting from questionnaires or scales such as the Pierce Suicidal Intent Scale (PSIS) 15 or the international personality disorder examination screening questionnaire 16 have been analysed with regression techniques for classification purposes.
In this article, we apply both knowledge-based and machine learning-based inferred approaches to tabular data obtained from a set of measuring instruments implemented as questionnaires in which adolescents and youth who have entered the child welfare system took part. Then, we propose and compare the results achieved with the two approaches. In addition, by learning which suicide-related concepts measured by the questionnaires are the most relevant to achieve a good classification performance, we propose a reduction of the original set of assessment instruments. The aim is to be able to detect a risk of suicide in an adolescent population with as few questions as possible and with a high performance.
The main contributions of our work are the following: (1) By analysing the results of a set of questionnaires filled in by a group of young people, we have defined a novel diagrammatic knowledge-based representation of an algorithm, a step-by-step approach that assigns the respondent a suicide risk level of “low”, “moderate”, or “high”, and we have measured its performance. (2) Using the results of the set of questionnaires as input for various data-based inferred algorithms, we have measured their performance and determined the usefulness of machine learning in this particular domain. (3) We have assessed the relevance of the questionnaire items to suicide-attempt ideation, contributing to questionnaire choice for suicide risk assessment, identifying the most important items, and performing feature ablation to detect the redundant ones. This feature ablation approach identified a minimal yet highly informative core subset of items for risk prediction.
These contributions have been used to answer two research questions:
• Are all the measuring instruments employed in the study relevant to estimate the suicide risk? Are there redundant or irrelevant questions that are nearly uncorrelated with the suicide risk assessment?
• Given the results of this set of questionnaires, do knowledge-based or data-inferred approaches better assess the respondent's level of suicide risk?
Related work
Applications of machine learning in healthcare, and specifically in the field of mental health, include addiction treatment, cyber-harassment recognition, and detection of mental health issues such as depression, bipolar disorder, and anxiety. 17 Despite ongoing development and challenges, these applications hold immense potential. Another application is the use of machine learning to detect suicide risk. 12 Existing research usually lacks open data and is inconsistent in terms of dataset sizes, features, demographics, detection methods, metrics, and time spans. 18
Even though the outcomes studied in the literature vary, binary risk estimation seems to be the main focus. Nevertheless, suicide attempt within a given time span, suicide ideation and suicidality are also of interest. In this work, the risk of suicide attempt is classified into three categories: “low risk”, “moderate risk”, and “high risk”.
Existing works also differ in the population studied. Most studies focus on high-risk adults, such as veterans. 12 Even though adults account for the highest number of suicides, the main risk of suicidal ideation and behaviour corresponds to teenagers and young adults aged 15–29, and little research is available on this group. We have striven to fill that gap by studying young people at high risk.
The reviews around this subject highlight the difference of risk factors depending on the environment, which is why population-based investigations of suicide risk are needed. For adults, the use of alcohol and drugs, the presence of disabilities, and the socio-economical environment and life events have a significant effect.19,20 This may not be the case for other demographics. Some studies try to analyse the environmental factors exclusively. 21 Although suicide in the youth population is a complex and multi-causal phenomenon, it generally occurs when certain life stressors and mental health factors converge to leave a young person with a sense of hopelessness, despair, and social isolation. Apart from demographical factors, including gender, the vulnerability, negative affect, and feelings of inadequacy can lead to suicidal thoughts. 22 Research suggests that the most important predictor of completed suicide is a history of suicide attempts. 23
Regarding feature ablation, there have been studies showing that feature reduction is possible thanks to the identification of the main factors contributing to a machine learning classification. 24 One of the main goals of this work is, therefore, to identify risk factors based on a predictive model.
We initiated our work by modelling a knowledge-based system involving experts. While the approach had not been made explicit before, it was employed by health practitioners. Furthermore, we explored data-driven inferred models due to their current impact. The literature rarely compares knowledge-based systems with inferred approaches, 25 even though the comparison allows interpreting and improving the existing model.
Most studies use electronic medical records (EMR) to calculate the risk of suicide. Lately, the importance of social media for the early detection of suicide risk has been highlighted, as online community members could be more likely to show indications they do not disclose to healthcare workers. 26 Some studies have been conducted in this direction, usually targeting X (formerly Twitter), which requires the use of Natural Language Processing (NLP). There are also studies which include biological variables like urine samples or neuroimaging. 27 There is, seemingly, a lack of datasets involving heterogeneous types of features. 28 Another gap in research is the scarcity of works employing a mental health survey corpus.29,30 In this work, survey-based data are used to estimate the risk.
With regard to the main techniques employed to assess suicide risk, random forests, decision trees and SVMs are normally used. NLP is also used in different techniques. Cross-validation, especially k-fold, is usually employed to check the validity of the models, 12 with accuracy and AUC 12 being the main metrics. In this work, different models will be developed and contrasted.
Materials and methods
Measuring instruments
We have incorporated various measuring instruments to assess the level of suicidal ideation in the young and adolescent population.
In our work, the measuring instruments were presented to the target population of youth in residential care in Spain as a set of independent question items (I) and questionnaires (Q):
• The Adolescent Suicidal Behavior Assessment Scale (SENTIA) 31 is a validated self-report tool that measures a range of suicide-related thoughts and behaviors in various timeframes. For the present study, we used three SENTIA items (I) designed to assess lifetime experiences of suicide ideation (“Have you ever had ideas about taking your life?”), previous suicidal attempt (“Have you tried to take your own life?”), and non-suicidal self-injury or self-harm (“Have you harmed yourself [self-injury: cuts, punctures, etc.] without intent to die?”).
• Number of previous suicide attempts (I). A number is given for the attempts to take one’s own life.
• Suicide Desire and Plan are used to assess whether the respondent has suicidal desire and/or a plan. These are yes/no items (I).
• Suicides of people near (I). A yes/no question asks whether people in the respondent's surroundings have died by suicide.
• The relationship with the person who made the attempt, and how much the respondent identifies with that person, are also recorded (I).
• Suicide Cognitions Scale-Revised (SCS-R) 32 questionnaire (Q). It is a 16-item self-report questionnaire designed to measure the suicidal belief system, a range of beliefs, attitudes, expectations, and perceptions associated with the emergence of suicidal thoughts and behaviors (e.g., “Nothing can help solve my problems”). Respondents indicate on a 5-point Likert-type scale the degree to which they agree or disagree with each item statement (range 0, strongly disagree, to 4, strongly agree). Item responses are summed to provide an overall metric of the suicidal belief system, with higher scores indicating increased vulnerability to suicidal thoughts and behaviors.
• Patient Health Questionnaire, or PHQ (Q) 33 consists of eight items (e.g., “Feeling down, depressed, or hopeless”) that assess the presence of depressed mood, anhedonia, sleep problems, fatigue, changes in appetite or weight, feelings of guilt or worthlessness, difficulty concentrating, and feelings of laziness or worry during the past 2 weeks. Items are scored on a 4-point Likert-type scale, from 0 (never) to 3 (almost every day). A score of 10 or above is frequently used as a cut-off point to identify patients with major depression. We purposely opted to use the PHQ-8 rather than the PHQ-9, as the ninth item in the latter assesses thoughts of death and self-harm, which might have confounded the results.
• The Spanish adaptation of the Psychache Scale 34 in young adults, PS-E (Q), 35 is used to measure psychological or mental pain. It consists of 13 items that assess mental pain and anguish (e.g., “I can’t take my pain any more”) on a Likert-type scale. Items 1–9 direct respondents to indicate how often they experience mental pain (e.g., “I feel psychological pain”) on a 5-point scale ranging from 1 (never) to 5 (always), while items 10–13 ask how much they disagree or agree with statements reflecting mental pain (e.g., “I can’t take my pain any more”), using a 5-point scale ranging from 1 (strongly disagree) to 5 (strongly agree). Item scores are summed, with higher scores indicating more intense and frequent (i.e., less bearable) mental pain.
• Tolerance for Mental Pain Scale (TMPS) 36 is a questionnaire (Q) to assess tolerance for mental pain. It consists of 10 items that assess negative and positive perspectives of mental pain: feeling unable to manage one’s mental pain (e.g., “I cannot get the pain out of my mind”) and perceiving that one’s pain will not endure (e.g., “I believe that my pain will go away”). Items are scored on a 5-point Likert-type scale ranging from 1 (not true) to 5 (very true). Higher scores on the manage subscale indicate reduced tolerance for mental pain, whereas higher scores on the endure subscale indicate stronger expectations that mental pain will resolve.
• Beck’s Hopelessness Scale, BHS, is a questionnaire (Q) 37 that consists of 4 items that measure the sensation of hopelessness during the last week (e.g., “My future looks dark to me”) using a true/false response option. Item scores are summed, with higher scores representing more severe hopelessness.
• Perceived burdensomeness and thwarted belongingness are measured using the Interpersonal Needs Questionnaire, INQ (Q). 38 It consists of 12 items that measure perceived burdensomeness (8 items; e.g., “These days, I feel like a burden on the people in my life”) and thwarted belonging (4 items; e.g., “These days, I feel disconnected from other people”), each rated on a 7-point Likert-type scale ranging from 1 (not true for me at all) to 7 (very true for me). Item scores on each subscale are summed, with higher scores indicating more severe perceived burdensomeness and thwarted belonging.
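As an illustration of how a questionnaire (Q) total could be derived from item responses, the following minimal sketch sums Likert-type answers and applies the PHQ-8 cut-off of 10 mentioned above. The function name `sum_score` and the example answers are hypothetical and are not taken from the study data.

```python
def sum_score(responses, low, high):
    """Sum Likert item responses after checking each lies in [low, high]."""
    for r in responses:
        if not low <= r <= high:
            raise ValueError(f"response {r} outside {low}-{high}")
    return sum(responses)


# Hypothetical PHQ-8 answers, each item scored 0 (never) to 3 (almost every day).
phq8_answers = [2, 1, 3, 0, 1, 2, 1, 1]
phq8_total = sum_score(phq8_answers, low=0, high=3)

# A total of 10 or above is the frequently used cut-off mentioned in the text.
phq8_positive = phq8_total >= 10
print(phq8_total, phq8_positive)
```

The same helper would apply to the other summed scales by changing the `low` and `high` bounds to match each instrument's response range.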
Almost all of the aforementioned instruments have been used with different thresholds in the knowledge-based model (we will specify it in Section Knowledge-based model) and all of the numeric ones in the data-inferred approaches (Section Inferred approaches).
This set of questionnaires and question items was answered by 197 adolescents.
Example of the tabular data for a subject. The set of answered questionnaires and question-items are described in Section Measuring instruments. For the Questionnaires, the total score is gathered, not the response for each of the items.
Description of the sample
The sample used for this study comprised 165 participants. It is important to note that collecting data from a high-risk population is challenging, and that the targeted group is both unique and often underrepresented in research, which makes this dataset valuable and insightful. In fact, the sample size is typical for a study of this kind, 39 even if sample sizes have increased in recent years. 40 There is research showing that machine learning models require a minimum sample size of 20 participants when using cross-validation to ensure reliable estimates, 41 provided the feature-to-sample ratio does not exceed 1:10. With a feature set of 17 variables and a sample size of 165 participants (a ratio of roughly 1:9.7), our study is broadly in line with these recommendations, providing sufficient data diversity for statistical analyses and internal validation.
The inclusion criteria for the subjects are the following:
• Adolescents aged 12–18 years.
• Being in residential child care.
• Having been at risk of child neglect.
On the other hand, the exclusion criteria are these:
• Adolescents who were in a situation of emotional crisis at the time of the assessment, given that this condition could affect the validity of the responses to the questionnaire.
• Adolescents with cognitive or developmental disabilities that made it difficult to understand the questionnaires used in the study.
• Participants with incomplete data in the questionnaires.
A quick analysis of the sample of adolescents, aged 12–18, shows that more than 50% of the individuals report suicide desire. Most of the group had also self-harmed. The group was fairly balanced in terms of gender, and the mean age was between 15 and 16. The demographics of the sample group are shown in Table 2.
Descriptive statistics for demographic characteristics.
Data collection process
We began the recruitment process by contacting child protection services across several provinces in northern Spain. The research objectives were explained, and these services were invited to participate. Those who agreed and granted approval for the study contacted the managers of youth residential care units in their area, informing them about the research and requesting their collaboration. Managers who agreed were then directly contacted by the research team to provide further details and coordinate the implementation of the study. Informed consent was obtained from all participants prior to any data collection.
Adolescents who met the inclusion criteria and agreed to participate completed the instruments individually in a private room within their residential care unit. Data collection was conducted electronically via a secure online platform, with each participant accessing the questionnaire on a computer. Research has demonstrated that online data collection is as reliable as face-to-face methods for both normative and clinical populations. Furthermore, online questionnaires are particularly advantageous for assessing stigmatized behaviors such as suicide and self-harm, as they reduce the social desirability bias that can influence responses in face-to-face or group settings. 42 Although existing evidence indicates that asking young people about suicide does not increase their risk of suicidal ideation or behavior, 42 we implemented additional safeguards to ensure participants’ emotional well-being. Specifically, a staff member from the residential care unit was available during and after the completion of the questionnaire to provide emotional support if needed.
Missing data
Regarding missing data, any incomplete responses were excluded from the data wrangling and analysis phases. This approach ensured the reliability of the dataset and minimized the potential for biases arising from imputation or incomplete data.
Of the original 197 instances, 32 were missing at least one of the 19 data points described in the previous section. We only considered the fully completed sets of questionnaires. Of the remaining 165 instances, 83 were annotated as at low risk of suicidal behavior (50.3%), 43 were categorized as moderate risk (26.1%), and the remaining 39 (23.6%) were labelled as high risk (see Figure 1).
Suicidal risk level class distribution (high, moderate or low) annotated by experts in the analysed sample.
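The complete-case filtering and the class percentages reported above can be reproduced with a short sketch. The field names in the toy records are hypothetical; only the class counts (83, 43 and 39) come from the text.

```python
from collections import Counter


def complete_cases(records, required_fields):
    """Keep only records where every required field is present (not None)."""
    return [r for r in records
            if all(r.get(f) is not None for f in required_fields)]


def class_distribution(labels):
    """Return {label: (count, percentage rounded to one decimal)}."""
    counts = Counter(labels)
    total = len(labels)
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


# Toy records with hypothetical field names; None marks a missing answer.
records = [
    {"phq": 12, "bhs": 3, "risk": "high"},
    {"phq": None, "bhs": 1, "risk": "low"},
    {"phq": 4, "bhs": 0, "risk": "low"},
]
kept = complete_cases(records, ["phq", "bhs", "risk"])

# Class counts reported in the text: 83 low, 43 moderate, 39 high.
labels = ["low"] * 83 + ["moderate"] * 43 + ["high"] * 39
dist = class_distribution(labels)
print(len(kept), dist)
```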
The psychological characteristics of the three types of suicidal behavior risk levels are the following: • • •
Each case in the sample was studied by clinicians to determine a suicidal risk level.
Methodology
Since the questionnaire is the central instrument employed in this work, section Questionnaire delves into critical aspects of questionnaire quality assessment. Next, in section Knowledge-based model, clinician expertise is implemented as a knowledge-based model. Finally, in section Inferred approaches, we explore simple machine learning approaches using the collected data. In this case, alternative models are inferred automatically from data without expert knowledge.
Questionnaire
The questionnaire proposed by expert clinicians is made up of tools typically used when assessing suicide risk. In this study, they have been selected and ordered in a flow diagram to classify each respondent into the previously mentioned risk levels. This tool was assessed to determine the relation between each question and the expected outcome, i.e. the suicide risk. This gives a practical idea of how much information each question conveys about the target class, that is, the predictive ability of each item in the questionnaire. Moreover, the correlation between items in the questionnaire was assessed. While question-to-risk correlation is desirable, question-to-question correlation reveals redundancies in the questionnaire. In brief, the questionnaire should comprise a minimal set of relevant, non-redundant questions.
In order to assess the questionnaire itself, two quantitative perspectives were explored: on the one hand, the Pearson correlation and, on the other hand, Mutual Information. Particularly, for Mutual Information calculations, entropy-based information gain was used as a key measure to evaluate the information contribution of each questionnaire feature regarding suicide risk. Entropy quantifies the uncertainty or disorder within the data, and information gain measures the reduction in this uncertainty when the dataset is split based on a given feature. Features with higher information gain are considered more informative, as they more effectively reduce uncertainty about the target variable.
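A minimal sketch of entropy-based information gain, as described above, is given here for categorical features. It assumes that questionnaire totals would first be discretized (for example with the positivity thresholds), which is an assumption of this example and not a detail stated in the text; the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy example: one feature separates the classes perfectly, the other not at all.
risk = ["low", "low", "high", "high"]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
gain_informative = information_gain(informative, risk)
gain_uninformative = information_gain(uninformative, risk)
print(gain_informative, gain_uninformative)
```

Ranking the features by this quantity yields an ordering in which higher values correspond to a larger reduction of uncertainty about the risk level.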
Knowledge-based model
The knowledge-based model conveys an expert system developed by two experts in the area, a psychologist and a psychiatrist, both specialized in suicide. In addition, modern explanatory theories of suicide risk such as the fluid vulnerability theory, 43 the interpersonal theory of suicide 44 or the three-step theory were taken into account. 45
Thresholds for interpreting the concepts measured in the questionnaires (Q) as positive. For “tolerance for mental pain” and “thwarted belongingness,” higher scores indicate a lower associated risk. Note that the ”hopelessness” scale is a 4-item true/false questionnaire, with a score of 2 or more considered positive.
In our work, this knowledge has been encoded with the help of computer scientists in an algorithm that is shown in diagrammatic representations in Figures 2(a) and 2(b), respectively, for high and moderate suicidal risk. Based on the conditions for both moderate and high risk, if none of these were met, the patient was deemed low risk. Diagrammatic description of the knowledge-based approach model.
We observed that not all the measuring instruments described in Section Measuring instruments were used in these algorithms: based on their prior knowledge, experts have not considered ‘number of previous suicide attempts’, ‘identification with close suicide’, and ‘suicide desire’. The questionnaire about ‘perceived burdensomeness’ was not used either, as it is closely related to ‘thwarted belongingness’. The adolescent identifier, the age and the gender were not considered in this approach.
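The overall control flow of such a knowledge-based classifier can be sketched as a rule cascade: the high-risk conditions are checked first, then the moderate-risk conditions, and low risk is the fallback. The two example rules below are HYPOTHETICAL placeholders that only show the structure; they are not the experts' conditions, which are given in the figures. The only value taken from the text is the hopelessness threshold (a score of 2 or more is positive).

```python
def hopelessness_positive(bhs_score):
    # Threshold stated in the text: 2 or more on the 4-item true/false scale.
    return bhs_score >= 2


def classify_risk(answers, is_high, is_moderate):
    """Rule cascade: high-risk rule first, then moderate, otherwise low."""
    if is_high(answers):
        return "high"
    if is_moderate(answers):
        return "moderate"
    return "low"


# HYPOTHETICAL placeholder rules, for illustration only (not the authors' rules).
def example_is_high(a):
    return a["previous_attempt"] and a["suicide_plan"]


def example_is_moderate(a):
    return a["suicide_ideation"] and hopelessness_positive(a["bhs"])


respondent = {"previous_attempt": False, "suicide_plan": False,
              "suicide_ideation": True, "bhs": 3}
print(classify_risk(respondent, example_is_high, example_is_moderate))
```

Passing the rules as functions keeps the cascade itself independent of any particular set of clinical conditions.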
Inferred approaches
With the goal of (i) comparing the results to the knowledge-based method, and (ii) trying to reduce the number of measuring instruments, we use several machine learning-based inferred approaches to analyse the tabular data described in Section Measuring instruments. All the numeric data described in Table 1 have been used in the inferred approaches and, as is usual in this area, the term feature is used to describe each of the personal descriptors (‘age’, ‘gender’), individual question-items (I) or questionnaire results (Q). The machine learning models perform a classification task using these numerical data, assigning a suicide risk level to each sample by predicting a number between 0 (low risk) and 2 (high risk).
Machine learning models allow us to measure how much each of the features contributes to infer the correct class (high, moderate, low) annotated by the experts in the Gold Standard. That is, these techniques help to interpret whether the features have redundant information, or whether some of them are not important to reach the correct answer-type, and, as a consequence, could be removed from the set of questionnaires. Feature ablation helps quantify the significance of each feature by observing changes in predictive abilities. 24 This technique is crucial for enhancing model accuracy and interpretability by identifying which features contribute most significantly to predictions. Studies show that feature ablation can lead to better model performance. 46 This technique also estimates feature relevance and enables feature selection, as progressive ablation methods can refine feature sets without significant loss in accuracy. 47
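The idea of feature ablation can be sketched as follows: each feature is removed in turn and the drop in a performance score is recorded. The `toy_evaluate` function is a hypothetical stand-in for training and scoring a real model, and its numeric contributions are invented for illustration.

```python
def ablation_importance(features, evaluate):
    """Score drop observed when each feature is removed from the full set."""
    baseline = evaluate(features)
    return {f: baseline - evaluate([g for g in features if g != f])
            for f in features}


# Hypothetical additive contributions standing in for a real train/score cycle.
toy_contribution = {"scs_r": 0.30, "mental_pain": 0.20, "age": 0.0}


def toy_evaluate(feature_subset):
    return 0.4 + sum(toy_contribution[f] for f in feature_subset)


importance = ablation_importance(list(toy_contribution), toy_evaluate)
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked)
```

Features whose removal leaves the score unchanged, such as `age` in this toy case, are candidates for removal from the questionnaire set.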
To assess the impact of feature reduction on model performance, a statistical validation was performed using the Student’s t-test. 48 This analysis compared the performance metric F1-score of the Machine Learning models with all features against those of models with reduced feature sets (12, 8, and 1 features). The null hypothesis (H0) assumes that the means of the performance distributions are equal between the full-feature model and each reduced-feature model. The alternative hypotheses are defined as follows: H1 (the means are unequal),
These are the machine learning approaches used to perform the experiments for inferring the suicide ideation levels:
• Decision tree
• Random Forest (RF)
• Extra trees
• Boosted classifier
• Linear regression
• Logistic regression
• Support Vector Machine (SVM)
• Naïve Bayes
• Neural network
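The statistical comparison between the full-feature and reduced-feature models described earlier in this section can be sketched with a two-sample Student's t statistic. This example assumes an independent, pooled-variance test; the text does not state whether the test was paired. The F1 samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    pooled = ((na - 1) * variance(sample_a)
              + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical F1-scores from repeated runs (not the study's results).
full_features = [0.80, 0.82, 0.78, 0.81, 0.79]
reduced_features = [0.85, 0.86, 0.84, 0.87, 0.83]
t_stat, dof = student_t(full_features, reduced_features)
print(t_stat, dof)
```

Turning the statistic into a p-value requires the t-distribution; in practice a library routine such as SciPy's `scipy.stats.ttest_ind` would typically be used for that step.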
Given that the sample is small, with a population of 165 adolescents, the inferred methods were assessed by means of Leave-One-Out Cross-Validation 49 in an attempt to avoid evaluation biases. This technique maximizes the training data and also offers robustness against overfitting, as the models are tested against diverse data points. 50 Moreover, to further reduce overfitting, we deliberately employed simpler machine learning models, an approach supported by evidence 51 indicating that, for tabular data, simpler models, and particularly tree-based models, not only require minimal hyperparameter tuning but often perform comparably to or better than complex models, while offering greater interpretability. In line with this, we also used a very simple feedforward neural network architecture with minimal hyperparameter tuning.
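Leave-One-Out Cross-Validation can be sketched in a few lines: each sample is held out once, a model is fitted on the remaining samples, and the held-out sample is predicted. The 1-nearest-neighbour classifier and the toy scores below are hypothetical stand-ins and are not among the models evaluated in the study.

```python
def leave_one_out(samples, labels, fit, predict):
    """Predict every sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


# Toy 1-nearest-neighbour classifier on a single numeric feature.
def fit_1nn(xs, ys):
    return list(zip(xs, ys))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


scores = [1, 2, 3, 10, 11, 12]   # hypothetical questionnaire totals
risk = [0, 0, 0, 2, 2, 2]        # 0 = low risk, 2 = high risk
preds = leave_one_out(scores, risk, fit_1nn, predict_1nn)
accuracy = sum(p == y for p, y in zip(preds, risk)) / len(risk)
print(preds, accuracy)
```

With 165 participants this loop would train 165 models, each on 164 samples, which is affordable for the simple models listed above.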
Experimental results
Questionnaire assessment
An analysis of the elements in the questionnaire was carried out in an attempt to rank the relevance of the features described in Section Measuring instruments for suicide risk deduction and, in the same way, to assess whether feature pairs conveyed redundant information. Pearson correlation is shown in Figure 3.
Pearson correlation matrix of the features and the suicidal risk level (gold standard).
From the correlation matrix, we found that ‘gender’ and ‘age’ features were the least correlated with risk. Moreover, these features also had low variance and cardinality, indicating lack of information. Therefore, these features, which are not used in the knowledge-based approach, are expected to have little importance for the machine learning approach. By contrast, elements from the questionnaire with the highest correlation with respect to suicidal risk are the features ‘SCS-R’, ‘mental pain’ and ‘previous suicidal attempt’. Thus, these features are expected to be relevant in the machine learning models.
Regarding redundant information, it was found that ‘suicidal desire’, ‘suicide ideation’, ‘suicide plan’ and ‘previous suicidal attempt’ were greatly correlated among themselves. The features ‘suicides of people near’ and ‘identification with close suicide’ were also highly correlated.
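Redundancy detection of this kind can be sketched by computing the Pearson correlation for every feature pair and flagging pairs above a threshold. The threshold of 0.8 and the toy columns are hypothetical choices made for this example only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundant_pairs(columns, threshold):
    """Feature pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical toy columns for five respondents.
toy = {
    "suicide_desire": [0, 1, 1, 0, 1],
    "suicide_plan": [0, 1, 1, 0, 1],
    "age": [15, 14, 17, 16, 13],
}
print(redundant_pairs(toy, threshold=0.8))
```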
In parallel, Mutual Information was assessed to evaluate the relevance of each feature. Specifically, the information gain of each feature was calculated by measuring the reduction in entropy when the feature is known, reflecting how much uncertainty about the target variable is decreased. This measure provides a ranking of feature importance within the dataset. Therefore, features that produce a larger decrease in entropy compared to the original entropy (without any feature conditioning) are considered more informative and impactful for the prediction task.
Importance levels of the features based on class entropy information gain.
In this analysis, the most informative feature turned out to be ‘suicidal belief system’ (SCS-R survey). This is also the feature most correlated with the gold standard, as can be verified in Figure 3.
On the other hand, the features ‘suicide desire’ and ‘suicide ideation’, even if they are highly correlated with the gold standard and with ‘suicide plan’, do not contribute significantly to the decision-making, which may be due to the redundant information they provide.
Risk prediction
In this section, we gather the experimental results in terms of predictive ability by the two alternative approaches presented, i.e. Knowledge based (section Knowledge-based model) and Inferred (section Inferred approaches) models.
Performance of inferred models given a different number of features, assessed in terms of macro-averaged F1-score obtained by means of leave one out cross-validation. The best-performing model for each feature quantity has been bold-faced.
Table 5 reveals that tree-based approaches attained superior predictive ability in all scenarios except the one in which just a single feature is given to make the prediction. Random forest was one of the best-performing inferred approaches in terms of F1-score and required just 8 features to make the prediction. Note that, even though we might intuitively have expected that more questions would yield better predictive ability, we found that the models attained their best performance with a subset of features selected according to the information gain, as shown in section Questionnaire assessment. Redundant or non-relevant questions appear to be detrimental to the inference algorithm. This can be further verified with a statistical analysis, as explained in the Methodology.
The analysis reveals that reducing the feature set to 12 features does not significantly affect the performance metrics. However, for a reduction to 8 features, an improvement over the complete feature set is obtained, as confirmed by statistical significance tests. Assuming statistical significance at p < 0.05 with the Student’s t-test, 48 comparing the complete set to the 8-feature subset under the alternative hypothesis
Conversely, the case of using only 1 feature shows statistically worse performance compared to other subsets, with a p = 0.001 for the
These findings suggest that an 8-feature subset strikes an optimal balance between model simplicity and performance. Next, we compared the inferred approaches with the Knowledge-Based (KB) model. The predictions made by this model were contrasted with the expected outcome and summarized in terms of confusion matrices in Figure 4(a) and compared to one of the inferred approaches, i.e. Random Forest (RF) with 8 features in Figure 4(b). Together with the confusion matrix, the F1-score was provided for cohesion with Table 5. Confusion matrices for the knowledge-based and the RF inferred approach. Notation: 0 = low risk, 1 = moderate risk, 2 = high risk.
A quick inspection of the matrices in Figure 4 reveals that the KB model predicted many moderate- and high-risk cases as low risk, while this discrepancy occurs less often with the RF approach. The F1-score provides an overall view combining precision and recall (both reflected in the confusion matrices) and was 52% and 85%, respectively, for KB and RF, revealing the large difference between the two approaches.
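For reference, the macro-averaged F1-score can be computed directly from a confusion matrix, as in the sketch below. The toy matrix is hypothetical and does not reproduce the values in Figure 4.

```python
def macro_f1(confusion):
    """Macro F1 where confusion[i][j] counts true class i predicted as j."""
    k = len(confusion)
    f1_scores = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / k


# Hypothetical 3-class confusion matrix (0 = low, 1 = moderate, 2 = high).
toy_confusion = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
print(round(macro_f1(toy_confusion), 3))
```

Because every class contributes equally to the average, this metric penalizes a model that misses the less frequent moderate- and high-risk classes, even if the majority low-risk class is predicted well.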
Discussion
The aim of this study was twofold. First, we compared the performance of knowledge-based models with that of models inferred from data using machine learning. Second, we sought to reduce the set of assessment instruments (questionnaires) by identifying redundant and non-relevant items while maintaining or improving the predictive capacity of the model, and to propose a simplified model for suicide risk detection with a reduced set of features that allows for efficient and high-performance assessment.
Regarding the first aim, the results showed that data-inferred approaches, that is, machine learning models, consistently outperformed the knowledge-based model in terms of predictive capacity. In particular, the Random Forest-based model achieved an average macro F1-score of 85% using a reduced feature set, while the knowledge-based model achieved an average F1-score of only 52%. This large difference reflects the fact that the machine learning models were more effective at correctly classifying risk levels (low, moderate and high), especially in the moderate- and high-risk cases. This can be seen in Figure 4, where the knowledge-based and inferred approaches are directly compared.
These findings are consistent with previous research that has highlighted the capacity of machine learning to capture complex patterns in data and improve predictive accuracy in clinical settings, particularly in vulnerable populations such as adolescents, where traditional models may be limited due to the non-linear nature of suicide risk factors. 53
Among the selected variables, mental pain and previous suicide attempts were strongly correlated with the level of risk, reflecting their relevance in identifying high-risk adolescents. This finding is consistent with previous research highlighting the role of mental pain as a central marker in the conceptualisation of suicidal risk in at-risk adolescents. 54 The inclusion of this variable in a simplified model provides a more nuanced perspective tailored to the emotional realities faced by young people in vulnerable environments.
There have been numerous recent attempts to approximate adolescent suicide risk through machine learning, either by drawing on data associated with prior suicidal ideation and behaviour 53 or through indirect information associated with risk factors. 21 A prominent feature of these investigations has been the large number of variables used. While these approaches have shown promising results in terms of predictive accuracy, their application in clinical or psychosocial settings poses significant challenges. The time and effort required for participants to complete such a large number of items can be problematic, especially for younger adolescents, whose capacity for concentration and attention tends to be more limited.
In this regard, one of the main contributions of this study has been the development of a robust system capable of predicting the potential risk of suicidal behaviour using a reduced set of characteristics. The results in Table 5 show that, after assessing the relevance and redundancy of the features, a subset of only 8 features not only maintained but in some cases improved predictive performance, achieving an average macro F1-score of 85%. This makes implementation in real-world environments more feasible, reducing the burden on both participants and the practitioners administering these tools.
Furthermore, this simplification responds to the practical and ethical needs of tailoring assessments to the specific context and characteristics of adolescents, as noted in recent research.21,55
Thus, the results indicate that it is possible to significantly reduce the number of features used in the model without compromising its performance. Furthermore, when only a single feature was used, the SCS-R questionnaire proved to have the highest predictive value among the variables analysed, which reinforces its importance as a central tool in suicide risk assessment. Importantly, although the predictive capacity of the SCS-R has been previously validated in adult populations, 56 its use in adolescents has been less explored. This study provides additional evidence for the efficacy of this instrument in juvenile populations, thus broadening its applicability and utility in this age group. In a context where adolescent-specific tools are limited, these findings reinforce the value of the SCS-R as a key instrument for assessing suicide risk in vulnerable adolescents.
In summary, this study demonstrates that machine learning approaches can outperform knowledge-based models in predicting adolescent suicide risk, especially when the number of features employed is optimised. The combination of clinical relevance, redundancy reduction and robust performance reinforces the usefulness of inferred models in real-world contexts. Furthermore, the use of simplified and specific questionnaires, such as the SCS-R, provides a practical and effective tool to identify potential risk for suicidal behaviour, promoting more efficient and accessible assessments in clinical and educational settings. These findings open the door to future work to extend the generalisability of the models and their application in different populations and contexts.
Conclusion
This work started with a self-designed questionnaire addressed to adolescents and aimed at suicide risk detection. The responses of 165 adolescents were considered. The questionnaire itself was assessed quantitatively, in an attempt to detect redundant questions and to gauge the relevance of each feature. Two approaches were applied to the questionnaire responses: a knowledge-based model and a machine learning-based inferred approach. Different conclusions can be drawn from different models, but the inferred approaches, based on machine learning, improved prediction substantially.
A key conclusion of this work is that the ‘SCS-R’ survey, which measures the suicidal belief system, can be a strong indicator of suicide risk. Whilst other parameters seem to have little effect on the outcome, this survey can be used by itself to detect the risk, reaching a macro-average F1-score of 76%. A single-feature model of this kind is much more interpretable than the models containing up to 17 features, though at the cost of reduced performance.
Even when more than one questionnaire is used, the number of features can be reduced to just 8 with no negative impact on the model, and in some cases with improved performance. Thus, by performing feature ablation based on cross-entropy information gain, we have been able to identify the most important features for suicide risk prediction. This minimizes the need for redundant questions and potentially leads to shortened questionnaires that can be more easily distributed.
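As an illustration of how an entropy-based relevance score can rank features, the sketch below computes the standard Shannon information gain IG(Y; X) = H(Y) - H(Y | X) for discretised features. This is an assumption about the general form of the criterion; the exact formulation used in this study is not restated here, and the feature names and values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """H(Y) - H(Y | X) for one discrete feature X and class labels Y."""
    groups = defaultdict(list)
    for x, y in zip(feature_values, labels):
        groups[x].append(y)
    total = len(labels)
    # Conditional entropy: entropy within each feature value, weighted by its share.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: risk labels and two discretised features.
risk = [0, 0, 1, 1, 2, 2]
features = {
    "feature_a": ["low", "low", "mid", "mid", "high", "high"],  # informative
    "feature_b": ["x", "y", "x", "y", "x", "y"],                # uninformative
}
ranking = sorted(features, key=lambda name: information_gain(features[name], risk), reverse=True)
print(ranking)
```

Features at the bottom of such a ranking are candidates for removal; whether dropping them is harmless would still need to be checked by retraining the classifier on the reduced set.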
Regarding the limitations of this investigation, the relatively small size of the dataset, although common in these kinds of studies, may have contributed to overfitting and limited the model’s generalizability. We used Leave-One-Out Cross-Validation and simple Machine Learning methods to mitigate this risk. However, a larger and more diverse dataset would likely improve the robustness of our conclusions. The use of a single dataset for both training and testing may also limit external validity.
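For readers unfamiliar with the procedure, Leave-One-Out Cross-Validation can be sketched as follows: each participant is held out once, the model is fitted on the remaining participants, and the held-out case is predicted. The sketch uses only the standard library and a 1-nearest-neighbour rule as a stand-in classifier; it is not the Random Forest pipeline used in this study, and all names and data are hypothetical.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest training point (squared Euclidean distance)."""
    closest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[closest]


def leave_one_out_predictions(X, y, predict):
    """Predict every sample with a model that never saw that sample."""
    predictions = []
    for i in range(len(X)):
        # Training set: all samples except the i-th.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


# Hypothetical one-dimensional scores with three risk levels.
X = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
y = [0, 0, 1, 1, 2, 2]
loo_predictions = leave_one_out_predictions(X, y, nearest_neighbour_predict)
accuracy = sum(p == t for p, t in zip(loo_predictions, y)) / len(y)
print(loo_predictions, accuracy)
```

With N samples this requires N model fits, which is affordable for small datasets and uses nearly all the data for training in every fold.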
Therefore, we acknowledge that future research should incorporate larger and more diverse datasets to explore the potential benefits of more complex models, including Deep Learning techniques, to determine whether they yield meaningful improvements in predictive accuracy. Future studies will also aim to include secondary or external datasets to support independent validation, improve reproducibility, and generalize the findings beyond this preliminary investigation.
Other limitations of our work include the lack of temporal (longitudinal) assessment and of clinical assessment tools, both of which have been found to benefit model performance. 12 Following up or monitoring the patients over a period of time would be interesting to pursue. Moreover, this study was conducted on a specific group of high-risk adolescents in residential care, which may limit the generalisability of the findings to adolescents outside this particular context. The tools and models developed in this study were designed specifically for this high-risk population, and their applicability to other contexts or populations should be interpreted with caution. However, the methodological approach of using entropy to assess information gain and verifying the result with machine learning models is not tied to this population and can be applied elsewhere. The age span of the participants (12-18) is also a limitation, as the risk factors may vary with age. However, with the inferred models, we found that the feature ‘Age’ did not have a significant impact on the prediction of suicide risk. Finally, this tool has not yet been validated on another dataset.
Despite these limitations, our findings provide valuable insights into suicide risk assessment in vulnerable adolescents. Importantly, through our feature ablation analysis, we identified a reduced set of eight features, and one particularly informative questionnaire (the SCS-R), that effectively capture critical risk factors. This reduction in assessment complexity represents a meaningful step toward more practical and accessible suicide risk screening tools. These findings lay important groundwork for adapting and validating these streamlined tools across broader populations and diverse clinical settings in future research.
Footnotes
Acknowledgements
The authors would like to express their gratitude to the research team members and collaborators who contributed their time and expertise to this study. Special thanks are extended to Osakidetza, whose guidance and support were instrumental in shaping this work. The computational resources provided by HiTZ are gratefully acknowledged.
Ethical considerations
The study was approved by the Ethics Committee for Research on Human Beings of the University of the Basque Country (Ref.97/18). All the participants, and, if applicable, their legal representatives, gave written informed consent.
Author Contributions
The author contributions are highlighted using the CRediT taxonomy.
Conceptualization:
Data Curation:
Formal Analysis:
Software:
Supervision: Technical oversight was provided by
Writing – Original Draft Preparation:
Writing – Review & Editing:
All authors have read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially funded by LOTU with code TED2021-130398B-C22, funded by the MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. In addition, this work was partially funded by the Spanish Ministry of Science, Innovation and Universities (EDHIA PID2022-136522OB-C22) and by the Basque Government (IXA IT-1570-22).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The dataset used in this study contains sensitive and confidential information from a vulnerable adolescent population in residential care and cannot be publicly shared due to ethical and privacy considerations. Access to the raw data is restricted to protect participant confidentiality in compliance with ethical guidelines. The used questionnaires are described and referenced in the article. For reproducibility purposes, the source code used for data analysis, feature selection, and machine learning modelling has been made publicly available in
.
