Predictive Machine Learning Models for Assessing Lebanese University Students’ Depression, Anxiety, and Stress During COVID-19

Abstract

University students are experiencing a mental health crisis, which COVID-19 has exacerbated. We surveyed students in 2 universities in Lebanon to gauge their mental health challenges. We constructed a machine learning (ML) approach to predict symptoms of depression, anxiety, and stress based on demographics and self-rated health measures. Our approach involved developing 8 ML predictive models: logistic regression (LR), multi-layer perceptron (MLP) neural network, support vector machine (SVM), random forest (RF), XGBoost, AdaBoost, Naïve Bayes (NB), and K-nearest neighbors (KNN). Following their construction, we compared their respective performances. Our evaluation shows that RF (AUC = 78.27%), NB (AUC = 76.37%), and AdaBoost (AUC = 72.96%) provided the highest AUC scores for depression, anxiety, and stress, respectively. Age was the top feature in predicting depression and stress, followed by self-rated health, while self-rated health was the top feature in predicting anxiety, followed by age. Future work will focus on using data augmentation approaches and extending to multi-class anxiety predictions.

Keywords

machine learning; mental health; depression; anxiety; stress; university students

Introduction

The COVID-19 pandemic¹ had a drastic effect on people’s mental health across the globe,² mainly due to the impact of quarantine procedures (eg, suspension of many activities and social isolation). The long duration of associated measures had a negative effect on mental health conditions, including depression, anxiety, and stress.^3,4 The impact was especially recognized among the young population, including students, and is a cause of concern.⁵ Students’ distress impacts their lives and academic achievement, which is associated with societal impact in terms of loss in productivity and economic losses.⁶ Several studies uncovered increased levels of anxiety, depression, and stress around the globe.^7-13

A multicountry cross-sectional survey conducted in Pakistan, China, India, Indonesia, Saudi Arabia, Malaysia, and Bangladesh showed 35.6% mild to severe anxiety.¹⁴ Another study in the United States reported that 48.14% of university students had moderate-to-severe depression, 38.48% had moderate-to-severe anxiety, and 71.26% reported that their stress levels had increased during the pandemic.¹⁵ Research on university students from Bangladesh, Egypt, Ethiopia, Lebanon, Turkey, and Brazil reported similar findings related to depression, anxiety, and stress; symptoms of depression varied between 21.2% in Ethiopia and 82.4% in Bangladesh, symptoms of anxiety varied between 27.7% in Ethiopia and 87.7% in Bangladesh, and symptoms of stress varied between 12.7% in Lebanon and 57.5% in Brazil.^16-20 A systematic review and meta-analysis found that the prevalence of anxiety, depression, and stress among college students during the COVID-19 pandemic was 29%, 37%, and 23%, respectively.⁷

The negative impact of the COVID-19 pandemic on mental health in Lebanon was demonstrated in the tertiary referral hospital population,²¹ healthcare workers,^22,23 refugees,²⁴ the general population,^25,26 and the young population (18-35 years).²⁷ A study conducted at the onset of the pandemic in April 2020 reported that 17.9%, 13.8%, and 1.7% of students exhibited mild, moderate, and severe depressive symptoms, respectively; also, mild, moderate, severe, and extremely severe anxiety symptoms were found in 3.3%, 21.9%, 6.3%, and 2.3%, respectively; and 11% of students reported mild stress, while 1.7% reported moderate stress.

To document, understand, and plan for appropriate mental health programming for students after 2 years of the pandemic, we conducted a cross-sectional online survey among university students in Lebanon between November 2021 and February 2022. We then developed multiple machine learning models to predict the level of depression, anxiety, and stress among university students; those models would be used by students’ counseling services to plan appropriate interventions. While machine learning models have been used to predict the effectiveness of an intervention on depression^28,29 or change of anxiety levels³⁰ or stress^31,32 in different populations, these studies focused on prevention or treatment rather than on predicting existing symptom levels. To our knowledge, this is the first time a machine learning model is used to predict the existing levels of depression, anxiety, and stress among university students based on standard socioeconomic status (SES), lifestyle, and education-related data, without access to health-related data (eg, blood pressure and heart rate). Developing predictive models will enable early detection of symptoms and, hence, early intervention, recognized as essential for mental health and symptom management.^33,34 In addition, machine learning models can identify the most important factors influencing prediction; such identification will allow universities to tailor their engagement with students on mental health to these factors through their programming and counseling services. Our guiding research questions are (1) what is the state of depression, anxiety, and stress in the Lebanese university student population, and (2) could we build machine learning predictive models for depression, anxiety, and stress based on sociodemographic and lifestyle data?

The remainder of this paper is organized as follows. Section II presents the methodology, from data collection to ML models. Section III presents the results of the ML models and their optimization. Section IV discusses the models’ results and the approach’s limitations, while Section V concludes the paper.

Methods

Data Collection

A cross-sectional survey was conducted in Lebanon using an online survey distributed to undergraduate and graduate students. Our study targeted students from 2 different universities in Lebanon to ensure a comprehensive representation of the student body. The American University of Beirut (AUB), a prestigious private university, and the Lebanese University (LU), Lebanon’s only public university, were chosen. This careful selection was made to cover a wide socioeconomic range. The Lebanese University serves a diverse student body, including students from low-income families and rural areas. The American University of Beirut, on the other hand, primarily serves a more affluent student demographic. The intentional choice of these universities significantly amplifies the study’s capability to capture a wide array of perspectives and socioeconomic factors that could influence mental health levels among university students in Lebanon.

An online survey was disseminated to undergraduate and graduate university students in Lebanon in both Arabic and English languages. The students were provided a detailed study description and a link to the survey through electronic platforms like WhatsApp and email. Two reminders were sent to the participants within a 2-week interval to maximize participation and response rates. The survey started with a consent form that provided the students with relevant information about their rights and responsibilities and guaranteed the confidentiality of their information. On average, the survey took students approximately 15 to 20 min to complete.

The data was collected between November 2021 and February 2022, when the Omicron variant was first identified and started spreading. The study participants were 329 students who were 18 years old or above, enrolled in either Spring 2020 to 2021 or Fall 2021 to 2022 at either the American University of Beirut (a private institution) or the Lebanese University (a public institution).

The participants provided their written informed consent online before completing the survey. Considering the evolving nature of the pandemic and to minimize the spread of the virus, an online convenience sampling strategy was adopted. This sampling approach has been commonly employed in numerous COVID-19-related studies.^35-37

The individuals involved in the study did not receive any financial rewards, and their identities were kept confidential to ensure the reliability and privacy of the collected data. This study was carried out per the Declaration of Helsinki’s guidelines for human subjects research. Ethics approval for the study was obtained from the Institutional Review Board at the American University of Beirut (SBS-2021-0256) and the Research Ethics Board at York University in Canada (Certificate # e2021-327).

Features: Sociodemographic and Lifestyle Practices

The measured sociodemographic factors are age, gender, income, current program, nationality, relationship status, and number of people living in the household. Lifestyle practices include cigarette and shisha smoking, alcohol intake, physical activity, sleeping patterns, internet usage, and overall health. Participants were also asked if they had sought private counseling or therapy from a clinical mental health professional, tried mindfulness meditation, followed COVID-19 preventive measures (wearing masks, handwashing, quarantining, etc.), received the COVID-19 vaccine, and if they had kept up with COVID-19 updates. Finally, participants were asked if they had COVID-19 infection, believed that Coronavirus and vaccination were the subjects of a conspiracy, and if religion is important in their daily lives. The feature list can be found in Table 1.

Table 1.

Machine Learning Model’s Features.

Feature description	Type
Age	Numeric
Gender	Categorical
In a relationship	Binary (yes/no)
Level of studies	Categorical (eg, undergraduate or graduate)
GPA after covid	Categorical (decreased, unchanged, or increased)
Income level above the minimum wage	Binary (yes/no)
Overall self-rated Health	Categorical (poor, fair, good, very good, or excellent)
Religion important in daily life	Binary (yes/no)
There is a conspiracy behind COVID-19	Categorical (disapprove, approve, or neither)
Adhere to COVID-19 measures	Binary (yes/no)
Infected by COVID-19 virus	Binary (yes/no)
Have access to private counseling	Binary (yes/no)
Level of adherence to a healthy diet	Binary (low/high)
Change in smoking cigarettes	Categorical (no change, reduced, increased)
Change in smoking Shisha (ie, hubbly bubbly)	Categorical (no change, reduced, increased)
Change in Alcohol consumption	Categorical (no change, reduced, increased)
Change in physical activity	Categorical (no change, reduced, increased)
Sleep hours in a day	Categorical (<7, 7-9, or >9)

Target Outcome Variables

Depression (PHQ-9)

The depression data were collected using the Patient Health Questionnaire (PHQ-9).³⁸ The PHQ-9 is a brief 9-item questionnaire. Each item is assessed for the prior 2 weeks: 0 = “not at all,” 1 = “several days,” 2 = “more than half the days,” and 3 = “nearly every day,” with a total score ranging from 0 to 27. A score of 0 to 4 indicates minimal depression; 5 to 9 mild depression; 10 to 14 moderate depression; 15 to 19 moderately severe depression; 20 to 27 severe depression.³⁸ Participants with a score of 10 or above were assigned to the Possible Major Depressive Disorder (MDD) group, while those with a score of 9 or less were assigned to the Non-MDD group.³⁸ With a sensitivity of 80% and specificity of 92%, a total score of 10 or above indicated the possibility of serious depression.^39,40 Additionally, PHQ-9 is a self-rating scale with strong reliability and validity for students.^41,42 The Cronbach’s alpha coefficient of the PHQ-9 was .901 in our study.
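The scoring rule above can be illustrated with a short sketch. This is not the authors' code; the function and constant names (`phq9_total`, `phq9_severity`, `possible_mdd`, `PHQ9_BANDS`) are hypothetical, and only the cut-offs come from the text.

```python
from typing import List

# Severity bands as stated in the text: (low, high, label), inclusive.
PHQ9_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_total(items: List[int]) -> int:
    """Sum the 9 item scores, each rated 0..3."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-9 expects 9 item scores, each in 0..3")
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a total score (0..27) to its severity band."""
    for low, high, label in PHQ9_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("PHQ-9 total must be in 0..27")


def possible_mdd(total: int) -> bool:
    """Binary target used for modeling: a score of 10 or above."""
    return total >= 10
```

For example, a respondent answering "several days" to every item would score 9, fall in the mild band, and be assigned to the Non-MDD group.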

Anxiety (Beck Anxiety Inventory (BAI))

Anxiety data was collected using the Beck Anxiety Inventory (BAI) questionnaire.^43,44 BAI is a 21-item questionnaire that measures anxiety symptoms. Participants must rate themselves on a 0 to 3 scale, with zero indicating “Not at all” and 3 indicating “Severely-It bothered me a lot,” with a maximum score of 63 and a minimum score of zero. Minimal anxiety is a score of 0 to 7, mild anxiety 8 to 15, moderate anxiety 16 to 25, and severe anxiety 26 to 63.⁴⁵ A score of 16 is considered the clinical cut-off for anxiety.⁴⁶ BAI questionnaire demonstrated high internal consistency and acceptable reliability.⁴⁷ In our study, the Cronbach’s alpha coefficient of the BAI scale was .944.
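The internal-consistency figures reported for each scale are Cronbach's alpha coefficients. A minimal sketch of the standard formula follows, assuming a complete response matrix with one row per respondent; the function name `cronbach_alpha` is illustrative, and the study's own computation may have been done with different software.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for responses[r][i] = respondent r's score on item i.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of total scores)
    """
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

When all items move together perfectly, the coefficient reaches 1; values such as .944 indicate that the items vary largely in step with one another.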

Stress (Perceived Stress Scale (PSS))

Stress data was collected using the 10-item Perceived Stress Scale (PSS) questionnaire.⁴⁸ Participants must rate themselves on a 5-point Likert scale from 0 = never to 4 = very often. PSS-10 scores were obtained by reversing the scores on the 4 positive items; the items were 4, 5, 7, and 8. Total scores vary from 0 to 40, with 0 to 13 indicating mild stress, 14 to 26 indicating moderate stress, and 27 to 40 indicating high stress.

High perceived stress was defined as a score of 27 or above. This cut-off point has been used in a previous study.⁴⁹ PSS has been proven reliable and valid in various settings and languages.^50-53 The Cronbach’s alpha coefficient of the PSS-10 scale was .846 in this study.
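The reverse scoring of the positively worded items is easy to get wrong, so a brief sketch may help. It assumes items are supplied in questionnaire order (item 1 first); the names `pss10_total`, `pss10_level`, and `REVERSED_ITEMS` are hypothetical.

```python
from typing import List

# 1-based positions of the positively worded items that are reverse-scored.
REVERSED_ITEMS = {4, 5, 7, 8}


def pss10_total(items: List[int]) -> int:
    """Total PSS-10 score: reversed items contribute (4 - score)."""
    if len(items) != 10 or any(s not in (0, 1, 2, 3, 4) for s in items):
        raise ValueError("PSS-10 expects 10 item scores, each in 0..4")
    return sum(
        4 - score if position in REVERSED_ITEMS else score
        for position, score in enumerate(items, start=1)
    )


def pss10_level(total: int) -> str:
    """Map a total score (0..40) to the bands described in the text."""
    if 0 <= total <= 13:
        return "mild"
    if total <= 26:
        return "moderate"
    if total <= 40:
        return "high"
    raise ValueError("PSS-10 total must be in 0..40")
```

A respondent who answers "very often" to every item scores 24 rather than 40, because the four reversed items each contribute 0.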

Machine Learning Algorithms

In this study, 8 ML predictive models including Multi-Layer Perceptron (MLP), Logistic Regression (LR), K Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), Ada Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), and Naïve Bayes (NB) were built, and their performance was compared. The performance measurement of choice was the area under the curve (AUC) of the receiver operating characteristics (ROC). Each model has predicted depression, anxiety, and stress levels based on the list of above features. The dataset was divided into a 70:30 ratio for training and testing; a cross-validation approach with grid search was used for parameters’ optimization. The optimized models were used to predict depression, anxiety, and stress levels. The ML algorithms are described below.
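Because AUC is the primary performance measure, it may help to see what it computes. The sketch below uses the rank interpretation of the ROC AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is a plain-Python illustration, not the library routine used in the study, and the function name `roc_auc` is an assumption.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the measure does not depend on a particular classification threshold.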

Logistic Regression (LR)

Logistic regression is a linear method that models the probability of an outcome taking place by calculating the log odds of the event given a combination of independent features. It is used in situations where the outcome is binary.⁵² Its linearity makes it easier to implement, interpret, and explain than more complex models. With coefficients estimated by Maximum Likelihood Estimation (MLE), LR predicts the probability of an observation belonging to a class; in our case, a separate binary outcome is predicted for each of the 3 conditions (depression, anxiety, and stress).
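In symbols, the log-odds relationship can be written as follows. The notation is an assumption introduced here for illustration: $p$ is the probability of the positive class, $x_1, \dots, x_m$ are the features, and $\beta_0, \dots, \beta_m$ are the coefficients estimated by MLE.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{m} \beta_j x_j
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{m} \beta_j x_j\right)\right)}
```

Each coefficient $\beta_j$ is therefore the change in the log odds of the outcome per unit change in $x_j$, holding the other features fixed.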

K Nearest Neighbors (KNN)

KNN is an algorithm that finds the K instances of the training dataset closest to a given data instance and assigns the most frequent label among them to that instance.⁵⁴ Because it relies on distance computations (eg, Euclidean distance) over the training data rather than on a parametric model, KNN can adapt to various data shapes and complexities.
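A minimal sketch of the majority-vote rule with Euclidean distance is given below. It is a toy illustration in plain Python, not the implementation used in the study; `knn_predict` and its parameters are hypothetical names.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Return the most frequent label among the k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")

    def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Since the prediction depends directly on distances, features on very different scales would usually need to be standardized or encoded consistently before applying this rule.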

Support Vector Machine (SVM)

It is an algorithm that creates the optimal hyperplane that divides a dataset into 2 or more classes. The optimal hyperplane is at the maximum distance from the classes’ nearest data points.⁵² SVM is deemed fit for prediction given its robustness and use of kernel trick, allowing it to handle non-linear decision boundaries.

Random Forest (RF)

Random Forest is an ensemble technique that builds multiple decision trees and merges them to obtain a more accurate and stable prediction. It selects a random sample with replacement from the dataset and creates a corresponding model. At each split, it selects a random subset of features to do the splitting, making it less likely to overfit.⁵⁵ Its ability to model non-linear interactions between the features and the target variable makes it a good fit for this case.

AdaBoost

It is an algorithm that belongs to ensemble learning that builds a strong learner out of a combination of weak learners, such as a decision stump (ie, a decision tree with 1 level). It focuses on the training instances that the predecessor algorithm misclassified.⁵⁴ Given our small dataset, AdaBoost tends to resist overfitting in such cases while providing insights into feature importance.
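The idea of focusing on previously misclassified instances can be made concrete with one round of the classic discrete AdaBoost weight update. This sketch assumes labels coded as -1/+1 and is not necessarily the variant the library used in the study applies internally; `adaboost_round` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(
    weights: Sequence[float],
    y_true: Sequence[int],
    y_pred: Sequence[int],
) -> Tuple[float, List[float]]:
    """One discrete AdaBoost round for labels in {-1, +1}.

    Returns the weak learner's vote weight (alpha) and the renormalized
    sample weights to be used when fitting the next weak learner.
    """
    total = sum(weights)
    # Weighted error rate of the current weak learner.
    err = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (y * h == -1) are up-weighted, correct ones down-weighted.
    updated = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]
```

After the update, the misclassified samples together carry half of the total weight, so the next weak learner cannot do well by repeating the same mistakes.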

Extreme Gradient Boosting (XGBoost)

XGBoost uses decision trees as base learners and gradient boosting as a combination method (ie, Newton boosting). XGBoost is more efficient than decision trees and usually provides better prediction accuracy.⁵⁶ XGBoost has built-in L1 (Lasso) and L2 (Ridge) regularization, which prevents overfitting, especially when the dataset is small. XGBoost uses the gradient boosting algorithm to iteratively add weak learners, typically trees, to the model, where each tree corrects the errors of its predecessor.

Naïve Bayes

It is an algorithm that computes the conditional probability that a data instance belongs to a class, knowing the class characteristics. The instance would be assigned to a class with the highest conditional probability.^54,57 Although it has a high bias, it is advantageous when the dataset is small. It might not capture all the intricacies of the data, but it makes it less prone to overfitting than complex models and can easily handle scenarios with more than 2 classes.

Multi-layer perceptron (MLP)

A multilayer perceptron is a deep learning approach that learns dependencies between the input layer (the features or variables) and the output layer (the classification decision). Between the input and the output layers, there can be 1 or more hidden layers, each with as many neurons as needed. The neurons are connected by weighted links and apply nonlinear activation functions. The MLP uses a backpropagation algorithm to update the weights within the hidden layers to minimize the output layer’s error rate.^58,59

Table 2 compares the advantages and disadvantages of the above predictive machine learning models.

Table 2.

The Advantages and Disadvantages of the Above Predictive Machine Learning Models.

Model	Advantages	Disadvantages
Multi-layer perceptron (MLP)	• Model complex, non-linear relationships • Good for large datasets • Flexible in architecture design.	• Requires tuning of parameters (like the number of layers and neurons) • Computationally intensive • Can overfit on small datasets.
Logistic regression (LR)	• Simple to implement and interpret • Good for binary classification problems.	• Assumes linear decision boundaries • Not fit for complex relationships • Can struggle with high-dimensional data.
K nearest neighbors (KNN)	• Simple and effective • No assumption about data distribution • Good for multi-class problems.	• Computationally intensive with large datasets • Sensitive to irrelevant features.
Support vector machine (SVM)	• Effective in high-dimensional spaces • Works well with a clear margin of separation • Effective when the number of dimensions is greater than the samples.	• Not suitable for large datasets • Less effective with noisy data • Requires careful choice of kernel and parameters.
Random forest (RF)	• Handles overfitting better than decision trees • Good performance on many problems • Can handle large datasets and high dimensional data.	• Less interpretable • Can be computationally intensive • Longer training time.
Ada boosting (AdaBoost)	• Improves classification accuracy • Combines multiple weak learners to create a strong learner • Less prone to overfitting.	• Sensitive to noisy data and outliers • Can be slower to train due to the sequential learning process.
eXtreme gradient boosting (XGBoost)	• High performance and fast execution speed • Handles missing data • Effective with a variety of data types.	• Can overfit on small datasets • Requires careful tuning of parameters • More complex to understand and implement.
Naïve Bayes (NB)	• Simple and easy to implement • Good for large datasets • Fast training and prediction times.	• Assumes independence between predictors • May not work well with numerical features • Can be outperformed by more complex models.

Hyperparameter Tuning

Hyperparameter tuning is the process of optimizing the hyperparameters of a machine learning algorithm to improve its performance. Hyperparameters are set before the training process begins and determine different aspects of it. A grid search was performed on all the models to find the best hyperparameters, using the roc_auc value as the scoring metric. Overfitting was tackled using approaches appropriate for each algorithm (eg, regularization and tree depth reduction).

Data Analysis

The dataset, which consisted of 329 records, was cleaned; missing values were replaced with the mode of the feature. Each ML algorithm was used to predict the students’ depression, anxiety, and stress symptoms. Google Colaboratory was used for training, optimizing, and testing the ML models.
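Mode imputation, as described above, can be sketched for a single feature column as follows. The example assumes missing values are represented as `None`; the function name `impute_mode` is hypothetical and the study's own cleaning code is not shown in the paper.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def impute_mode(column: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if value is None else value for value in column]
```

With roughly 12% to 14% of responses missing for several features in this sample, this choice shifts the imputed features toward their majority category, which is worth keeping in mind when reading the feature importance results.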

While AUC was the main performance measurement, other measures were computed, including sensitivity, specificity, precision, F-measure, and accuracy.⁵⁴ The equations to compute the performance measure are as follows:

\begin{array}{l}
\text{Recall (Sensitivity)} = \dfrac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \\[2ex]
\text{Precision} = \dfrac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \\[2ex]
\text{F-measure} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\[2ex]
\text{Accuracy} = \dfrac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative}}
\end{array}
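These measures follow directly from the four confusion-matrix counts. The sketch below computes them in plain Python; the specificity formula, TN / (TN + FP), is the standard definition and is included because the text lists specificity among the computed measures. The function name is illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute recall, specificity, precision, F-measure, and accuracy."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

A model that rarely predicts the positive class can reach very high precision with very low recall, which is why the F-measure is reported alongside both.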

Results

Implementation Procedure

Implementation is done using Python (3.7.13) and the Scikit-learn library (1.0.2). The CSV file is read into a data frame. For categorical values, imputation is performed by replacing the null values with the most frequent values. Each target variable has a corresponding data frame with 18 predictive features.

The obtained dataset has been split into training and test data using Stratified Shuffle Split, 70% for training and 30% for testing.
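The split and tuning procedure can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' notebook: the data are synthetic stand-ins with the same shape as the study dataset (329 rows, 18 features), and the parameter grid, random seeds, and 5-fold setting are hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# Synthetic stand-in data (NOT the survey data): 329 rows, 18 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(329, 18))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=329) > 0).astype(int)

# Stratified 70/30 split that preserves the class balance in both parts.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.30, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

# Grid search with cross-validation on the training part, scored by ROC AUC.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}  # hypothetical grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X[train_idx], y[train_idx])

# Evaluate the refitted best model once on the held-out 30%.
test_scores = search.predict_proba(X[test_idx])[:, 1]
test_auc = roc_auc_score(y[test_idx], test_scores)
print(search.best_params_, round(test_auc, 4))
```

Keeping the test indices out of the grid search means the reported AUC reflects data the tuned model has not seen, which matters with a sample of this size.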

Sociodemographic Characteristics

Table 3 summarizes the descriptive statistics for the study participants’ characteristics. The participants’ mean (SD) age was 24.99 (7.39) years. The majority of participants were females (63.8%). Students were enrolled in various university programs, with undergraduate students accounting for 43.5% of the sample. More than three-quarters (77.5%) of participants had a monthly household income of 450 USD or less. Approximately 60% of students considered their overall health good, very good, or excellent. About 65% of the respondents stated that religion is important in their daily lives. About 15% of participants believed that the coronavirus and vaccination were the subjects of a conspiracy. Furthermore, the majority of students (73.6%) followed COVID-19 prevention guidelines, and about a quarter of them (26.1%) had been infected with COVID-19. Private counseling was received by more than half of the students (57.4%).

Table 3.

Sociodemographic and Other Characteristics of University Students (N = 329).

	n (%)
Age, mean (SD)	24.99 (7.39)
Gender
Men	77 (23.4)
Women	210 (63.8)
Missing	42 (12.8)
Relationship status
Not in a relationship	172 (52.3)
In a relationship	118 (35.9)
Missing	39 (11.9)
University program
Undergraduate degree	143 (43.5)
Certificate program	14 (4.3)
Graduate program (MA or MSc)	141 (42.9)
PhD program	12 (3.6)
MD program	19 (5.8)
Missing
GPA status during the pandemic
No change	103 (31.3)
Decreased	103 (31.3)
Increased	123 (37.4)
Household income (in USD)
≤450	255 (77.5)
>450	74 (22.5)
Overall rated health
Poor	29 (8.8)
Fair	100 (30.4)
Good	121 (36.8)
Very good	66 (20.1)
Excellent	13 (4.0)
Importance of religion in daily decisions
Not important	70 (21.3)
Important	213 (64.7)
Missing	46 (14.0)
Conspiracy behind COVID virus/vaccine
Disapprove	117 (35.6)
Neither approve nor disapprove	117 (35.6)
Approve	49 (14.9)
Missing	46 (14.0)
Adherence to COVID-19 preventive measures
No	41 (12.5)
Yes	242 (73.6)
Missing	46 (14.0)
Infected with COVID-19
No	197 (59.9)
Yes	86 (26.1)
Missing	46 (14.0)
Private counseling
No	140 (42.6)
Yes	189 (57.4)
Depression (N; mean (SD))	329; 10.18 (6.83)
Anxiety (N; mean (SD))	324; 18.81 (14.42)
Stress (N; mean (SD))	326; 21.97 (7.30)

Mental Health Outcomes

The mental health outcome analysis showed that the mean (SD) score for depression was 10.18 (6.83), anxiety was 18.81 (14.42), and stress was 21.97 (7.30). Figure 1 depicts the study participants’ levels of depression, anxiety, and stress. Mild to moderate depression, anxiety, and stress were reported by 52.3%, 42.9%, and 61.7% of participants, respectively, while severe depression, severe anxiety, and high stress were reported by 24.6%, 29.3%, and 27.6%, respectively. In total, students reported moderate to severe levels of depression, anxiety, and stress at a rate of 75.9%, 72.2%, and 89.3%, respectively.

Figure 1.

Severity of mental health outcome.

Performance of the ML Models

The comparison between models’ performance rates used in predicting students’ depression, anxiety, and stress is shown in Table 4.

Table 4.

Models’ Performance Measurements.

Model	AUC (%)	Sensitivity (%)	Precision (%)	F1-score (%)
MLP
Depression	73.90	74.28	63.41	68.42
Anxiety	72.60	73.68	63.63	68.29
Stress	70.30	66.66	47.05	55.17
Logistic regression
Depression	74.12	55.00	78.00	64.00
Anxiety	74.89	74.00	75.00	74.00
Stress	66.51	7.00	100.00	13.00
AdaBoost
Depression	76.25	65.00	72.00	68.00
Anxiety	74.89	100.00	53.00	69.00
Stress	72.96	33.00	69.00	45.00
Random forest
Depression	78.27	60.00	75.00	67.00
Anxiety	69.93	62.00	70.00	66.00
Stress	72.42	12.00	75.00	19.00
XGBoost
Depression	75.55	51.00	79.00	68.00
Anxiety	67.67	71.00	65.00	68.00
Stress	66.87	30.00	72.00	42.00
SVM
Depression	74.36	67.00	65.00	66.00
Anxiety	74.94	72.00	72.00	72.00
Stress	72.37	3.00	100.00	7.00
Naïve Bayes
Depression	74.12	62.00	71.00	65.00
Anxiety	76.37	77.00	75.00	76.00
Stress	63.36	37.00	47.00	41.00
KNN
Depression	66.63	35.00	66.00	46.00
Anxiety	61.05	53.00	57.00	62.00
Stress	63.84	12.00	60.00	18.00

Abbreviations: AUC, area under curve; F1-score, harmonic mean between precision and recall.

Depression

For depression, Random Forest had the highest AUC (78.27%), followed by AdaBoost (76.25%), XGBoost (75.55%), Support Vector Machine (74.36%), Logistic Regression and Naïve Bayes (both 74.12%), MLP (73.90%), and KNN (66.63%).

Anxiety

For anxiety, Naïve Bayes had the highest AUC (76.37%), followed by Support Vector Machine (74.94%), AdaBoost and Logistic Regression (both 74.89%), MLP (72.60%), Random Forest (69.93%), XGBoost (67.67%), and KNN (61.05%).

Stress

For stress, AdaBoost had the highest AUC (72.96%), followed by Random Forest (72.42%), Support Vector Machine (72.37%, per Table 4), MLP (70.30%), XGBoost (66.87%), Logistic Regression (66.51%), KNN (63.84%), and Naïve Bayes (63.36%).

In addition, we have performed a feature selection using the Random Forest feature importance ranking method (Table 5). For anxiety, self-rated health was the top-ranked feature (100% importance), followed by age (64%); the remaining features were below 30%.

Table 5.

Feature Importance for Depression, Anxiety, and Stress.

Feature	Depression (%)	Anxiety (%)	Stress (%)
Age	100	64	100
Overall self-rated Health (poor, fair, good, very good, excellent)	89	100	70
Sleeping hours during the pandemic (<7, 7-9, and >9)	36	—	—
Change in physical activity (no change, reduced, and increased)	31	—	32

For depression and stress, age was the most important feature (100% for both), followed by self-rated health (89% for depression and 70% for stress). Sleeping hours during the pandemic and change in physical activity were the third and fourth most important features for depression (36% and 31%, respectively), while change in physical activity was the third most important feature for stress (32%). The remaining features were below 30%.
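The percentages in Table 5 read as importances expressed relative to the top-ranked feature, which is assigned 100%. That reading is an assumption, since the paper does not state the normalization. Under it, the rescaling can be sketched as below; the raw importance values in the usage example are invented for illustration only.

```python
from typing import Dict


def relative_importance(importances: Dict[str, float]) -> Dict[str, int]:
    """Rescale raw feature importances so the top feature equals 100."""
    top = max(importances.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: round(100 * value / top) for name, value in importances.items()}


# Hypothetical raw importances, chosen only to illustrate the rescaling.
raw = {"age": 0.200, "self_rated_health": 0.178, "sleep_hours": 0.072}
print(relative_importance(raw))
```

Relative scaling makes rankings easy to compare across the three outcomes, but it hides how much of the total importance the top feature actually holds.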

Discussion

There is currently a scarcity of studies assessing the mental health of university students in Lebanon. This study aimed at understanding university students’ mental health, specifically depression, anxiety, and stress, during Lebanon’s extended COVID-19 pandemic based on the sociodemographic factors and lifestyle practices associated with it.

An AUC value between 70% and 80% is acceptable, while an excellent test would have an AUC value between 80% and 90%.⁶⁰ In our study, we aim to have a quasi-diagnostic model; in such cases, AUC is the most suitable performance measure. Because the best-performing model differed across outcomes, no single model could be adopted as the predictor for all 3 outcomes.

Random Forest achieved the best AUC at 78.27% for depression, Naïve Bayes at 76.37% for anxiety, and AdaBoost at 72.96% for stress.

Several studies reported predicting PHQ-9 based on smartphone data,⁶¹ gait abnormality,⁶² and surveys,⁶³ with different levels of success. Compared with the sole study that considered the prediction of PHQ-9 during COVID-19 (AUC = 96%),⁶³ the AUC of the random forest model in our study is considerably lower (78.27%). The difference could be attributed to the nature of the questionnaire items used in the previous study, which included questions about financial stress, whether the participant lost someone close to them, whether they had a conflict with family and friends, whether they faced any life-threatening events, whether they had any suicidal thoughts, and whether they were physically, emotionally, or sexually abused. Such questions are more directly linked to one’s psychological condition than those in our study.

In the sole published study addressing anxiety within a cohort of 1172 university students in China, the Self-Rating Anxiety Scale³⁰ was employed for multiclassification using XGBoost. It is worth noting that this study used a distinct measurement scale and a multiclassification approach, in contrast to our binary approach. Furthermore, the study did not report the AUC in its findings.

A previous study used machine learning to predict PSS⁶⁴ among 206 students in India before COVID-19 and did not report the AUC. That study reported the highest accuracy for an SVM model (85.71%), which is higher than the AUC of our AdaBoost classifier (72.96%), although accuracy and AUC are not directly comparable. Moreover, the researchers did not report the survey questions, which makes it impossible to compare the results of our study with theirs; this is further complicated by the fact that attitudes and experiences before and after COVID-19 differ drastically.

Feature Importance Implications

In relation to anxiety, the most important factor was self-rated health (100%), followed by age (64%). Conversely, age and self-rated health were the most important predictors for depression (100% and 89%, respectively) and stress (100% and 70%, respectively). Exploring the predictive capacity of these features independently or combined with a change in physical activity and sleeping hours on the models’ performance would be interesting in the future as it could lead to a robust predictive model using very few data items. Such a model could become an important tool to enhance universities’ engagement with students on mental health and programming counselling services. Machine learning holds significant potential in addressing mental health issues on university campuses.⁶⁵

Limitations of the Study

This study has several limitations. First, given the cross-sectional nature of the study design, the results are subject to confounding biases, such as the participants’ mental health status prior to the COVID-19 pandemic and other life stressors (eg, experiences of violence). Second, there is the possibility of selection bias as participation was voluntary. Third, the study relied on a convenience sample limited to students from 2 universities. While this sampling technique does not necessarily assure that results are generalizable, it can be a valuable tool for determining the likelihood of a potential relationship between the variables.^66,67 Lastly, like any research conducted in an unstable environment with insecurity and instability and constantly changing circumstances, predicting and isolating the impact of these life factors is nearly impossible.

Although we had a relatively limited number of respondents in our study, the MLP neural network has the advantage that it can be trained effectively on small datasets and produce favorable performance.^68,69 The model we developed showed promising performance in predicting the risk of depression, anxiety, and stress among university students, which can be helpful for university counselors in planning customized, scalable interventions such as e-mental health.

Recommendations

Model’s Performance Recommendations

To improve model performance, we recommend that future studies develop an ensemble model that integrates the top-performing models in this study and collect data from a more diverse sample, possibly drawn from a broader range of universities.

Looking at the feature importance analysis and considering the significant role of self-rated health and age in predicting all 3 conditions, we recommend prioritizing these features in the training and tuning phases when developing the ensemble models.
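One simple way to integrate several models is soft voting, in which the predicted probabilities of the individual models are averaged and the result is evaluated with the AUC. The sketch below illustrates the idea under stated assumptions: the labels and per-model probabilities are hypothetical, the helper names `soft_vote` and `auc` are introduced here for illustration, and nothing in it reproduces the study's pipeline or results.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model positive-class probabilities."""
    if not prob_lists:
        raise ValueError("need at least one model")
    n_samples = len(prob_lists[0])
    if any(len(probs) != n_samples for probs in prob_lists):
        raise ValueError("all models must score the same samples")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def auc(y_true, scores):
    """AUC as the chance that a positive case outranks a negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model outputs, for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
rf_probs = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2, 0.4, 0.1]
nb_probs = [0.7, 0.4, 0.8, 0.3, 0.5, 0.6, 0.9, 0.2]
ada_probs = [0.6, 0.5, 0.7, 0.4, 0.9, 0.3, 0.5, 0.6]

ensemble = soft_vote([rf_probs, nb_probs, ada_probs])
for label, probs in [("RF", rf_probs), ("NB", nb_probs),
                     ("AdaBoost", ada_probs), ("Ensemble", ensemble)]:
    print(f"{label}: AUC = {auc(y_true, probs):.3f}")
```

Averaging does not guarantee a higher AUC than the best single model, so any ensemble should be validated on held-out data. The optional `weights` argument allows the stronger models to count more.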

Practical Applications Recommendations

Given the promising performance of machine learning models, there is an opportunity to integrate these models, especially their key predictive features, into virtual mental health care systems. This could help university counseling services identify at-risk students early and intervene sooner. Additionally, the study’s findings can inform policy-making at educational institutions by raising awareness of the significant predictors of mental health issues and by encouraging data-driven approaches to formulating more effective mental health strategies.

On a larger scale and based on our analysis of the study results, we recommend integrating the proposed predictive modeling solution with online mindfulness programs or similar scalable solutions to address the widespread mental health problem among university students.

Conclusion

We have outlined and discussed the initial stages of constructing a framework for forecasting depression, anxiety, and stress levels among university students. Among the 8 models compared, RF, NB, and AdaBoost achieved the highest AUC for depression, anxiety, and stress, respectively, with satisfactory accuracy. Machine learning models, particularly those applied in virtual care, hold great potential for enhancing mental health interventions. Our upcoming research aims to employ data augmentation techniques to improve results and broaden the scope to include multi-class predictions. Scalable solutions, such as online mindfulness,^70-72 should also be investigated as a way to alleviate the mental health crisis among university students in Lebanon.

Footnotes

Acknowledgements

We thank the Canadian Lebanese Academic Forum for facilitating the team-building effort.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Christo El Morr

Reem Hoteit

References

1. WHO. Coronavirus disease 2019 (COVID-19) Situation Report – 62. 2020. Accessed October 20, 2021. https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200322-sitrep-62-covid-19.pdf?sfvrsn=755c76cd_2

2. Brooks, Webster, Smith, et al. The psychological impact of quarantine and how to reduce it: rapid review of the evidence. Lancet. 2020;395(10227):912-920.

3. Santomauro, Mantilla Herrera, Shadid, et al. Global prevalence and burden of depressive and anxiety disorders in 204 countries and territories in 2020 due to the COVID-19 pandemic. Lancet. 2021;398(10312):1700-1712. doi:10.1016/S0140-6736(21)02143-7

4. Taylor, Landry, Paluszek, Fergus, McKay, Asmundson GJG. COVID stress syndrome: concept, structure, and correlates. Depress Anxiety. 2020;37(8):706-714. doi:10.1002/da.23071

5. Laranjeira, Dixe, Valentim, Charepe, Querido. Mental health and psychological impact during COVID-19 pandemic: an online survey of Portuguese higher education students. Int J Environ Res Public Health. 2022;19(1):337.

6. Ungar. The health care payment game is rigged. National Post. 2015. Accessed March 23, 2024. http://news.nationalpost.com/full-comment/thomas-ungar-the-health-care-payment-game-is-rigged

7. Wang, Wen, Zhang, et al. Anxiety, depression, and stress prevalence among college students during the COVID-19 pandemic: a systematic review and meta-analysis. J Am College Health. 2023;71(7):2123-2130.

8. Krishnamoorthy, Nagarajan, Saya, Menon. Prevalence of psychological morbidities among general population, healthcare workers and COVID-19 patients amidst the COVID-19 pandemic: a systematic review and meta-analysis. Psychiatry Res. 2020;293:113382.

9. Xiong, Lipsitz, Nasri, et al. Impact of COVID-19 pandemic on mental health in the general population: a systematic review. J Affect Disord. 2020;277:55-64.

10. Robinson, Sutin, Daly, Jones. A systematic review and meta-analysis of longitudinal cohort studies comparing mental health before versus during the COVID-19 pandemic in 2020. J Affect Disord. 2022;296:567-576. doi:10.1016/j.jad.2021.09.098

11. Hawes, Szenczy, Klein, Hajcak, Nelson BD. Increases in depression and anxiety symptoms in adolescents and young adults during the COVID-19 pandemic. Psychol Med. 2022;52(14):3222-3230.

12. Hou, Jiao, Luo, Song. Gender differences of depression and anxiety among social media users during the COVID-19 outbreak in China: a cross-sectional study. BMC Public Health. 2020;20(1):1-11.

13. Cao, Fang, Hou, et al. The psychological impact of the COVID-19 epidemic on college students in China. Psychiatry Res. 2020;287:112934.

14. Chinna, Sundarasen, Khoshaim, et al. Psychological impact of COVID-19 and lock down measures: an online cross-sectional multicounty study on Asian university students. PLoS One. 2021;16(8):e0253059.

15. Wang, Hegde, Son, Keller, Smith, Sasangohar. Investigating mental health of US college students during the COVID-19 pandemic: cross-sectional survey study. J Med Internet Res. 2020;22(9):e22817.

16. Aylie, Mekonen, Mekuria RM. The psychological impacts of COVID-19 pandemic among university students in Bench-Sheko Zone, South-west Ethiopia: a community-based cross-sectional study. Psychol Res Behav Manag. 2020;13:813.

17. Fawaz, Samaha. E-learning: depression, anxiety, and stress symptomatology among Lebanese university students during COVID-19 quarantine. Nurs Forum. 2021;56(1):52-57.

18. Ghazawy, Ewis, Mahfouz, et al. Psychological impacts of COVID-19 pandemic on the university students in Egypt. Health Promot Int. 2021;36(4):1116-1125.

19. Islam, Barna, Raihan, Khan MNA, Hossain MT. Depression and anxiety among university students during the COVID-19 pandemic in Bangladesh: a web-based cross-sectional survey. PLoS One. 2020;15(8):e0238162.

20. Lopes, Nihei OK. Depression, anxiety and stress symptoms in Brazilian university students during the COVID-19 pandemic: predictors and association with life satisfaction, psychological well-being and coping strategies. PLoS One. 2021;16(10):e0258493.

21. Msheik, Khoury, Talih, Khatib MFE, Abi Younes, Siddik, Siddik-Sayyid. Factors associated with mental health outcomes: results from a tertiary referral hospital in Lebanon during the COVID-19 pandemic. Libyan J Med. 2021;16(1):1901438. doi:10.1080/19932820.2021.1901438

22. Islam, Gangat, Mohanan, et al. Mental health impacts of Lebanon’s economic crisis on healthcare workers amidst COVID-19. Int J Health Plann Manage. 2022;37(2):1160-1165. doi:10.1002/hpm.3324

23. Abed, Razzak, Hashim HT. Mental health effects of COVID-19 within the socioeconomic crisis and after the Beirut blast among health care workers and medical students in Lebanon. Prim Care Companion CNS Disord. 2021;23(4):21m02977. doi:10.4088/PCC.21m02977

24. Fouad, Barkil-Oteo, Diab JL. Mental health in Lebanon’s triple-fold crisis: the case of refugees and vulnerable groups in times of COVID-19. Front Public Health. 2020;8:589264. doi:10.3389/fpubh.2020.589264

25. El Othman, Touma, El Othman, et al. COVID-19 pandemic and mental health in Lebanon: a cross-sectional study. Int J Psychiatry Clin Pract. 2021;25(2):152-163. doi:10.1080/13651501.2021.1879159

26. El Chammay, Roberts. Using COVID-19 responses to help strengthen the mental health system in Lebanon. Psychol Trauma. 2020;12(S1):S281-S283. doi:10.1037/tra0000732

27. Younes, Safwan, Rahal, Hammoudi, Akiki, Akel. Effect of COVID-19 on mental health among the young population in Lebanon. Encephale. 2022;48(4):371-382. doi:10.1016/j.encep.2021.06.007

28. Hornstein, Forman-Hoffman, Nazander, Ranta, Hilbert. Predicting therapy outcome in a digital mental health intervention for depression and anxiety: a machine learning approach. Digit Health. 2021;7:20552076211060659. doi:10.1177/20552076211060659

29. K-S, Cho S-E, Geem, Kim Y-K. Predicting future onset of depression among community dwelling adults in the Republic of Korea using a machine learning algorithm. Neurosci Lett. 2020;721:134804. doi:10.1016/j.neulet.2020.134804

30. Wang, Zhao, Zhang. Chinese college students have higher anxiety in new semester of online learning during COVID-19: a machine learning approach. Front Psychol. 2020;11:587413. doi:10.3389/fpsyg.2020.587413

31. Rois, Ray, Rahman, Roy SK. Prevalence and predicting factors of perceived stress among Bangladeshi university students using machine learning algorithms. J Health Popul Nutr. 2021;40(1):50. doi:10.1186/s41043-021-00276-5

32. Walambe, Nayak, Bhardwaj, Kotecha. Employing multimodal machine learning for stress detection. J Healthc Eng. 2021;2021:9356452. doi:10.1155/2021/9356452

33. Bubulac, Ichim, Popescu, et al. Detection and management of student stress in the learning process. Paper presented at: 10th Annual International Conference of Education, Research and Innovation; Seville, Spain; 16-18 November, 2017:6537-6544.

34. Lipovsky. Depression and Anxiety in Health Professions Students: Early Detection and Response Strategies. Washburn University; 2021.

35. Bou-Hamad, Hoteit, Harajli. Health worries, life satisfaction, and social well-being concerns during the COVID-19 pandemic: insights from Lebanon. PLoS One. 2021;16(7):e0254989.

36. Saadeh, Sacre, Hallit, Farah, Salameh. Knowledge, attitudes, and practices toward the coronavirus disease 2019 (COVID-19) among nurses in Lebanon. Perspect Psychiatric Care. 2021;57(3):1212-1221. doi:10.1111/ppc.12676

37. Domiati, Itani. Knowledge, attitude, and practice of the Lebanese community toward COVID-19. Front Med. 2020;7:542. doi:10.3389/fmed.2020.00542

38. Kroenke, Spitzer, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-613. doi:10.1046/j.1525-1497.2001.016009606.x

39. Chin, Chan, Lam, et al. Detection and management of depression in adult primary care patients in Hong Kong: a cross-sectional survey conducted by a primary care practice-based research network. BMC Fam Pract. 2014;15(1):1-13. doi:10.1186/1471-2296-15-30

40. Manea, Gilbody, McMillan. Optimal cut-off score for diagnosing depression with the Patient Health Questionnaire (PHQ-9): a meta-analysis. CMAJ. 2012;184(3):E191-E196. doi:10.1503/cmaj.110829

41. Zhang, Liang, Zhang, Yang. Reliability and validity of the patient health questionnaire-9 in Chinese adolescents. Sichuan Ment Health. 2014;27(4):357-360.

42. Richardson, McCauley, Grossman, et al. Evaluation of the Patient Health Questionnaire-9 Item for detecting major depression among adolescents. Pediatrics. 2010;126(6):1117-1123. doi:10.1542/peds.2010-0852

43. Beck, Epstein, Brown, Steer RA. An inventory for measuring clinical anxiety: psychometric properties. J Consult Clin Psychol. 1988;56(6):893. doi:10.1037/0022-006X.56.6.893

44. Beck, Steer RA. Relationship between the Beck anxiety inventory and the Hamilton anxiety rating scale with anxious outpatients. J Anxiety Disord. 1991;5(3):213-223. doi:10.1016/0887-6185(91)90002-B

45. Steer, Beck. Beck Anxiety Inventory. Evaluating stress: A book of resources. Scarecrow Education; 1997:23-40.

46. Beck, Epstein, Brown, Steer. An inventory for measuring clinical anxiety: psychometric properties. J Consult Clin Psychol. 1988;56(6):893-7. doi:10.1037//0022-006x.56.6.893

47. Fydrich, Dowdall, Chambless DL. Reliability and validity of the Beck Anxiety Inventory. J Anxiety Disord. 1992;6(1):55-61. doi:10.1016/0887-6185(92)90026-4

48. Cohen, Kamarck, Mermelstein. A global measure of perceived stress. J Health Soc Behavior. 1983;24:385-396.

49. Almeida, Costa-Santos, Caldas, Dias, Ayres-de-Campos. The impact of migration on women’s mental health in the postpartum period. Rev Saúde Pública. 2016;50:35.

50. Makhubela. Assessing psychological stress in South African university students: measurement validity of the perceived stress scale (PSS-10) in diverse populations. Curr Psychol. 2022;41(5):2802-2809. doi:10.1007/s12144-020-00784-3

51. Andreou, Alexopoulos, Lionis, et al. Perceived stress scale: reliability and validity study in Greece. Int J Environ Res Public Health. 2011;8(8):3287-3298.

52. Al-Dubai SAR, Alshagga, Rampal, Sulaiman. Factor structure and reliability of the Malay version of the perceived stress scale among Malaysian medical students. Malaysian J Med Sci. 2012;19(3):43.

53. El Rassoul AEA, Razzak, Hashim HT. Mental health effects of COVID-19 within the socioeconomic crisis and after the Beirut blast among health care workers and medical students in Lebanon. Prim Care Compan CNS Disord. 2021;23(4):35348.

54. El Morr, Jammal, Ali-Hassan, El-Hallak. Machine Learning for Practical Decision Making: A Multidisciplinary Perspective with Applications from Healthcare, Engineering and Business Analytics. International Series in Operations Research & Management Science. Springer International Publishing; 2023:250.

55. Witten, Frank, Hall, Pal CJ. Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Science; 2016.

56. Chen, Guestrin. XGBoost: a scalable tree boosting system. Paper presented at: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Francisco, California, USA; 13-17 August, 2016. doi:10.1145/2939672.2939785

57. Mohammed, Khan, Bashier EBM. Machine Learning: Algorithms and Applications. CRC Press; 2016.

58. LeCun, Bengio, Hinton. Deep learning. Nature. 2015;521(7553):436-444. doi:10.1038/nature14539

59. Gardner, Dorling SR. Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos Environ. 1998;32(14):2627-2636. doi:10.1016/S1352-2310(97)00447-0

60. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315-1316. doi:10.1097/JTO.0b013e3181ec173d

61. Ware, Yue, Morillo, et al. Predicting depressive symptoms using smartphone data. Smart Health. 2020;15:100093. doi:10.1016/j.smhl.2019.100093

62. Fang, Wang, et al. Depression prevalence in postgraduate students and its association with gait abnormality. IEEE Access. 2019;7:174425-174437. doi:10.1109/ACCESS.2019.2957179

63. Zulfiker, Kabir, Biswas, Nazneen, Uddin MS. An in-depth analysis of machine learning approaches to predict depression. Curr Res Behav Sci. 2021;2:100044. doi:10.1016/j.crbeha.2021.100044

64. Ahuja, Banga. Mental stress detection in university students using machine learning algorithms. Proc Comput Sci. 2019;152:349-353. doi:10.1016/j.procs.2019.05.007

65. El Morr. Virtual communities, machine learning and IoT: opportunities and challenges in mental health research. Int J Extreme Autom Connect Healthc. 2019;1(1):4-11. doi:10.4018/ijeach.2019010102

66. Lim. Considering the impact of self-regulation and digital literacy on preservice teachers’ attitudes toward Web 2.0 personal learning environment (PLEs). Paper presented at: Association for the Advancement of Computing in Education (AACE); November 4, 2019; New Orleans, Louisiana, United States; 2019:838-841.

67. Bou-Hamad. The impact of social media usage and lifestyle habits on academic achievement: insights from a developing country context. Children Youth Serv Rev. 2020;118:105425.

68. Olson, Wyner, Berk. Modern neural networks generalize on small data sets. Paper presented at: Proceedings of the 32nd International Conference on Neural Information Processing Systems; December 3-8, 2018; Montréal, QC, Canada; 2018.

69. Pasini. Artificial neural networks for small dataset analysis. J Thorac Dis. 2015;7(5):953-960.

70. El Morr, Ritvo, Ahmad, Moineddin, MVC Team. Effectiveness of an 8-week web-based mindfulness virtual community intervention for university students on symptoms of stress, anxiety, and depression: randomized controlled trial. JMIR Ment Health. 2020;7(7):e18595. doi:10.2196/18595

71. Ahmad, Wang, El Morr. Online mindfulness interventions: a systematic review. In: El Morr, ed. Novel Applications of Virtual Communities in Healthcare Settings. IGI Global; 2018:chap 1.

72. El Morr, Maule, Ashfaq, Ritvo, Ahmad. A student-centered mental health virtual community needs and features: a focus group study. Stud Health Technol Inform. 2017;234:104-108.