Abstract
Purpose:
The impact of COVID-19 on the population’s mental health has been reported worldwide. Predicting healthcare workers’ mental health and life stress is needed to proactively plan for future emergencies.
Design:
Statistics Canada has surveyed Canadian healthcare workers and those working in healthcare settings to gauge their perceived mental health and perceived life stress.
Setting:
A cross-sectional survey of healthcare workers in Canada.
Subjects:
A sample of 18,139 healthcare worker respondents.
Analysis:
Eight algorithms were evaluated: Logistic Regression, Random Forest (RF), Naive Bayes (NB), K Nearest Neighbours (KNN), Adaptive Boosting (AdaBoost), Multi-layer Perceptron (MLP), XGBoost, and LightBoost. AUC scores, accuracy, and precision were measured for all models.
Results:
XGBoost provided the highest-performing model for predicting perceived mental health (AUC = 82.07%), and Random Forest performed best for predicting perceived life stress (AUC = 77.74%). Perceived health, age group of participants, and perceived mental health compared to before the pandemic were found to be the 3 most important features for predicting perceived mental health. Perceived mental health compared to before the pandemic was the most important predictor for perceived life stress.
Conclusion:
Our models are highly predictive of healthcare workers’ perceived mental health and life stress. Implementing scalable, inexpensive virtual mental health solutions to address mental health challenges in the workplace could mitigate the impact of workplace conditions on healthcare workers’ mental health.
Introduction
COVID-19 had a high impact on the population’s mental health,1,2 including depression, anxiety, and life stress. 3 The impact has been observed in different communities, including healthcare workers and those in healthcare settings. 4 In Canada, healthcare professionals reported higher anxiety symptoms, 5 and quarantine was associated with higher odds of mental distress. 6 High life stress and worsening mental health among healthcare workers 7 could be debilitating conditions.
An inquiry into the mental health and life stress of healthcare workers is needed to understand and proactively plan for mental health interventions (eg, virtual care or tele-mental health) for future pandemics or emergencies. An analysis of healthcare workers and people working in healthcare settings’ experiences of mental health will allow evidence-based programming for tailored mental health interventions.
While machine learning (ML) has been used to predict mental health, to our knowledge, this is the first study that uses ML on a Statistics Canada dataset measuring the impacts of COVID-19 on healthcare workers and people working in healthcare settings 8 to predict perceived life stress and mental health among healthcare workers. While self-perceived mental health may not necessarily reflect actual mental health conditions measured objectively using validated instruments, studies have indicated that individuals who self-report mental health issues tend to utilize mental health services more frequently,9-15 hence the usefulness of predicting perceived mental health.
Methods
Dataset
The analysis is based on the crowdsourced questionnaire (18,139 respondents) conducted from November 24, 2020, to December 13, 2020, by Statistics Canada: Impacts of COVID-19 on Healthcare Workers. 8 The participants for this survey were healthcare workers and those working in healthcare settings from 10 provinces and 3 territories in Canada.
Feature Selection
There were 94 variables, including age, gender, indigenous identity, population group, immigration status, province of residence, perceived health, perceived mental health compared to before the pandemic, and questions related to access to personal protective equipment (PPE) and experience at work.
The outcome (ie, target) variables were perceived mental health and perceived life stress. Both target variables were measured on a Likert-type scale. Perceived mental health was coded 0 for “Poor,” 1 for “Fair,” 2 for “Good,” 3 for “Very Good,” and 4 for “Excellent,” while perceived life stress was coded 1 for “Not at all stressful,” 2 for “Not very stressful,” 3 for “A bit stressful,” 4 for “Quite a bit stressful,” and 5 for “Extremely stressful” (see Appendix A for the literal questions asked for major features and the outcomes).
Pre-Processing and Algorithm Selection
Missing values in the dataset were imputed using the KNN Imputer, and one-hot encoding was used for the feature “province or region of residence.” We investigated the following algorithms: Logistic Regression, Random Forest, Naive Bayes, KNN, XGBoost, AdaBoost, Multilayer Perceptron (MLP), and LightBoost. The performance measurements were the receiver operating characteristic (ROC) Area Under the Curve (AUC) score, accuracy, precision, and F1 score. We performed hyperparameter tuning using a Randomized Search with 5-fold cross-validation for all the above algorithms. The complete list of hyperparameters used during hyperparameter tuning is provided in Table 1.
Table 1. Algorithms and Their Tuned Hyperparameters.
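The pre-processing and tuning steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the column names, the number of neighbours for imputation, and the hyperparameter grid are assumptions, and Random Forest stands in for any of the 8 algorithms.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; column names are hypothetical, not the survey's.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age_group": rng.integers(1, 5, n).astype(float),
    "perceived_health": rng.integers(0, 5, n).astype(float),
    "province": rng.choice(["ON", "QC", "BC"], n),
})
# Knock out some answers to mimic missing survey responses.
df.loc[rng.choice(n, 20, replace=False), "perceived_health"] = np.nan
y = rng.integers(0, 2, n)

# One-hot encode the categorical region feature.
X = pd.get_dummies(df, columns=["province"], dtype=float)

# Impute missing values from the nearest neighbours (n_neighbors=5 is an assumption).
X_imputed = pd.DataFrame(KNNImputer(n_neighbors=5).fit_transform(X), columns=X.columns)

# Randomized search with 5-fold cross-validation, scored by ROC AUC.
# The grid below is illustrative only; the tuned values are those of Table 1.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    n_iter=4,
    cv=5,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_imputed, y)
best_params = search.best_params_
```

After fitting, `search.best_estimator_` holds the model refit on all supplied data with the selected hyperparameters.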
To improve predictive performance, the target variables were transformed by merging classes. For perceived mental health, classes 0 and 1 (ie, poor to fair), as well as classes 3 and 4 (ie, very good to excellent), were merged, while class 2 (ie, good) was kept as is. The one-vs-rest (OvR) strategy was used to calculate the models’ performance in the multi-class scenario. For perceived life stress, classes 1, 2, and 3 were merged to represent “low stress,” and classes 4 and 5 were merged to represent “high stress” (see Appendix B for a link to our Python code).
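As an illustration of the class merging and the one-vs-rest AUC, a minimal sketch is given below. The merge rules follow the text; the data are synthetic, Logistic Regression is only a stand-in classifier, and the macro averaging is scikit-learn's default rather than a choice confirmed by the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Merge rules from the text: mental health 0-4 -> 3 classes, life stress 1-5 -> 2 classes.
MENTAL_HEALTH_MERGE = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}
LIFE_STRESS_MERGE = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}

def merge_classes(codes, mapping):
    """Map original Likert codes to merged class labels."""
    return np.array([mapping[int(code)] for code in codes])

# Synthetic features and a 0-4 outcome loosely driven by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
raw_mental_health = np.clip(np.round(2 + X[:, 0] + rng.normal(scale=0.5, size=300)), 0, 4)
y = merge_classes(raw_mental_health, MENTAL_HEALTH_MERGE)

# Fit a stand-in classifier and score it with the one-vs-rest AUC.
# For brevity the score is computed on the training data; a held-out set would be used in practice.
model = LogisticRegression(max_iter=1000).fit(X, y)
auc_ovr = roc_auc_score(y, model.predict_proba(X), multi_class="ovr")
```

With three merged classes, the OvR score averages three binary AUCs, each comparing one class against the other two.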
Implementation Procedure
The implementation used Python (3.10.12) and the Scikit-learn library (1.2.2). The CSV file was read into a data frame, and imputation was performed using KNNImputer. The resulting dataset was split into training and test data using Stratified Shuffle Split, with 80% for training and 20% for testing.
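A stratified 80/20 split of this kind can be sketched as below. The data are synthetic and the random seed is an assumption; only the use of Stratified Shuffle Split and the split ratio come from the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic features and a binary target with mildly unequal classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = rng.choice([0, 1], size=1000, p=[0.46, 0.54])

# One stratified shuffle: 80% training, 20% testing (random_state is an assumption).
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

Stratification keeps the class proportions of the target nearly identical in the training and test sets, which matters when classes are unevenly represented.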
Results
Population Characteristics
In the sample, 28.49% of people were less than 35 years of age, 27.47% were 35 to 44 years of age, 24.65% were 45 to 54 years of age, and 19.02% were 55 and older; 0.37% did not answer.
In the sample, 8.11% of people reported poor perceived mental health, 24.81% fair, 32.38% good, 25.26% very good, and 9.44% excellent. As for perceived life stress, 12.23% of people described their life as extremely stressful, 41.66% as quite a bit stressful, 38.38% as a bit stressful, 6.8% as not very stressful, and 0.92% as not at all stressful. The distribution across professions is shown in Table 2.
Table 2. Distribution of Respondents Across Professions.
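The outcome percentages reported above can be combined with the merge rules from the Methods to see how balanced the merged classes are. The short calculation below is a sketch that only sums the reported percentages; the group names are labels chosen here for illustration.

```python
# Percentages as reported in the text (original 5-level codings).
mental_health_pct = {"poor": 8.11, "fair": 24.81, "good": 32.38,
                     "very good": 25.26, "excellent": 9.44}
life_stress_pct = {"not at all": 0.92, "not very": 6.80, "a bit": 38.38,
                   "quite a bit": 41.66, "extremely": 12.23}

# Merge rules as described in the Methods.
mental_health_groups = {
    "poor to fair": ["poor", "fair"],
    "good": ["good"],
    "very good to excellent": ["very good", "excellent"],
}
life_stress_groups = {
    "low stress": ["not at all", "not very", "a bit"],
    "high stress": ["quite a bit", "extremely"],
}

def merged_share(pct, groups):
    """Sum the reported percentages within each merged class."""
    return {name: round(sum(pct[level] for level in levels), 2)
            for name, levels in groups.items()}

merged_mental_health = merged_share(mental_health_pct, mental_health_groups)
merged_life_stress = merged_share(life_stress_pct, life_stress_groups)
print(merged_mental_health)
print(merged_life_stress)
```

Under these rules, the merged mental health classes hold about 32.92%, 32.38%, and 34.70% of respondents, and the merged life stress classes about 46.10% (low) and 53.89% (high), so the merged targets are close to balanced.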
Perceived Mental Health
We used the AUC score as the metric to compare the models, as both the true positive and false positive rates are essential in health situations (ie, mental health). For perceived mental health, the best AUC score was obtained with the XGBoost model (82.07%), closely followed by LightBoost (81.77%) and Random Forest (80.80%). The KNN model had the lowest AUC score (66.36%). The complete performance measurements for all models are presented in Table 3.
Table 3. Models’ Performance Measurements for Perceived Mental Health After Merging Classes.
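For readers less familiar with the AUC, the ROC curve plots the true positive rate against the false positive rate across decision thresholds, and the AUC is the area under that curve. The toy example below uses four made-up labels and scores, not survey data, to show the two equivalent ways of computing it in scikit-learn.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy labels and predicted scores (illustrative values only).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# False positive rate and true positive rate at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve, computed from the curve and directly.
area_from_curve = auc(fpr, tpr)
area_direct = roc_auc_score(y_true, y_score)
```

Both computations give 0.75 here: three of the four positive-negative pairs are ranked correctly by the scores.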
We further performed feature importance analysis for perceived mental health using XGBoost. “Perceived health” (score 100%), “age group of participants” (score 97.21%), and “perceived mental health compared to before the pandemic” (score 85.48%) were the 3 most important features for the prediction of perceived mental health. All other features scored below 63%.
Perceived Life Stress
For perceived life stress, the best AUC score was obtained with the Random Forest model (77.74%), closely followed by the MLP model (77.56%). Results from all the models after merging classes are given in Table 4.
Table 4. Models’ Performance Measurements for Perceived Life Stress After Merging Classes.
Using the Random Forest model, we performed a feature importance analysis for perceived life stress. “Perceived mental health compared to before the pandemic” (score 100%) was the most important feature for predicting perceived life stress. All other features scored below 10%.
Predictions Using the Most Important Features Only
Using the features with feature importance >70%, we ran the XGBoost and Random Forest models to predict perceived mental health and perceived life stress, respectively. For predicting perceived mental health using XGBoost, we obtained an AUC score of 81.96% and an accuracy of 64.23% using only the 3 most important features cited above. For predicting perceived life stress using the Random Forest model, we obtained an AUC score of 73.24% and an accuracy of 68.61% using only the most important feature cited above.
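The procedure of ranking features, keeping those above the 70% threshold, and refitting can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, Random Forest is used for both steps, and the importance scores are assumed to be rescaled so that the top feature scores 100%, which matches the reported scores but is not stated explicitly in the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem: 10 features, 3 of them informative.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Model trained on all features.
full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
auc_full = roc_auc_score(y_test, full.predict_proba(X_test)[:, 1])

# Rescale importances so the top feature scores 100% (assumed convention).
relative = 100 * full.feature_importances_ / full.feature_importances_.max()
keep = np.flatnonzero(relative > 70)

# Model retrained on the retained features only.
reduced = RandomForestClassifier(n_estimators=200, random_state=0)
reduced.fit(X_train[:, keep], y_train)
auc_reduced = roc_auc_score(y_test, reduced.predict_proba(X_test[:, keep])[:, 1])

print(f"kept {len(keep)} of {X.shape[1]} features")
print(f"AUC all features: {auc_full:.3f}, reduced: {auc_reduced:.3f}")
```

Comparing `auc_full` with `auc_reduced` on the same held-out data is what allows the cost of dropping features to be quantified.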
Discussion
Models of Choice
This study aimed to develop predictive models for perceived mental health and perceived life stress. A test with an AUC value between 80% and 90% is considered excellent, while a test with an AUC above 90% is considered outstanding. 16 In predicting perceived mental health, XGBoost achieved the highest AUC score (82.07%) and the third-highest accuracy (64.30%), which is close to the highest (65.28%) and second-highest (65.15%) accuracy scores. LightBoost and Random Forest achieved similar accuracy, precision, and F1 scores, but XGBoost outperformed both in terms of AUC score. XGBoost could therefore be the model of choice for predicting perceived mental health.
For predicting perceived life stress, Random Forest could be the best model, as it achieved the highest AUC (77.74%), accuracy (70.61%), precision (70.53%), and F1 scores (70.53%).
Model Implementability
It is interesting to note that, considering only the top 3 features as predictors for perceived mental health, we obtained an AUC score of 81.96% for the XGBoost model. This is very close to the AUC score (82.07%) we obtained when considering all 94 features. For predicting perceived life stress using only the topmost feature, we obtained an AUC score of 73.24% for the Random Forest model, compared with 77.74% when considering all 94 features. Hence, the reduced models offer a way to predict perceived mental health and perceived life stress from a few features with little loss in performance (ie, AUC score, accuracy). Restricting the models to features with feature importance >70% yields predictions nearly as good as those made with all 94 features. When fewer features are used, the prediction processing time is reduced, making a model easier to implement in real-life systems.
Policy Implications
Perceived health, age group of participants, and perceived mental health compared to before the pandemic are the 3 most important determining factors for perceived mental health. This is in line with prior research, which found that age is one of the most significant predictors of mental health. 17 On the other hand, perceived mental health compared to before the pandemic is our study’s most important predictor for perceived life stress; we could not find literature against which to compare this finding.
Perceived mental health compared to before the pandemic was a common feature for predicting perceived mental health as well as perceived stress; this finding has direct workplace policy implications. A continuous assessment of the perceived mental health of healthcare workers becomes advisable since it is a major predictor of stress and mental health well-being. This is confirmed by a previous study that found that perception of mental health compared to before physical distancing, as well as negatively perceived life stress and perceived mental health, were all high predictors of anxiety in the general population in Canada. 18
Monitoring is only the first step; addressing perceived stress and negatively perceived mental health is the second important step. An important implication is implementing programs to support mental health and well-being for healthcare workers. Implementing such programs is paramount, and while face-to-face programs can be expensive, eHealth applications addressing mental health are not as expensive. Virtual care can be a great resource. 19 In particular, online mindfulness has been shown to be effective in addressing depression, anxiety, and stress in various populations,20-24 including for workplace interventions, 25 and including for healthcare workers. 26 These online mindfulness applications do not have to isolate healthcare workers, as they can embed an optional virtual community component 27 where participants can feel a sense of community. Policymakers can deploy scalable virtual mindfulness tools to address the mental health effects,18,28 especially affecting the young, 29 while considering equity implications. 30
Study Limitations
While the dataset is large (18,139 respondents) and based on a national survey, one limitation of this study is that the sample is not representative of Canadian healthcare workers and those working in healthcare settings. The healthcare worker respondents were physicians, nurses, personal support workers or care aides, emergency medical personnel, allied health professionals, laboratory workers, pharmacists, dental professionals, and others (students, support services, etc.), but the sample does not reflect their proportions in the workforce. This might affect the performance of our models.
As in most machine learning studies, our models’ performance was not validated on an external dataset. This step is important if such a model is to be implemented in a workplace to predict healthcare workers’ mental health and well-being.
Study Contributions
This study showcases the potential of machine learning in health research, provides critical insights for targeted intervention strategies, and suggests practical, policy-oriented applications to support the mental well-being of healthcare professionals.
Evidence-Based Insights for Tailored Mental Health Interventions
This study relies on an innovative use of machine learning; it is the first to apply ML techniques to the Statistics Canada dataset concerning COVID-19’s impact on healthcare workers for predicting mental health outcomes. This approach can set a precedent for future research in the domain, demonstrating the potential of ML in public health research. By analyzing healthcare workers' mental health and life stress experiences during the pandemic, the study provides evidence-based insights that could inform the development of tailored mental health interventions.
Policy Implications and Intervention Strategies
The findings are crucial for planning and implementing effective support systems for healthcare professionals, particularly in anticipation of future pandemics or emergencies. Also, the findings have direct implications for workplace policies and mental health support programs for healthcare workers. By identifying the most significant predictors of mental health and stress, policymakers and healthcare administrators can design more effective programs, such as virtual care and online mindfulness interventions, to support healthcare workers’ mental health.
Highlighting the Importance of Continuous Mental Health Assessment
The study underscores the importance of continuous assessment of healthcare workers’ perceived mental health as a major predictor of stress and overall well-being. This can inform ongoing support and intervention strategies, emphasizing the need for proactive rather than reactive mental health support services. The novelty lies in the ability to predict perceived mental health status from a few respondent characteristics (eg, perceived health, age group, and changes in perceived mental health since the pandemic) without the use of an instrument.
Advocating for the Use of eHealth Tools
Finally, our study advocates for the implementation of eHealth solutions (eg, virtual mindfulness programs) as cost-effective and scalable options for addressing mental health issues among healthcare workers. This recommendation aligns with a broader trend toward digital health solutions and could significantly impact how mental health support is provided in the healthcare sector.
Conclusion
Based on a national survey, we have presented and discussed models to predict perceived mental health and perceived life stress for healthcare workers in Canada. XGBoost and Random Forest were the best models for predicting perceived mental health and perceived life stress, respectively; the models’ performance remained stable when using only the features with importance above 70%. Perceived mental health compared to before the pandemic was the most important predictor for perceived life stress. It is important to implement scalable virtual mental health solutions to address workplace challenges for healthcare workers.
Footnotes
Appendix A
Appendix B
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by a Mitacs Globalink Research Internship award.
