Abstract

Objectives: This study examined the associations between pesticide exposures, perceived farm stressors, COVID-19-related stressors, and mental health disorders among Thai farmers. Methods: A total of 270 participants were interviewed to assess mental health disorders. Information was also collected on household environments, agricultural activities, and perceived farm- and COVID-19-related stressors. After data preprocessing, 211 samples remained for analysis. Multiple linear regression models were employed as a baseline, and their performance was compared with ensemble tree-based models, which can capture more complex, nonlinear patterns. The Boruta feature selection technique and SHAP scores were used to explain associations between mental health and the independent variables. Results: Lower levels of mental health disorder symptoms were associated with higher levels of personal protective equipment (PPE) use. Certain perceived farm stressors and COVID-19-related stressors were also correlated with mental health outcomes. Conclusions: The findings indicate that greater PPE use and good agricultural practices are associated with reduced symptoms of mental health disorders. This pilot study highlights the potential of machine learning models to explore complex public health issues involving multiple, interrelated factors.

Keywords

COVID-19, farmers, machine learning, mental health, pesticide exposures

Introduction

Stress can significantly affect human health and performance, often resulting in long-term mental health disorders across all demographic groups.¹ Farmers’ mental health has become a growing concern due to their exposure to unpredictable challenges and hazardous working conditions. Identifying the underlying causes of mental health disorders is essential for effectively promoting farmers’ well-being. While research on farmer mental health stressors has largely focused on developed countries, few studies have examined these issues in developing nations such as Thailand. Only a limited number of investigations have explored associations between mental health and related risk factors. Yazd et al. identified key contributors to mental health risks in farmers, including pesticide exposure, financial strain, climate variability, drought, and poor physical health, findings that align with those of Hagen BNM et al.^1,2 Most studies to date have relied on descriptive analyses to examine mental health outcomes in farming populations. Although pesticide exposure has been associated with mental and sleep disorders in farmers,^3,4,5,6 evidence linking agrochemical exposure to psychological distress remains inconclusive.⁷ Fewer studies have examined correlations between perceived farm stressors and mental health, particularly among Thai farmers. This pilot study employed a comprehensive farm stressor inventory questionnaire to better understand these stressors within this population.

To analyze variable relationships in studies on farmers’ stress, life satisfaction, and well-being, ordinary least squares regression and multiple logistic regression have been commonly applied.^8–13 Many recent investigations have considered the mental health impacts of the COVID-19 pandemic.^14–18 Several studies have also reported associations between pesticide use and mental health risks among Thai farmers,^19–23 while additional work has explored the pandemic’s effects.^24,25 A recent analysis combining descriptive analytics and multiple logistic regression further investigated the relationship between pesticide exposure and mental health disorders in Thai farmers.²⁶

Given the complexity and multidimensionality of farm-related stressors, machine learning (ML) approaches offer notable advantages over traditional statistical models such as multiple linear regression, which primarily capture linear relationships. ML models are capable of identifying nonlinear patterns, modeling complex variable interactions, and accommodating heterogeneous data types. In agriculture, the interrelated nature of stressors presents challenges that require modeling tools able to detect these complex relationships without imposing predefined functional forms or distributional assumptions. When paired with advanced interpretability techniques, ML models can provide deeper insights into the factors most strongly influencing mental health outcomes among farming populations. In particular, ensemble tree-based models, combined with interpretability methods such as SHAP values, support both accurate prediction and transparent feature attribution.

This study comprises three analyses related to mental health among Thai farmers. The first focuses on the association between pesticide exposure and symptoms of stress, depression, and anxiety. The second examines the relationship between mental health and perceived stressors, including both occupational (farm-related) and non-occupational (financial and social) factors, while controlling for demographic variables and COVID-19-related stressors. The third specifically investigates the impact of COVID-19-related stressors on mental health outcomes.

Our work is distinguished by the incorporation of specific stressor variables, model development strategies, feature selection techniques, and SHAP-based model interpretation to estimate mental health outcomes. We conducted a detailed quantitative analysis to examine the relationships among various stressors, COVID-19-related variables, and mental health. Machine learning models were applied to investigate the complex interactions underlying mental health disorders among Thai farmers. In particular, ensemble tree-based models, including random forest, Gradient Boosting, and XGBoost, were utilized due to their superior predictive performance over baseline multiple linear regression. Several feature selection methods were employed, including p-value analysis, models’ feature importance score, Boruta, and Boruta-SHAP,^27,28 while SHAP values facilitated interpretation of the results. This approach not only enhances the accuracy of mental health outcome predictions but also allows for transparent ranking and visualization of the relative importance of various stressors, thereby supporting evidence-based and practical interventions.

Methods

Study area and data collection

This study was based on Noomnual et al.²⁶ and was approved by the Ethical Review Committee for Human Research, Faculty of Public Health, Mahidol University (COA No. MUPH2021-087). It was conducted in Amphoe Payuhakiri, Nakhon Sawan province, Thailand. Details regarding the inclusion and exclusion criteria have been described elsewhere.²⁶ The sample size was calculated to estimate a population proportion in a finite population²⁹ and adjusted for a 10% non-response rate, resulting in a total sample size of 270. Participants were selected using a purposive sampling technique. Thai farmers and their family members aged 18 years or older, who had engaged in agricultural activities as a primary or secondary occupation for at least one year, were eligible. Individuals with pre-existing mental disorders or health conditions were excluded.

All 270 participants completed the survey and were included in the raw dataset prior to data preparation. Each participant was interviewed by one of three trained interviewers using a structured, four-part questionnaire. The survey collected information on (1) demographics, household environment, and agricultural activities; (2) perceived farm stressors; (3) self-reported symptoms of stress, depression, and anxiety; and (4) COVID-19-related stress, including concerns about virus transmission, fear, and frequent information checking. The questionnaires are described in detail elsewhere.²⁶ Briefly, parts 1 and 3 were adopted from surveys commonly used in previous studies among Thai populations. Parts 2 and 4 were translated from validated instruments used in prior research. The translated versions were reviewed and compared with the originals by three experts to ensure content validity and conceptual consistency.

The questionnaire addressed three research questions: (Q1) the effect of occupational pesticide exposure on mental health; (Q2) the association between perceived farm-related, financial, and social stress indicators and mental health; and (Q3) the relationship between COVID-19-related stressors and mental health. Variables corresponding to Case I, Case II, and Case III, which address Q1, Q2, and Q3 respectively, are presented in Table 1. Detailed descriptions of stress indicators and COVID-19 stressors are provided in the supplemental materials (Suppl. 1 and Suppl. 2).

Table 1.

A list of all variables used in this study to address Q1, Q2, and Q3.

Q1	Q2	Q3	Factors	Description	Variable type
✓	✓	✓	GENDER	Gender	Categorical (0 = male, 1 = female)
✓	✓	✓	AGE	Age	Continuous
✓	✓	✓	EDU	Highest education level	Categorical (0 = primary school and below, 1 = secondary school and above)
✓	✓	✓	MAR	Marital status	Categorical (0 = single/widow/divorced/separated, 1 = married)
✓	✓	✓	HH_INC	Household income	Categorical (0 = less than ฿10000, 1 = more than ฿10000)
✓	✓	✓	SAT	Household income satisfactory	Categorical (0 = not enough, 1 = enough)
✓	✓	✓	DEBT	Indebted	Categorical (0 = no, 1 = yes)
✓	✓	✓	MED	Medical/health insurance	Categorical (0 = no, 1 = yes)
✓	✓	✓	SMK	Smoking habit	Categorical (0 = no/quit, 1 = yes)
✓	✓	✓	ALC	Alcohol drinking habit	Categorical (0 = no/quit, 1 = yes)
✓	✓	✓	PRI	Farmer as primary job	Categorical (0 = no, 1 = yes)
✓	✓	✓	SEC	Any other second job	Categorical (0 = no, 1 = yes)
✓	✓	✓	SPRAY	Engaging in pesticide spraying/application within 1 year	Categorical (0 = no, 1 = yes)
✓	✓		HH_DIST	Live within 1 km from farm areas	Categorical (0 = no, 1 = yes)
✓	✓		HH_CHEM	Using chemical products in household	Categorical (0 = no, 1 = yes)
✓	✓		CST-ST	Sum of COVID-19 stress scores	Continuous
✓	✓		CST-CHE	Sum of COVID-19 check scores	Continuous
✓	✓		CST-PHO	Sum of COVID-19 phobia scores	Continuous
✓			PRAC	Sum of good work practices scores	Continuous
✓			PPE	Sum of PPE usage scores	Continuous
✓			PEST_HIS1	Past pesticide usage	Categorical (0 = no, 1 = yes)
✓			PEST_HIS2	Number of years of past pesticide usage	Categorical (0 = less than or equal to 10 years, 1 = more than 10 years)
✓			PEST_HIS3	Frequency of past pesticide usage	Categorical (0 = less than 4 times per month, 1 = 4 times per month or more)
✓			PEST_HIS_PPE	Sum of past PPE usage scores	Continuous
	✓		FS	Please see detail in Supplemental Material 1	Categorical (0 = no, 1 = yes)
		✓	CST	Please see detail in Supplemental Material 2	Categorical (0 = no, 1 = mild, 2 = moderate, 3 = severe, 4 = extremely severe)

Demographic variables were included in all three cases. For Cases I and II, household location with respect to farm areas and chemical use within the household were considered relevant contextual factors and were therefore included, along with summary COVID-19 scores. Case I additionally incorporated specific pesticide-related stressors, while Case II focused on farm-related, financial, and social stressors. In contrast, Case III was designed to specifically address COVID-19 stressors, and thus these variables were prioritized while pesticide- and farm-related stressors were not included. These three cases were constructed to approximate an ablation analysis to focus on a specific group of stressors. Particularly, this design allowed us to examine the contribution of each feature group to predictive performance.

After data collection, a comprehensive analysis was performed in three stages: data preparation, model development, and identification of key contributing factors. Each stage was designed to ensure data quality, construct robust predictive models, and extract meaningful insights. In the first stage, raw data were cleaned, transformed, and preprocessed to ensure suitability for downstream analysis. The second stage focused on developing predictive models using both baseline and advanced tree-based algorithms. Finally, the third stage involved interpreting model outputs and identifying important factors contributing to the outcomes of interest. The complete analytical workflow is shown in Figure 1, and the associated pseudocode is provided in Supplemental Material 8 to facilitate reproducibility.

Figure 1.

Process workflow.

Data preparation

To prepare the raw data, columns with more than 75% missing values were removed, and the remaining gaps were imputed using standard techniques. After excluding incomplete and inconsistent entries, the dataset consisted of 211 samples. Continuous variables were scaled to a range of 0 to 1. The objective of this study was to investigate mental health issues among farmers, specifically focusing on stress, anxiety, and depression, which were used as response variables. Mental health scores were computed as proportions based on self-reported indicators. Features relevant to the three research questions were organized into separate datasets, referred to as Case I, Case II, and Case III. Model development and key factor identification were conducted independently for each case.
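The preparation steps described above can be sketched in a few lines. This is a minimal illustration rather than the study's actual pipeline: the text specifies only "standard techniques" for imputation, so median imputation is an assumption here, as are the function names `prepare` and `ratio_score` and the reading of "proportion" as the summed item score divided by its maximum possible value.

```python
import numpy as np

def prepare(X, max_missing=0.75):
    """Drop sparse columns, impute remaining gaps, and min-max scale to [0, 1]."""
    X = np.asarray(X, dtype=float)
    # Keep columns whose share of missing values does not exceed the threshold.
    keep = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep]
    # Assumption: median imputation stands in for the unspecified "standard techniques".
    medians = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), medians, X)
    # Min-max scaling; constant columns map to 0 to avoid division by zero.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span, keep

def ratio_score(item_scores, max_per_item):
    """Assumed definition: summed item scores as a proportion of the maximum total."""
    item_scores = np.asarray(item_scores, dtype=float)
    return item_scores.sum(axis=1) / (item_scores.shape[1] * max_per_item)

# Toy data: the second column is 80% missing and is therefore dropped.
raw = [[1, np.nan, 10],
       [2, np.nan, np.nan],
       [3, np.nan, 30],
       [4, np.nan, 40],
       [5, 7, 50]]
scaled, kept = prepare(raw)
```

Binary indicators already lie in [0, 1], so applying the same scaling to every retained column leaves them unchanged.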

Model development and evaluation metrics

This study applied several models to estimate stress, anxiety, and depression levels, treating them as continuous response variables: multiple linear regression (MLR) and three tree-based models, namely random forest (RF), gradient boosting, and XGBoost. The tree-based models are ensemble approaches that combine the predictions of individual trees, where each regression tree recursively partitions the input space into regions and assigns a constant predicted value within each region. They were selected for their robustness against multicollinearity among predictors, their ability to capture potential nonlinear relationships between stressors and mental health outcomes, and their reduced tendency to overfit due to ensemble averaging. Among the tree-based ensemble models, we selected the best-performing algorithm as the representative for this group to be compared against the baseline MLR model.

To improve model stability and generalizability, a 5-fold cross-validation (CV) approach was used. The model was trained on four folds and validated on the fifth, rotating this process five times. Although the sample size was relatively small (n = 211), the tree-based model’s internal ensemble technique and feature subsampling mechanisms can yield stable estimates when combined with cross-validation. However, theoretical benefits may not always translate into practical performance improvements. To evaluate this, we conducted direct comparisons between tree-based models and the baseline MLR using the same features and cross-validation settings. This approach allowed us to determine whether the increased complexity of tree-based models resulted in a meaningful gain in predictive accuracy for the dataset.

Model performance was evaluated using the mean square error (MSE), with lower values indicating better accuracy. The full dataset was split into 80% training and 20% hold-out test sets. Cross-validation was applied only within the training set to tune hyperparameters and assess model stability across different parameter combinations. The goal was to minimize MSE while maintaining a balance between model complexity and generalization. The final model was then evaluated on the hold-out test set.
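The evaluation protocol (an 80/20 hold-out split, with 5-fold cross-validation confined to the training portion) can be sketched as follows. This is an illustrative sketch on synthetic data, not the study's code: only the MLR baseline is fitted, using NumPy least squares, and the helper names (`holdout_split`, `cv_mse`, `fit_ols`, `predict_ols`) are hypothetical. A tree-based model would be passed to `cv_mse` through the same `fit` and `predict` arguments, once per hyperparameter setting.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def fit_ols(X, y):
    # Least-squares fit with an intercept column (the MLR baseline).
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def holdout_split(n, test_frac=0.2, seed=0):
    # Shuffle once, then reserve the first block of indices as the test set.
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]

def cv_mse(X, y, fit, predict, k=5, seed=0):
    # Train on k-1 folds, validate on the remaining fold, rotate k times.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), k)
    scores = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        scores.append(mse(y[folds[i]], predict(model, X[folds[i]])))
    return float(np.mean(scores))

# Synthetic stand-in for the 211-sample dataset (values are not from the study).
rng = np.random.default_rng(42)
X = rng.uniform(size=(211, 4))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=211)

train_idx, test_idx = holdout_split(len(X))
val_score = cv_mse(X[train_idx], y[train_idx], fit_ols, predict_ols)
final_model = fit_ols(X[train_idx], y[train_idx])
test_score = mse(y[test_idx], predict_ols(final_model, X[test_idx]))
```

Keeping the test indices out of every cross-validation fold is what allows the test MSE to serve as an independent check on the validation MSE.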

Key factors identification

For the MLR model, the significance of each variable was assessed using standard statistical t-tests. Variables with p-values less than 0.05 were retained, establishing a baseline for feature importance using classical statistical criteria. These results were compared with feature importance rankings derived from the tree-based models, Boruta, and SHAP. Feature importance in the tree-based models was calculated based on the cumulative reduction in prediction error attributed to each variable across all sub decision trees. Variables with higher importance scores contributed more substantially to accurate predictions. The Boruta algorithm provided a more robust feature selection approach by extending the tree-based model. It generated shadow features by randomly permuting the values of the original variables. Across multiple iterations, each original feature’s importance was statistically compared against its shadow counterpart. Boruta classified features as ‘important’ if they consistently outperformed their shadow versions, ‘potentially important’ if they had mixed performance, or ‘unimportant’ if they consistently underperformed. In addition, we explored SHAP (SHapley Additive exPlanations), a method based on the Shapley value from cooperative game theory. SHAP values quantify the marginal contribution of each variable to the model’s prediction relative to the average prediction. This approach allows interpretation of how much each variable contributes, either positively or negatively, to a prediction. Rather than relying on raw model-derived feature importance, the BorutaSHAP method used SHAP values as its importance metric. By combining SHAP’s interpretability with the robustness of Boruta, we identified key variables that most significantly influenced farmers’ mental health outcomes.
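To make the SHAP description concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets, replacing "absent" features with values from a background sample. It is an illustration of the underlying definition only: SHAP libraries use much faster algorithms for tree ensembles, and the function `exact_shapley`, the toy model, and its coefficients are all hypothetical rather than taken from the study.

```python
import itertools
import math
import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one row x; absent features are averaged over background rows."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    n = x.size

    def value(subset):
        # Features in `subset` take their value from x; the rest keep background values.
        rows = background.copy()
        cols = list(subset)
        if cols:
            rows[:, cols] = x[cols]
        return float(np.mean(predict(rows)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical toy model on three scaled features (PPE, CST-ST, HH_DIST); not the fitted model.
def toy_model(rows):
    return 0.3 - 0.2 * rows[:, 0] + 0.25 * rows[:, 1] * (1 + rows[:, 2])

background = np.random.default_rng(0).uniform(size=(50, 3))
x = np.array([0.9, 0.8, 1.0])
phi = exact_shapley(toy_model, x, background)
# Efficiency property: contributions sum to f(x) minus the average prediction.
gap = toy_model(x[None, :])[0] - toy_model(background).mean()
```

In this toy setting the first feature receives a negative contribution because its value is above the background average and its effect on the prediction is negative, which mirrors how a negative SHAP value is read for a single observation.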

Statistical analysis

To evaluate and compare the predictive performance of the baseline MLR and relatively more complicated tree-based models, an empirical analysis was conducted using a paired t-test on cross-validation results. The mean squared error (MSE) for each model was computed across multiple random train-test splits within a repeated cross-validation framework. This procedure allowed us to examine the difference in performance between the two models across consistent data partitions. For each split, we calculated the difference in MSE values (MSE_TREE − MSE_LR).

The hypotheses for the statistical test were defined as follows:

H₀: The mean difference in MSE is less than or equal to zero, indicating that the tree-based model performs equally or better than the MLR model.

H₁: The mean difference in MSE is greater than zero, indicating that the tree-based model performs worse than the MLR model.

This hypothesis testing was conducted separately for each target variable, including ST, DASS-ST, 9Q, DASS-DEP, and DASS-ANX. For each case, we calculated the t-statistic and corresponding p-value to assess the significance of the observed performance difference. A p-value less than 0.05 was considered statistically significant and indicated sufficient evidence to reject the null hypothesis, suggesting that the tree-based model performed significantly worse than the MLR model for that specific outcome. This testing framework provided insight into both average performance and model stability across multiple data partitions.
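For reference, the paired t-statistic implied by the hypotheses above can be written out as follows. The symbols $n$ (number of splits), $d_k$, $\bar{d}$, and $s_d$ are notation introduced here for illustration and do not appear in the original text.

```latex
d_k = \mathrm{MSE}_{\mathrm{TREE}}^{(k)} - \mathrm{MSE}_{\mathrm{LR}}^{(k)}, \qquad k = 1, \dots, n

\bar{d} = \frac{1}{n} \sum_{k=1}^{n} d_k, \qquad
s_d = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} \left( d_k - \bar{d} \right)^2}

t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
p = \Pr\left( T_{n-1} \ge t \right)
```

Here $T_{n-1}$ denotes a Student's t variable with $n-1$ degrees of freedom, so a small $p$ corresponds to a mean difference that is significantly greater than zero under the one-sided alternative H₁.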

Results

Relationships among mental health disorder scores were initially assessed by calculating Pearson correlation coefficients between ST5-stress and DASS21-stress, and between 9Q-depression and DASS21-depression, to examine consistency across different assessment instruments. A correlation coefficient of 0.62 was observed between the stress measures, and 0.59 between the depression measures.

Both the baseline MLR and the proposed tree-based models incorporated various independent variables, including occupational pesticide exposures, occupational stressors, COVID-19-related stressors, and other confounding factors such as demographic variables. Model performance across all cases was evaluated by comparing the average validation MSE across all CV folds with the test MSE on the held-out test set, as presented in Table 2. The similarity between validation and test MSE scores suggested that the models were not overfitted and demonstrated acceptable generalizability. Among the tree-based models, we compared RF, gradient boosting, and XGBoost before selecting the best-performing algorithm based on the average CV score. Overall, RF demonstrated superior performance across the majority of experimental cases. However, for two outcomes in Case I, DASS-ST and DASS-ANX, gradient boosting and XGBoost, respectively, achieved higher predictive accuracy. The best-performing algorithm for each outcome was therefore selected as the representative model. Table 2 presents the MSE results of the best-performing tree-based algorithms, while the full results for RF, gradient boosting, and XGBoost are reported in Supplemental Material 3. Although the selected tree-based models showed modest improvements in prediction accuracy over MLR, these improvements were relatively limited and may be context dependent. Therefore, results should be interpreted with caution, especially when considering generalizability to other populations or datasets.

Table 2.

MSE using multiple linear regression (MLR) and the selected tree-based model.

Case	Response variable	MLR train MSE	MLR validation MSE	MLR test MSE	Tree-based train MSE	Tree-based validation MSE	Tree-based test MSE
Case I	ST_ratio	0.0079	0.0135	0.0109	0.0111	0.0070	0.0089
	DASS-ST_ratio	0.0049	0.0087	0.0089	0.0000	0.0034	0.0085
	9Q_ratio	0.0042	0.0068	0.0033	0.0053	0.0025	0.0048
	DASS-DEP_ratio	0.0032	0.0066	0.0081	0.0050	0.0019	0.0036
	DASS-ANX_ratio	0.0024	0.0045	0.0043	0.0002	0.0013	0.0038
Case II	ST_ratio	0.0057	0.0140	0.0160	0.0108	0.0074	0.0080
	DASS-ST_ratio	0.0035	0.0095	0.0099	0.0066	0.0037	0.0054
	9Q_ratio	0.0029	0.0070	0.0045	0.0053	0.0030	0.0041
	DASS-DEP_ratio	0.0021	0.0074	0.0096	0.0049	0.0022	0.0036
	DASS-ANX_ratio	0.0017	0.0054	0.0065	0.0037	0.0016	0.0029
Case III	ST_ratio	0.0073	0.0165	0.0135	0.0111	0.0076	0.0086
	DASS-ST_ratio	0.0038	0.0084	0.0057	0.0066	0.0035	0.0050
	9Q_ratio	0.0037	0.0073	0.0036	0.0051	0.0033	0.0043
	DASS-DEP_ratio	0.0026	0.0077	0.0074	0.0051	0.0023	0.0034
	DASS-ANX_ratio	0.0018	0.0046	0.0046	0.0038	0.0016	0.0028

To assess performance differences between the models, a cross-validation paired t-test was conducted, comparing MSE values from the selected tree-based model and MLR across five random data splits with multiple repetitions. P-values were calculated to test the null hypothesis that both models performed equivalently. The results indicated that the tree-based model significantly outperformed MLR for all mental health outcomes across all three cases, supporting the decision to emphasize the tree-based model in subsequent analyses.

Key contributing factors influencing symptom levels of mental health disorders were further analyzed. In Case I, pesticide usage, adherence to good agricultural practices, PPE usage, and past pesticide exposure were evaluated. Case II examined perceived stressors related to farming, finances, and social factors. Case III focused on the relationship between COVID-19 stressors and mental health levels. Statistical p-values from MLR and feature importance rankings from the Boruta and Boruta SHAP algorithms are presented in Table 3 and Supplemental Materials 4 and 5 for Cases I, II, and III, respectively. Only selected features with corresponding MLR coefficients are reported. Statistically significant variables were identified based on p-values below 0.05, with coefficient signs indicating the direction of association with the response variable. Higher feature importance values from the tree-based model denote greater influence. In the Boruta and Boruta SHAP columns, an absence of a tick mark indicates an unimportant feature, one tick mark indicates ‘potentially important,’ and two tick marks denote ‘important’ features.

Table 3.

A comparison of selected features from statistical p-values and Boruta algorithms along with feature importance values from the model for Case I.

Feature (Case I)	ST				DASS-ST				9Q				DASS-DEP				DASS-ANX
Feature (Case I)	Coefficients	Feature Imp	Boruta	Boruta SHAP	Coefficients	Feature Imp	Boruta	Boruta SHAP	Coefficients	Feature Imp	Boruta	Boruta SHAP	Coefficients	Feature Imp	Boruta	Boruta SHAP	Coefficients	Feature Imp	Boruta	Boruta SHAP
GENDER	0.0537	0.014				0.0106				0.0121				0.0102				0.0212
AGE	0.002	0.0504	✓✓	✓✓		0.0807		✓✓		0.0387				0.0314				0.0515	✓✓
EDU		0.0161		✓✓		0.0086				0.0145				0.0098				0.0565
MAR		0.0109				0.0018				0.0064				0.005				0.0257
HH_INC		0.03		✓✓		0.0073		✓✓		0.0377		✓✓		0.0473		✓✓		0.0173
SAT		0.0078				0.0045			−0.0297	0.0054				0.0119				0.0433
DEBT		0.0062				0.012				0.0081				0.0099				0.0338
MED		0.0				0.0005				0.0				0.0				0.0104
SMK		0.0006				0.0027				0.0021				0.0016				0.0067
ALC		0.0062				0.0146				0.0129				0.0121				0.0231
PRI		0.0				0.0				0.0				0.0				0.0
SEC		0.0366		✓✓		0.0101			0.0398	0.0391		✓✓		0.0217				0.0703	✓✓	✓✓
SPRAY		0.0111				0.0271				0.007				0.0116				0.0275
HH_DIST	0.0497	0.0736		✓✓		0.0319		✓✓		0.1033	✓	✓✓		0.0573		✓✓		0.0158
HH_CHEM		0.0018				0.0718				0.002				0.0045				0.0525	✓✓
CST-ST	0.0315	0.1834	✓✓	✓✓	0.0296	0.4083	✓✓	✓✓	0.0093	0.1259	✓✓	✓✓	0.0222	0.2071	✓✓	✓✓	0.0225	0.191	✓✓	✓✓
CST-CHE		0.0474		✓		0.0558				0.031				0.0309			−0.004	0.0354
CST-PHO		0.1026	✓✓	✓✓		0.0848	✓✓			0.0793	✓	✓✓		0.0709		✓✓		0.0228
PRAC		0.1033	✓✓	✓✓		0.064				0.1688	✓✓	✓✓		0.1502	✓✓	✓✓		0.0439		✓✓
PPE		0.1444	✓✓	✓✓		0.0648	✓✓		−0.1862	0.1714	✓✓	✓✓		0.1523	✓✓	✓✓		0.1107	✓✓	✓✓
PEST_HIS1		0.0081				0.0018				0.0071				0.0077				0.0235
PEST_HIS2		0.009				0.0147				0.0115				0.0197				0.0175
PEST_HIS3		0.0021				0.0149				0.002			0.0701	0.0167			0.0488	0.0569
PEST_HIS_PPE		0.1342	✓✓	✓✓		0.0066				0.1134	✓✓	✓✓		0.1102	✓✓	✓✓		0.0428

In Case I, COVID-19-related stressors were positively associated with multiple mental health outcomes, while PPE usage was negatively associated with 9Q-depression scores. Specifically, a one-unit increase in PPE usage was associated with a 0.19-unit decrease in depression scores, assuming all other variables remained constant. Boruta and Boruta SHAP consistently identified similar important features for 9Q and DASS-DEP, whereas MLR coefficients showed inconsistencies. This trend was generally observed in the comparison between ST and DASS-ST with slight discrepancies. In Case II, COVID-19 stressors continued to show positive correlations with most mental health symptom levels, which aligned with SHAP values. However, only a few farm-related stressors (FS) showed consistent correlations between SHAP and MLR results. Boruta SHAP identified more significant features for all target variables than MLR, with larger discrepancies noted between ST versus DASS-ST and 9Q versus DASS-DEP compared to Case I. For FS-related features in MLR, five variables were identified as important for ST, but only two for DASS-ST. In Case III, numerous COVID-19-related variables were identified as significant contributors in the SHAP analysis, despite showing limited importance in MLR, regardless of the mental health response variable considered.

SHAP values of selected features were further analyzed using the Boruta SHAP approach with the tree-based model. Absolute SHAP values indicated the overall influence of each feature on the model’s predictions. The direction of feature contributions was also evaluated (Figure 2; Supplemental Materials 6 and 7). Each dot represents an individual data point, with pink and blue indicating high and low feature values, respectively. The sign of the SHAP value reflects the direction of impact on the target variable. Negative SHAP values with pink dots concentrated on the left suggest that high feature values are associated with lower predicted outcomes, and vice versa. A lack of clear separation between pink and blue indicates an ambiguous or non-directional effect of the feature on the prediction.

Figure 2.

SHAP values for top features for Case I.

In Case I, higher values for current and past use of PPE were generally associated with lower levels of mental health disorder symptoms for most outcome variables (Figure 2). Clear separation between pink and blue dots was observed especially for the PPE feature, with pink dots predominantly appearing on the left side of the axis. Some variations were observed for the PEST_HIS_PPE feature for the DASS-ST and DASS-ANX target variables, for which no clear conclusion could be drawn. COVID-19-related stressors, a key confounding factor, exhibited a strong positive correlation with stress, depression, and anxiety levels. The variable HH_DIST, representing residence within 1 km of farming areas, also showed a positive association with mental health symptoms in most cases, suggesting increased stress and depression among farmers living near their work environment. In contrast, good agricultural work practices (PRAC) were negatively correlated with mental health symptom levels.

In Case II, the influence of perceived stressors related to farming (FRM), finance (FIN), and social factors (SOC) on mental health outcomes was examined using the FS1 to FS25 variables (Supplemental Material 6). For stress outcomes (ST and DASS-ST), FS13, representing lack of support from government policy, emerged as a key farm-related factor. Among social indicators, FS10, reflecting no leisure time with family members, showed a positive association with stress symptoms. For depression-related outcomes (9Q and DASS-DEP), significant farm-related indicators included FS13 and FS11 (concern about cultivated areas). FS1, denoting inconvenience in commuting, also emerged as a meaningful social factor contributing to elevated depression levels. It is noteworthy that COVID-19-related stressors and HH_DIST, as major confounding factors, showed strong positive correlations with stress, depression, and anxiety levels.

Focused analysis of COVID-19-related variables revealed minimal separation between pink and blue dots across most features, regardless of the mental health outcome (Supplemental Material 7). This indicated a lack of clear directional influence for many COVID-19 stressors. However, CST9, which concerns fear of individuals traveling from abroad being infected, consistently showed importance for stress, depression, and anxiety scores. Additionally, CST12 and CST14, which reflect lack of sleep and intrusive thoughts about the disease, respectively, were positively associated with higher levels of mental health disorder symptoms.

Discussion

This study employed the tree-based regressor to examine associations between mental health disorder symptoms and relevant independent variables, with the MLR model used as a baseline for comparison. Identifying important features was performed using the Boruta algorithm, and the resulting features were interpreted using SHAP values. These were compared against traditional linear regression coefficients to evaluate both the magnitude and direction of each variable’s effect on mental health outcomes. COVID-19-related stress consistently emerged as a key factor, supporting the reliability of the findings, while less influential variables showed inconsistencies across models.

In Case I, COVID-19-related stress was identified as the most significant confounding factor in both MLR and Boruta analyses. The direction of effects indicated by MLR coefficients in Table 3 aligned partially with the SHAP results shown in Figure 2. Both present and past PPE use were generally found to be important across most mental health outcomes according to SHAP values, though they were less prominent in MLR results. In Case II, the impact of perceived farm stressors on mental health symptom levels was assessed. Farm-related and social indicators consistently ranked among the top features in all models, whereas financial stressors were significant only for depression and anxiety symptoms, particularly FS24. SHAP plots showed positive correlations for these stressors, illustrated by the dominance of pink dots on the right side of the axis. COVID-19-related stress was again found to be positively associated with all mental health outcomes. Household income (HH_INC) influenced almost all response variables except ST. The inconsistencies observed between MLR and SHAP results for ST versus DASS-ST and 9Q versus DASS-DEP may be attributed to the moderate correlations (approximately 0.6) between these questionnaire scores, highlighting the importance of consistent questionnaire design and terminology in mental health assessments. Case III provided additional insights into COVID-19-related factors, while demographic variables, such as age, appeared less influential. Concerns about individuals traveling abroad and being exposed to COVID-19 (CST9) emerged as one of the most important features. Both positive and negative coefficients were observed (Supplemental Material 5), consistent with the lack of clear separation between pink and blue dots in the SHAP visualizations (Supplemental Material 7).

The results indicated that good agricultural practices and PPE use were associated with reduced mental health disorder symptoms, as highlighted by SHAP values. These findings are consistent with the previous study by Kaewboonchoo et al.,²² reinforcing the importance of implementing safe pesticide practices among Thai farmers. To the best of our knowledge, no prior study has applied a comprehensive farm stressor inventory among Thai farmers. A positive association was observed between heightened perceptions of farm stress and adverse mental health outcomes. The stressor inventory enabled the identification of detailed risk factors and supported the development of targeted intervention programs and policies aimed at improving mental health and well-being in agricultural populations. This pilot study demonstrated the value of machine learning models in exploring occupational and non-occupational factors related to mental health disorders among Thai farmers. Tree-based models were employed to capture nonlinear relationships, in contrast to earlier studies that relied primarily on linear regression models. The integration of Boruta and SHAP facilitated the identification and interpretation of influential factors by combining the interpretability of SHAP values with Boruta’s robust feature selection capability.

The study highlighted the potential of machine learning in public health systems by contributing to evidence-based policy development, monitoring, forecasting, and structured data analysis. It demonstrated how machine learning can be used to identify risk factors and health behavior patterns, thereby informing more effective interventions and mental health promotion strategies.^30–32 The present study illustrated how machine learning models can uncover complex relationships in survey-based public health research, using mental health among Thai farmers as a case study. Previous research applied machine learning to mental health data, primarily for diagnosis, treatment, and support.^33,34 However, relatively few studies have used machine learning for public health surveillance, particularly to examine complex risk factor interactions in specific populations such as farmers. Survey data, as used in this study, proved useful for detecting mental health conditions. This analysis showed that machine learning can be applied to broad and complex variable sets, such as farm stressors, offering valuable insights for future public health interventions. Prior research has applied machine learning models to questionnaire data to predict mental health among adolescents and university students.^35,36 Tree-based algorithms were often used, along with SHAP value analysis for interpreting feature importance. Other studies also examined the influence of COVID-19 on mental health, identifying key risk factors that inform intervention strategies.^37,38

Several limitations were noted. First, the self-reported mental health disorders were not validated with external sources, such as biological measurements or personal journals. In addition, pesticide exposure levels were not quantitatively measured. These may introduce recall bias, reporting errors, and social desirability bias—especially in relation to mental health and chemical use. Our study captured participants’ self-reported adverse effects within the past year in order to minimize recall bias. Future studies should incorporate quantification of pesticide exposures. Second, the study was conducted in a single agricultural region in Thailand, which may reduce its applicability to other geographical, cultural, or occupational contexts. Although the sample size was calculated with a 10% allowance for non-response based on our previous research, only 211 complete records were available for model development after data cleaning. While this provides valuable continuity of research, the purposive sampling and relatively small post-cleaning sample size may limit statistical power and external validity. In addition, the small sample size, when combined with machine learning techniques, may increase the risk of overfitting, thereby limiting the models’ generalizability. Future work should therefore expand to larger, multi-regional, and longitudinal datasets and incorporate external validation to enhance model robustness and confirm the stability of key predictors. Although the current study employed a case-based design to address sample size limitations, future work should implement a full ablation analysis across all variables to provide a more rigorous evaluation of predictor importance. Despite these limitations, this pilot study demonstrated the utility of farm stressor assessments and machine learning in understanding mental health risks among Thai farmers. 
To strengthen the policy translation pathways in future work, we aim to incorporate advanced XAI-powered approaches to better support the decision-making process.³⁹ These efforts would support comprehensive mental health interventions and contribute to improving well-being among agricultural populations.

Conclusion

Mental health disorders may be influenced by a range of factors, including demographic characteristics, cultural context, socioeconomic status, lifestyle, access to healthcare, and individual perceptions of health risks. This study applied a farm stressor inventory to capture both occupational and non-occupational factors associated with mental health disorder symptoms among Thai farmers using machine learning models for analysis. The findings indicated that lower levels of mental health disorder symptoms were associated with higher levels of both current and past PPE usage, as well as adherence to good agricultural work practices. Associations were also identified between mental health disorder symptoms and indicators related to agricultural and social stressors. Additionally, COVID-19-related factors were found to be significant confounders. This pilot study demonstrated the utility of machine learning approaches in examining complex public health issues involving multiple, interrelated variables.

Supplemental Material

Supplemental Material - Application of machine learning to identify key factors influencing agricultural workers’ mental health: A case study of Thai farmers

Supplemental Material for Application of machine learning to identify key factors influencing agricultural workers’ mental health: A case study of Thai farmers by Papis Wongchaisuwat, Veerasit Kaewbundit, Saisattha Noomnual in Health Informatics Journal

Footnotes

Acknowledgements

Data collected for this research was supported by the Fogarty International Center of the National Institutes of Health under Award Number U2RTW010088. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

ORCID iDs

Veerasit Kaewbundit

Saisattha Noomnual

Ethical considerations

This study was approved by the Ethical Review Committee for Human Research Faculty of Public Health, Mahidol University (COA No. MUPH2021-087).

Consent to participate

All participants provided written and verbal informed consent prior to participating in the study.

Author contributions

Conceptualization: Papis W, Saisattha N. Data curation: Veerasit K. Formal analysis: Papis W, Veerasit K, Saisattha N. Funding acquisition: Papis W, Saisattha N. Methodology: Papis W, Veerasit K. Project administration: Saisattha N. Visualization: Veerasit K. Writing - original draft: Papis W, Saisattha N. Writing - review & editing: Papis W, Saisattha N.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Fogarty International Center of the National Institutes of Health under Award Number U2RTW010088 for the data collection phase. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The data analysis for this project was supported by Kasetsart University Research and Development Institute under grant number FF(KU) 51.67. However, any opinions, findings, and conclusions or recommendations in this document are those of the authors and do not necessarily reflect the views of the sponsor.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplemental Material

Supplemental material for this article is available online.

References

1. Daghagh Yazd, Wheeler, Zuo. Key risk factors affecting farmers’ mental health: a systematic review. Int J Environ Res Publ Health 2019; 16(23): 4849.

2. Hagen BNM, Albright, Sargeant, et al. Research trends in farmers’ mental health: a scoping review of mental health outcomes and interventions among farming populations worldwide. PLoS One 2019; 14(12): e0225661.

3. Buralli, Ribeiro, Iglesias, et al. Occupational exposure to pesticides and health symptoms among family farmers in Brazil. Rev Saude Publica 2020; 54: 133.

4. Harrison, Mackenzie Ross. Anxiety and depression following cumulative low-level exposure to organophosphate pesticides. Environ Res 2016; 151: 528–536.

5. Kim, Lee. Depressive symptoms and severity of acute occupational pesticide poisoning among male farmers. Occup Environ Med 2013; 70: 303–309.

6. Hao, Tian, et al. Relationship between cumulative exposure to pesticides and sleep disorders among greenhouse vegetable farmers. BMC Public Health 2019; 19: 373.

7. Khan, Kennedy, Cotton, et al. A pest to mental health? Exploring the link between exposure to agrichemicals in farmers and mental health. Int J Environ Res Publ Health 2019; 16(8): 1327.

8. Zhang, Huang, et al. The application of artificial neural network in studying landless farmer’s mental health problems. Zhōnghuá liúxíngbìng zázhì 2008; 29: 1038–1041.

9. Scarth, Zwerling, Burmeister, et al. Agricultural health and safety. In: Depression and risk factors among Iowa farmers. CRC Press, 1997, p. 10.

10. Edwards, Gray, Hunter. The impact of drought on mental health in rural and regional Australia. Soc Indic Res 2015; 121: 177–194. https://www.jstor.org/stable/24721393

11. Bjornestad, Brown, Weidauer. The relationship between social support and depressive symptoms in Midwestern farmers. Journal of Rural Mental Health 2019; 43: 109–117.

12. Heo, Lee, Park. Financial-related psychological factors affect life satisfaction of farmers. J Rural Stud 2020; 80: 185–194.

13. Batterham, Brown, Calear, et al. The FarmWell study: examining relationships between farm environment, financial status and the mental health and wellbeing of farmers. Psychiatry Res Commun 2022; 2: 100036.

14. Cevher, Altunkaynak, Gürü. Impacts of covid-19 on agricultural production branches: an investigation of anxiety disorders among farmers. Sustainability 2021; 13(4): 5186.

15. Liu, Huang, et al. Relationship between risk perception, social support, and mental health among general Chinese population during the covid-19 pandemic. Risk Manag Healthc Policy 2021; 14: 1843–1853.

16. Rose, Shortland, Hall, et al. The impact of COVID-19 on farmers’ mental health: a case study of the UK. J Agromed 2023; 28: 346–364.

17. Thompson, Hagen BNM, Lumley, et al. Mental health and substance use of farmers in Canada during COVID-19. Int J Environ Res Publ Health 2022; 19(20): 13566.

18. Yazdanpanah, Zobeidi, Tajeri Moghadam, et al. Cognitive theory of stress and farmers’ responses to the COVID 19 shock; a model to assess coping behaviors with stress among farmers in southern Iran. Int J Disaster Risk Reduct 2021; 64: 102513.

19. Hanklang, Kaewboonchoo, Morioka, et al. Gender differences in depression symptoms among rice farmers in Thailand. Asia Pac J Publ Health 2016; 28: 83–93.

20. Suwan-Ampai, Hanklang, Kaewboonchoo, et al. Development and validation of the knowledge, self-efficacy, outcome expectation and behavior on pesticide exposure prevention for rice farmers. Int J Nurs Clin Pract 2017; 4: 263.

21. Sapbamrer. Pesticide use, poisoning, and knowledge and unsafe occupational practices in Thailand. New Solut 2018; 28(2): 283–302.

22. Kaewboonchoo, Hanklang, Boonyamalik, et al. Effect of depression prevention programs among rice farmers in Thailand. Environ Occup Health Pract 2020; 2(1): 20.

23. Ong-Artborirak, Boonchieng, Juntarawijit, et al. Potential effects on mental health status associated with occupational exposure to pesticides among Thai farmers. Int J Environ Res Publ Health 2022; 19(15): 9654.

24. Sapbamrer, Chittrakul, Sirikul, et al. Impact of COVID-19 pandemic on daily lives, agricultural working lives, and mental health of farmers in Northern Thailand. Sustainability 2022; 14(3): 1189.

25. Sapbamrer, Sittitoon, La-up, et al. Changes in agricultural context and mental health of farmers in different regions of Thailand during the fifth wave of the COVID-19 pandemic. BMC Public Health 2022; 22: 2050.

26. Noomnual, Konthonbut, Kongtip, et al. Mental health disorders among Thai farmers: occupational and non-occupational stressors. Hum Ecol Risk Assess 2024; 30(1): 180–200.

27. Gramegna, Giudici. Shapley feature selection. FinTech 2022; 1(1): 72–80.

28. Kursa, Rudnicki. Feature selection with the Boruta package. J Stat Software 2010; 36(11): 1–13.

29. Daniel, Cross. Biostatistics: a foundation for analysis in the health sciences. 10th ed. Wiley, 2013.

30. Rodrigues, Madeiro, Marques JAL. Enhancing health and public health through machine learning: decision support for smarter choices. Bioengineering 2023; 10(7): 792.

31. Ramezani, Takian, Bakhtiari, et al. The application of artificial intelligence in health policy: a scoping review. BMC Health Serv Res 2023; 23: 1416.

32. Martinez-Millana, Saez-Saez, Tornero-Costa, et al. Artificial intelligence and its impact on the domains of universal health coverage, health emergencies and health promotion: an overview of systematic reviews. Int J Med Inf 2022; 166: 104855.

33. Rahman, Omar, Mohd Noah, et al. Application of machine learning methods in mental health detection: a systematic review. IEEE Access 2020; 8: 183952–183964.

34. Shatte ABR, Hutchinson, Teague. Machine learning in mental health: a scoping review of methods and applications. Psychol Med 2019; 49: 1426–1448.

35. Tate, McCabe, Larsson, et al. Predicting mental health problems in adolescence using machine learning techniques. PLoS One 2020; 15(4): e0230389.

36. Baba, Bunji. Prediction of mental health problem using annual student health survey: machine learning approach. JMIR Ment Health 2023; 10: e42420.

37. Herbert, El Bolock, Abdennadher. How do you feel during the COVID-19 pandemic? A survey using psychological and linguistic self-report measures, and machine learning to investigate mental health, subjective experience, personality, and behaviour during the COVID-19 pandemic among university students. BMC Psychol 2021; 9: 90.

38. Chen. Investigating the mental health of university students during the COVID-19 pandemic in a UK university: a machine learning approach using feature permutation importance. Brain Inform 2023; 10: 27.

39. Baydili İT, Tasci. Predicting employee attrition: XAI-powered models for managerial decision-making. Systems 2025; 13(1): 583.
