Abstract
Objective
Heart disease is a leading cause of death and disability among middle-aged and elderly populations. Depression is a common comorbidity that impairs prognosis and quality of life. This study aimed to develop a machine learning (ML)-based depression risk prediction model based on China Health and Retirement Longitudinal Study (CHARLS) data.
Methods
A total of 947 middle-aged and elderly heart disease patients from CHARLS 2015 were included after applying missing data criteria. Missing values were imputed using random forest (RF), and the data were split 7:3 into training and validation cohorts. Variable selection was performed in the training cohort using univariate analysis, Lasso regression, recursive feature elimination (RFE), and feature importance evaluation with RF and decision tree (DT). Variables selected by at least three of these five methods were retained. Eleven ML models were constructed and evaluated by area under the curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, F1 score, calibration curve, and decision curve analysis. Five-fold cross-validation was used to assess stability, and SHapley Additive exPlanation (SHAP) values were used to interpret feature importance.
Results
Fifty-eight variables were extracted. After multi-step variable selection within the training cohort, nine variables (address, grip-max, arthritis rheumatism, Hope, sleep time, pain, Retire, ADL, IADL) were initially identified. Among 11 ML models, the logistic regression (LR) algorithm demonstrated the best overall performance with an AUC of 0.792 in the validation cohort. A simplified 4-variable LR model (pain, address, sleep time, and grip-max) achieved a comparable AUC of 0.788. SHAP analysis identified pain as the most important predictor (69.0% of depressed patients reported pain versus 26.9% of non-depressed patients). Rural residence (86.5% vs. 66.7%), shorter sleep time (median 5.25 (4.00, 7.00) vs. 6.00 (5.00, 8.00) hours), and lower grip-max (24.50 (20.00, 30.00) vs. 27.00 (22.50, 33.40)) were associated with increased depression risk. A user-friendly web-based calculator was developed for clinical applications.
Conclusions
The simplified LR model exhibits robust predictive performance and clinical applicability for assessing high depression risk in middle-aged and elderly patients with heart disease.
1. Background
Heart disease remains a leading cause of death and disability among middle-aged and elderly populations globally, posing a significant public health challenge. 1 Depression, as the most common psychological comorbidity among patients with heart disease, has a significantly higher prevalence compared to the general population. Studies indicate that approximately 30% to 40% of middle-aged and elderly patients with heart disease experience varying degrees of depression. 2 This comorbidity not only affects patients’ quality of life but also leads to poor prognosis and decreased treatment effectiveness for heart disease. 3 Recent studies have shown that depression is significantly related to the risk of heart disease and may exacerbate the condition in patients with heart disease, leading to adverse cardiovascular events.4,5 For instance, a previous study indicates that patients with severe depression have an approximately 14.7% increased risk of developing heart disease. 6 Furthermore, depression is closely associated with reduced survival rates in patients with heart disease, and patients’ mental health status significantly impacts their physical health and quality of life. 7
Despite various interventions targeting heart disease and depression, several challenges persist in clinical practice, including low identification rates, poor intervention efficacy, and low patient compliance. A previous study shows that the prevalence of depression among hospitalized heart failure patients reaches 46.8%, with moderate to severe depression accounting for 11.6%. However, clinical screening rates are below 30.0%. 8 These challenges make it difficult for clinicians to formulate effective treatment plans for patients with both heart disease and depression. Therefore, early identification and intervention of depression in patients with heart disease are particularly important, especially in the Chinese middle-aged and elderly population. 9
Previous studies on heart disease and depression have several drawbacks: few have examined the factors affecting depressive symptoms in Chinese middle-aged and elderly patients, many focus on single variables, and many rely on subjective clinical assessments. Machine learning (ML) algorithms, by contrast, offer powerful feature extraction and pattern recognition capabilities. They can integrate multidimensional data such as demographics, clinical symptoms, and laboratory indicators, thereby improving the accuracy and stability of predictive models. These algorithms have been successfully applied to predict prognosis and complications of various diseases. 10 Thus, employing ML to construct predictive models for depression risk in heart disease patients offers a new approach for this field.11,12 CHARLS, a nationally representative longitudinal survey, covers 28 provinces and includes rich demographic, health status, lifestyle, and laboratory testing data, providing a high-quality data source for constructing predictive models intended for the Chinese middle-aged and elderly population. 13 Health disparities in depression care are well recognized, with rural populations and socioeconomically disadvantaged groups facing higher burdens of undiagnosed mental health conditions. These inequalities are particularly relevant for heart disease patients, in whom comorbid depression often remains unrecognized.14,15
This study aims to utilize the CHARLS 2015 baseline data to integrate multiple potential predictive variables, construct and compare various ML models, screen core predictive factors for depression risk in middle-aged and elderly heart disease patients, and develop simplified and efficient predictive models as well as user-friendly tools to improve the mental health levels of heart disease patients.
2. Methods
2.1. Data source
The data for this study are derived from CHARLS 2015 national baseline data, which was approved by the Ethics Review Committee of Peking University (Ethics Approval No. IRB00001052-11015). CHARLS is a nationally representative longitudinal survey covering 150 districts and counties in 28 provinces of China. The initial sample included 21,112 respondents aged more than 45 years old. The survey encompasses a wide range of content: basic information, economic conditions, family structure, disease history, general health, lifestyle, physical function impairment, cognition, insurance information, retirement security, blood pressure, lung function, grip strength, complete blood count, and biochemical indicators.
After obtaining official authorization to access and use the CHARLS 2015 dataset for this research, the data were downloaded, and research subjects were screened according to the following criteria: (1) self-reported diagnosis of heart disease; (2) completion of the simplified version of the Center for Epidemiologic Studies Depression Scale (CES-D); (3) no missing data for smoking, drinking, and white blood cell count. A total of 947 heart disease patients were included, of whom 422 (44.6%) were determined to have depressive symptoms (CES-D ≥ 10 points).
To assess whether the sample size was sufficient for developing a reliable prediction model, the events per variable (EPV) criterion was applied. An EPV greater than 10 was considered adequate to prevent overfitting and ensure stable model estimates. 16
2.2. Information collection and definition
2.2.1. Research variables
This study extracted 58 potential predictive variables from the CHARLS database. These variables cover demographic characteristics, health status, lifestyle, and blood test indicators. All predictive variables are defined as follows:
Five demographic variables include gender, age, education level, marital status, and address. Six physical measurement variables include height, weight, body mass index (BMI), obesity, waist circumference, and grip strength. Twenty-two health status, health behavior, and cognitive ability variables include smoking, drinking, activity of daily living (ADL), instrumental activities of daily living (IADL), self-health status, hypertension, dyslipidemia, diabetes mellitus, cancer, liver diseases, chronic lung diseases, stroke, kidney diseases, digestive diseases, arthritis rheumatism, asthma, exercise, sleep time, social interaction, pain, disability, and hope for future life. Two healthcare and health insurance variables include insurance and hospital visits within the past month. One retirement status variable: retirement. Twenty-two blood test-related variables include white blood cell count (WBC); hemoglobin; hematocrit (HCT); mean red cell volume (MCV); platelet count (PLT); triglyceride (TG); creatinine; urea; high-density lipoprotein (HDL); low-density lipoprotein (LDL); total cholesterol; glucose; uric acid (UA); cystatin C (CYSC); C-reactive protein (CRP); glycohemoglobin (HbA1c); triglyceride glucose index; urea to creatinine ratio; uric acid to creatinine ratio; TG to HDL ratio; LDL to HDL ratio; and glucose to glycohemoglobin ratio. All blood test indicators (such as triglycerides, blood glucose, and glycosylated hemoglobin) are based on fasting venous blood tests, with units standardized to international units or clinically common units.
All categorical variables in this study are binary variables. Gender is categorized as male or female. Education level is categorized as “high school or above” and “below high school”. Marital status is classified as married or single. Address is categorized as urban or rural. Self-health status is categorized as good or poor. Based on information collected within a month, hospitalization status is categorized as present or absent. Smoking is defined as smoking once or more per week. Drinking is defined as drinking once or more per week. Engaging in vigorous or moderate activity once or more per week is considered regular physical activity. ADL and IADL are assessed using the ADL scale, which includes 6 items for basic activities of daily living (BADL) and 6 items for IADL. Item scoring is based on the Functional Independence Measure (FIM), where each response is scored as 7.0, 6.0, 4.0, or 1.5 points. These scores are used to calculate the ADL score. 17 A total score below 72 indicates complete or conditional dependence, while a total score of 72 or above indicates complete or conditional independence. Other categorical variables are classified as present or absent according to self-reports.
2.2.2. Depression assessment
Depression was designated as the outcome variable. CHARLS used the CES-D scale, which includes 10 items to assess depressive symptoms. The Chinese version of this scale demonstrated good reliability and validity among the middle-aged and elderly population in China. 18 The total score ranges from 0 to 30, and this study adopted the internationally accepted cutoff, defining a total score≥10 as significant depressive symptoms, which served as the outcome variable.
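The outcome definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the sketch assumes each of the ten items has already been coded 0 to 3, with any reverse-scored items already reversed.

```python
from typing import Sequence

CESD10_CUTOFF = 10  # cutoff stated in the text: total score >= 10


def cesd10_total(item_scores: Sequence[int]) -> int:
    """Sum ten item scores, each coded 0-3 (reverse-scoring assumed already applied)."""
    if len(item_scores) != 10:
        raise ValueError("CES-D-10 requires exactly 10 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)  # possible range: 0 to 30


def has_depressive_symptoms(item_scores: Sequence[int]) -> bool:
    """Apply the >= 10 cutoff to the total score to obtain the binary outcome."""
    return cesd10_total(item_scores) >= CESD10_CUTOFF
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, mirrors the inclusion criterion that participants must have completed the scale.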
2.3. Variable selection and model construction
After the predefined inclusion and exclusion criteria were applied, missing data were imputed using the random forest (RF) method, and the imputed data were then randomly split in a 7:3 ratio into a training cohort (n=663) and a validation cohort (n=284). All variable selection procedures were performed exclusively within the training cohort to prevent data leakage. First, univariate analysis was conducted on the 58 variables: for continuous variables, either the t-test or the Mann-Whitney U test was used; for categorical variables, the chi-square test was applied. P<0.05 was considered statistically significant. Second, Lasso regression analysis was performed on the initially screened variables using the R glmnet package; the lambda value minimizing cross-validation error was selected through 10-fold cross-validation to further reduce the number of variables. Third, recursive feature elimination (RFE) was used to assess variable importance, retaining the top 10 ranked variables. Fourth, feature importance was evaluated using RF variable importance and decision tree (DT) variable importance. Ultimately, variables that appeared in at least three of these five selection methods were included in the preliminary set of predictive variables. Among highly correlated variables (Pearson correlation coefficient > 0.65), only one was retained, chosen for its known biological association with depression and clinical accessibility.
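The "at least three of five methods" rule can be illustrated with a short voting sketch. The function name and the example variable sets below are hypothetical and chosen only to show the mechanics; they are not the study's actual selection results.

```python
from collections import Counter
from typing import Dict, List, Set


def select_by_vote(selections: Dict[str, Set[str]], min_votes: int = 3) -> List[str]:
    """Keep variables chosen by at least `min_votes` of the selection methods."""
    # Each method contributes a set, so a variable gets at most one vote per method.
    counts = Counter(var for chosen in selections.values() for var in chosen)
    return sorted(var for var, votes in counts.items() if votes >= min_votes)


# Hypothetical method outputs for illustration only; not the study's results.
example = {
    "univariate": {"pain", "sleep_time", "address", "age"},
    "lasso": {"pain", "sleep_time", "address"},
    "rfe": {"pain", "grip_max", "age"},
    "rf": {"pain", "sleep_time", "grip_max"},
    "dt": {"pain", "grip_max", "bmi"},
}
selected = select_by_vote(example, min_votes=3)
print(selected)
```

In this toy input, `address` and `age` each receive only two votes and are dropped, which shows how the rule filters out variables favored by a minority of methods.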
In the model construction phase, eleven ML models, including Naive Bayes (NB), K-nearest neighbors (KNN), logistic regression (LR), random forest (RF), decision tree (DC), artificial neural network (ANN), support vector machine (SVM), gradient boosting (GB), light gradient boosting machine (LGBM), extreme gradient boosting (XGB), and adaptive boosting (AdaBoost), were constructed using the training cohort and then applied to the validation cohort.
2.4. Model evaluation, selection, and simplification
Model performance was evaluated in both the training and validation cohorts. The main evaluation metrics included accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, and AUC. Five-fold cross-validation ROC curves were also generated to compare the stability of different algorithms. Calibration curves were used to assess the consistency between predicted values and observed values. Decision curve analysis (DCA) was utilized to evaluate the clinical utility of the model at different clinical risk thresholds.
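For readers who want to see how the listed metrics relate to one another, the following sketch computes them from binary labels and predictions, and computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The function names are illustrative, and this is a plain-Python sketch rather than the software used in the study.

```python
from typing import Dict, Sequence


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for labels coded 1 (depressed) and 0 (not depressed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; library implementations use a sort-based method for larger data.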
After determining the optimal algorithm, we explored model simplification to enhance clinical usability. Based on variable importance ranked by Shapley Additive Explanations (SHAP) values, we employed a backward elimination method, sequentially removing the least important variables and observing changes in AUC on the validation cohort. This process was repeated until only one variable remained. The simplest model was determined by analyzing changes in model diagnostic performance after variable removal.
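The backward elimination loop can be sketched as follows. The sketch assumes a caller-supplied `score_fn` that refits the model on a feature subset and returns its validation AUC; that callback, the function names, and the tolerance-based stopping rule are assumptions for illustration, since the text describes the final choice only as based on observed changes in diagnostic performance.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[Tuple[str, ...], float]


def backward_elimination(
    ranked_features: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
) -> List[Step]:
    """Score nested feature subsets, dropping the least important feature each step.

    `ranked_features` must be ordered from most to least important
    (for example, by mean absolute SHAP value).
    """
    current = list(ranked_features)
    path: List[Step] = []
    while current:
        path.append((tuple(current), score_fn(current)))
        current = current[:-1]  # remove the least important remaining feature
    return path


def simplest_within_tolerance(path: List[Step], tolerance: float = 0.01) -> Step:
    """Assumed rule: smallest subset whose score is within `tolerance` of the full model."""
    full_score = path[0][1]
    acceptable = [step for step in path if full_score - step[1] <= tolerance]
    return min(acceptable, key=lambda step: len(step[0]))


# Toy scoring function with made-up contributions, for illustration only.
_toy_gain = {"a": 0.2, "b": 0.05, "c": 0.03, "d": 0.004, "e": 0.002}
toy_path = backward_elimination(
    ["a", "b", "c", "d", "e"],
    lambda feats: 0.5 + sum(_toy_gain[f] for f in feats),
)
toy_best = simplest_within_tolerance(toy_path, tolerance=0.01)
```

Recording the whole path, rather than stopping at the first drop in performance, makes it possible to tabulate performance at every feature count before choosing the simplified model.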
2.5. Model interpretation and application
Calibration curves and DCA were used to evaluate the model’s calibration and clinical utility. SHAP values were applied to quantify the importance of each predictor and to visualize its association with depression risk in middle-aged and elderly heart disease patients. Variables were ranked by the magnitude of their SHAP values, with a higher ranking indicating a greater impact on the outcome. A positive SHAP value indicates that the variable pushes the prediction toward higher depression risk; a negative SHAP value indicates that it pushes the prediction toward lower risk. To facilitate clinical translation and large-scale screening, we developed a web-based risk calculator based on the final simplified LR model. The calculator was implemented using the Streamlit framework (Python 3.9) and deployed on a cloud platform.
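To make the sign convention for SHAP values concrete, the sketch below uses the closed-form result for linear models: if features are treated as independent, the SHAP value of a feature on the log-odds scale is its coefficient times the deviation of its value from the background mean. The coefficients and data here are hypothetical placeholders, not the fitted model, and the study's own SHAP computation may have used a different explainer.

```python
from typing import Dict, List


def linear_shap(
    coefs: Dict[str, float],
    background: List[Dict[str, float]],
    x: Dict[str, float],
) -> Dict[str, float]:
    """SHAP values (log-odds scale) for a linear model, assuming independent features."""
    means = {
        name: sum(row[name] for row in background) / len(background)
        for name in coefs
    }
    # phi_i = beta_i * (x_i - mean_i): positive values push toward higher risk.
    return {name: coefs[name] * (x[name] - means[name]) for name in coefs}


# Hypothetical coefficients and background data, for illustration only.
toy_coefs = {"pain": 1.0, "sleep_time": -0.5}
toy_background = [
    {"pain": 0.0, "sleep_time": 6.0},
    {"pain": 1.0, "sleep_time": 8.0},
]
toy_patient = {"pain": 1.0, "sleep_time": 5.0}
phi = linear_shap(toy_coefs, toy_background, toy_patient)
print(phi)
```

In this toy case both values are positive: reporting pain and sleeping less than the background average each push the predicted log-odds of depression upward.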
2.6. Statistical analysis
Continuous variables were first assessed for normality, using the Shapiro-Wilk test. Data that met normal distribution and homogeneity of variance were expressed as mean ± standard deviation, with independent samples t-test used for comparisons between groups. Non-normal or heteroscedastic data were expressed as median (interquartile range), with the Mann-Whitney U test used for comparisons between groups. Categorical variables were expressed as frequency (percentage), with the chi-square test used for comparisons between groups. ML models were crafted using MyLab+ i-Research. All statistical analyses were conducted using R 4.3.3 software, with a two-sided P-value < 0.05 considered statistically significant.
3. Results
3.1. Demographic and clinical characteristics
Through inclusion and exclusion criteria, a total of 947 heart disease patients were included in the analysis (Figure 1), of whom 422 were in the depression group (44.6%) and 525 in the non-depression group (55.4%). As shown in Table 1, 28 variables had significant differences between the depression and non-depression groups.
Figure 1. Flow chart of screening research objects.
Table 1. Comparison of data between non-depression group and depression group in middle-aged and elderly patients with heart disease.
BMI, body mass index; ADL, activity of daily living; IADL, instrumental activities of daily living; WBC, white blood cell; MCV, mean corpuscular volume; HDL, high density lipoprotein; LDL, low density lipoprotein; CRP, C-reactive protein; HbA1c, glycated hemoglobin; TyG Index, Triglyceride-Glucose Index. Values in bold indicate P<0.05.
3.2. Variable selection
Following univariate analysis, Lasso regression, RFE, and RF and DT feature importance ranking, all performed exclusively within the training cohort, nine variables that appeared in at least three of the five selection methods were selected for the model construction phase (Figure 2): address (rural/urban), grip-max (kg), arthritis rheumatism (yes/no), ADL (yes/no), IADL (yes/no), sleep time (hours), pain (yes/no), Hope (yes/no), and Retire (yes/no).
Figure 2. Variable selection for the construction of machine learning models: (a) Venn diagram of the variable screening processes using univariate analysis, Lasso, RFE, RF, and DT in the training cohort; (b) Pearson correlation analysis.
3.3. Model construction and performance
Performance of the eleven machine learning algorithms in the training and validation cohorts for depression patients.
PPV, positive predictive value; NPV, negative predictive value; AUC, the area under the curve; CI, confidence interval; NB, Naive-Bayes; KNN, K-nearest neighbor; LR, logistic regression; RF, random forest; DC, decision tree; ANN, artificial neural networks; SVM, support vector machine; GB, gradient boosting; LGBM, light gradient boosting machine; XGB, extreme gradient boosting; AdaBoost, adaptive boosting.
Further evaluation using validation cohort ROC curves and 5-fold cross-validation ROC curves revealed that LR achieved a validation AUC of 0.792 (95% CI: 0.741–0.845), with cross-validation curves showing stable clustering and no significant fluctuations (Figure 3(a) and (b)). Although SVM achieved a slightly higher validation AUC of 0.796, its reliance on kernel functions and hyperparameters resulted in poor interpretability, making it less suitable for clinical application. ANN showed relatively low validation AUC with large cross-validation fluctuations. LGBM and AdaBoost were both inferior to LR in terms of AUC and stability. Consequently, SVM, ANN, LGBM, and AdaBoost were excluded, leaving LR as the final optimal algorithm.
Figure 3. Model construction and evaluation. (a) ROC curves of 11 algorithms in the validation cohort; (b) ROC curves of 11 algorithms plotted by 5-fold cross-validation; (c) Calibration curve of the LR algorithm in the validation cohort; (d) Decision curves of LR in the validation cohort.
Finally, LR was comprehensively evaluated and demonstrated optimal performance in calibration curves for predicted-observed agreement and decision curve analysis (DCA) for net clinical benefit, fully meeting the requirements for a clinical prediction model (Figure 3(c) and (d)).
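The net clinical benefit plotted in a decision curve follows a standard formula: at threshold probability p_t, net benefit equals TP/n minus FP/n weighted by p_t/(1 − p_t). The sketch below implements that formula together with the "treat all" reference strategy; the function names are illustrative and the toy data in the usage note are not from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

For example, with outcomes `[1, 1, 0, 0]`, predicted risks `[0.9, 0.6, 0.7, 0.2]`, and a threshold of 0.5, the model's net benefit is 0.25 while treating everyone yields 0. A model is clinically useful at a given threshold when its curve lies above both the "treat all" and "treat none" (net benefit of zero) references.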
3.4. Model simplification
Performance comparison of LR algorithm with different feature numbers.
PPV, positive predictive value; NPV, negative predictive value; AUC, the area under the curve; CI, confidence interval.
The final simplified model contained 4 predictors. With a total of 422 depression events in the full sample (n=947) and 296 events in the training cohort (n=663), the resulting EPV was 105.5 and 74.0, respectively, both substantially exceeding the recommended minimum of 10. This confirms that the sample size was adequate for model development and internal validation.
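The EPV figures above follow directly from dividing the number of outcome events by the number of predictors in the final model. A minimal check, with a hypothetical function name:

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events per variable: outcome events divided by number of predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


epv_full = events_per_variable(422, 4)      # full sample, 4-predictor model
epv_training = events_per_variable(296, 4)  # training cohort, 4-predictor model
print(epv_full, epv_training)  # 105.5 74.0
```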
3.5. Model interpretability and application
The calibration curve and DCA for the final 4-variable LR model (pain, address, sleep time, and grip-max) are presented in Figure 4(a) and (b), respectively. The calibration curve demonstrated good agreement between predicted depression risk and observed outcomes, indicating that the model is well-calibrated. The DCA showed that the 4-variable LR model provides positive net clinical benefit across a wide range of reasonable threshold probabilities, supporting its clinical utility for depression risk screening in middle-aged and elderly heart disease patients.
Figure 4. The evaluation, interpretability and application of the 4-variable LR model. (a) The calibration curve for the final 4-variable LR model; (b) The decision curves for the final 4-variable LR model; (c) SHAP plot of the 4-variable LR model; (d) Screenshot of the web-based calculator.
SHAP values were used to interpret the final 4-variable LR model (Figure 4(c)). The results showed that pain is the most important factor in predicting depression risk, with the SHAP summary plot indicating that pain presence significantly increases this risk. Next, residing in rural areas and having shorter sleep time were both associated with a higher risk of depression. Lower grip strength (grip-max) also emerged as a risk factor.
Based on these findings, we developed a user-friendly web-based calculator for clinical application. The calculator is accessible at https://bgmyndhqg6cnnlssqkr9op.streamlit.app/ (Figure 4(d)).
4. Discussion
The comorbidity of cardiovascular disease and depressive disorders poses a significant challenge in global public health and clinical practice. 19 This study, based on nationally representative data from CHARLS, integrated multidimensional variables for the first time and constructed 11 ML models for predicting the risk of depression in middle-aged and elderly patients with heart disease. By performing variable selection strictly within the training cohort to prevent data leakage and applying a more inclusive missing data criterion, we obtained a larger and more representative sample (n=947). Through variable selection and model simplification, we ultimately obtained a high-performing and clinically practical 4-variable LR model. The findings not only revealed the key factors influencing the risk of depression in this population but also provided a web-based calculator tool for early screening of clinical depression risk.
SHAP value analysis showed that pain is the most important predictor of depression risk in middle-aged and elderly heart disease patients. Notably, 69.0% of patients in the depression group reported frequent pain, compared with only 26.9% in the non-depression group (Table 1). This result aligns with existing research showing a close bidirectional association between chronic pain and depression. Pain triggers psychological stress through persistent physiological discomfort, leading to depressive symptoms such as low mood and loss of interest, while depression can amplify pain perception and reduce pain coping abilities, worsening the pain experience. 20 Because chronic pain such as chest pain and joint pain is common among heart disease patients, clinicians should attend to patients’ psychological state while managing their pain symptoms and screen for depression risk in a timely manner. Living in rural areas is another key risk factor, with the proportion of rural residents in the depression group (86.5%) significantly higher than that in the non-depression group (66.7%) (Table 1). This may be related to factors such as the relative scarcity of medical resources in rural areas, low household economic status, and weak social support networks. 21 This study also found that patients in the depression group had shorter sleep time, which aligns with the bidirectional mechanism linking sleep disorders and depression: insufficient sleep can disrupt the balance of neurotransmitters such as serotonin and dopamine, triggering depressive moods, while depressive symptoms can lead to difficulty falling asleep and decreased sleep quality through anxiety and rumination. 22 Lower grip strength (grip-max) was identified as a risk factor for depression in this study. Grip strength is a well-established indicator of overall muscle function and physical frailty. 23 Previous research has shown that reduced grip strength is associated with increased risk of depression in older adults, possibly due to decreased physical activity, reduced social participation, and underlying inflammatory processes. 24 This finding suggests that assessing grip strength, a simple and inexpensive measurement, could aid depression risk screening among heart disease patients. Consequently, clinical attention should be paid to the psychological needs of patients with comorbidities.
This study is based on the CHARLS database, which provides authoritative data from a nationally representative sample using a stratified, multi-stage proportional sampling method. This provides a good external validity basis for generalizing the research conclusions to a broader population of heart disease patients in China. Furthermore, compared to traditional predictive models, the ML model constructed in this study has significant advantages. The research strictly follows predictive model construction standards, by using univariate analysis, Lasso regression, recursive feature elimination, RF and DT for variable selection, which ensures the reliability of the selected variables. Based on performance in independent validation cohorts, this study selected the more robust LR algorithm and supplemented it with SHAP analysis for in-depth interpretation. We found that the LR model achieved an AUC of 0.792 (95%CI: 0.741-0.845), while the simplified 4-variable model was 0.788(95%CI: 0.733-0.840). This simplification of the model provides a feasible solution for clinical practice. The development of the web-based risk calculator further enhances the model’s practicality. It allows clinicians and researchers to quickly obtain depression risk assessment results by inputting the 4 core clinical indicators of patients, providing an efficient tool for large-scale screening and epidemiological investigations. Targeted intervention programs could be developed based on this predictive model, such as cognitive behavioral therapy and psychological counseling for high-risk patients. The effectiveness of these interventions could then be assessed to provide more comprehensive support for the mental health management of middle-aged and elderly heart disease patients.
This study has several limitations. First, although we increased the sample size by relaxing missing data criteria, the sample is still drawn from a single wave of CHARLS, and the lack of external validation limits the generalizability and clinical applicability of the model. Second, some potentially important variables, such as levels of social support, economic income, and coping strategies, were not included in the analysis, which may affect the model’s predictive accuracy. Future research could expand the sample size, incorporate longitudinal follow-up data to validate the model’s long-term predictive performance, integrate multi-omics data to improve predictive accuracy, and strengthen generalizability through multi-center external validation.
For clinical implementation, the web-based calculator requires all four predictors (pain, address, sleep time, grip-max) as inputs. If any predictor value is missing or of poor quality, the model should not be used until valid data are obtained; imputation is not recommended for clinical decision-making. The calculator is designed for healthcare professionals (physicians or nurses) and requires no statistical expertise, as users simply enter the four values and receive an automatic risk prediction. The tool is not intended for patient self-use without clinical supervision.
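The input requirements described above can be sketched as a validation step in front of a logistic prediction. The article does not report the fitted coefficients, so the intercept and coefficients below are HYPOTHETICAL placeholders chosen only so that the example runs; the predictor key names are likewise assumptions, and this is not the deployed calculator's code.

```python
import math
from typing import Mapping, Optional

REQUIRED_PREDICTORS = ("pain", "address_rural", "sleep_time", "grip_max")

# HYPOTHETICAL placeholder values so the example runs.
# They are NOT the coefficients of the published model.
PLACEHOLDER_INTERCEPT = 0.0
PLACEHOLDER_COEFS = {
    "pain": 1.0,            # 1 = pain present, 0 = absent
    "address_rural": 0.5,   # 1 = rural, 0 = urban
    "sleep_time": -0.1,     # hours
    "grip_max": -0.02,      # maximum grip strength
}


def predict_risk(inputs: Mapping[str, Optional[float]]) -> float:
    """Return a predicted probability, refusing to run if any predictor is missing."""
    missing = []
    for name in REQUIRED_PREDICTORS:
        value = inputs.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(name)
    if missing:
        # No imputation: the caller must supply valid values first.
        raise ValueError(f"missing predictors: {missing}")
    log_odds = PLACEHOLDER_INTERCEPT + sum(
        PLACEHOLDER_COEFS[name] * float(inputs[name]) for name in REQUIRED_PREDICTORS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Raising an error instead of substituting a default value keeps the sketch consistent with the recommendation that the model not be used until valid data are obtained.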
5. Conclusion
This study constructed and simplified an LR predictive model for depression risk in middle-aged and elderly heart disease patients based on CHARLS data. Core predictive variables include pain, address, sleep time, and grip-max, with the model demonstrating good predictive efficacy and clinical practicality. The promotion and application of this model and the accompanying web calculator will aid in the early identification of high-risk depression patients in clinical settings, facilitate timely intervention, help break the vicious cycle of comorbidity between heart disease and depression, and improve overall patient prognosis.
Footnotes
Acknowledgements
The authors thank the CHARLS participants and staff for their contributions to data and data collection. The authors are grateful to the i-Research consulting team of Roche for their help to improve the model.
Ethical considerations
The CHARLS data collection was approved by the Ethics Review Committee of Peking University (Ethics Approval No. IRB00001052-11015), and informed consent was obtained from all participants. This study is a secondary analysis of public data, and therefore no additional ethical approval was required.
Authors’ contributions
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
