Development of a machine learning-based depression risk prediction model for middle-aged and elderly Chinese heart disease patients: Evidence from CHARLS data

Abstract

Objective

Heart disease is a leading cause of death and disability among middle-aged and elderly populations. Depression is a common comorbidity that worsens prognosis and quality of life. This study aimed to develop a machine learning (ML)-based model for predicting depression risk in middle-aged and elderly patients with heart disease, using data from the China Health and Retirement Longitudinal Study (CHARLS).

Methods

A total of 947 middle-aged and elderly heart disease patients from CHARLS 2015 were included after applying missing data criteria. Missing values were imputed using random forest (RF), and the data were split 7:3 into training and validation cohorts. Within the training cohort, variables were selected using univariate analysis, Lasso regression, recursive feature elimination (RFE), and feature importance from RF and decision tree (DT) models. Variables identified by at least three of these five methods were retained. Eleven ML models were constructed. They were evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, F1 score, calibration curves, and decision curve analysis. Five-fold cross-validation was used to assess model stability. SHapley Additive exPlanations (SHAP) values were used to interpret feature importance.

Results

Fifty-eight variables were extracted. After multi-step variable selection within the training cohort, nine variables were initially identified: address, grip-max, arthritis rheumatism, hope, sleep time, pain, retirement, ADL, and IADL. Among the 11 ML models, logistic regression (LR) demonstrated the best overall performance, with an AUC of 0.792 in the validation cohort. A simplified 4-variable LR model (pain, address, sleep time, and grip-max) achieved a comparable AUC of 0.788. SHAP analysis identified pain as the most important predictor: 69.0% of depressed patients reported pain versus 26.9% of non-depressed patients. Three further factors were associated with higher depression risk:

- rural residence (86.5% vs. 66.7%);
- shorter sleep time (median 5.25 (4.00, 7.00) vs. 6.00 (5.00, 8.00) hours);
- lower grip-max (24.50 (20.00, 30.00) vs. 27.00 (22.50, 33.40) kg).

A user-friendly web-based calculator was developed for clinical use.

Conclusions

The simplified LR model exhibits robust predictive performance and clinical applicability for identifying middle-aged and elderly patients with heart disease who are at high risk of depression.

Keywords

machine learning; prediction model; depression risk; heart disease

1. Background

Heart disease remains a leading cause of death and disability among middle-aged and elderly populations globally, posing a significant public health challenge.¹ Depression is the most common psychological comorbidity among patients with heart disease, and its prevalence in this group is substantially higher than in the general population. Studies indicate that approximately 30% to 40% of middle-aged and elderly patients with heart disease experience some degree of depression.² This comorbidity reduces patients’ quality of life. It also worsens prognosis and reduces the effectiveness of heart disease treatment.³ Recent studies have shown that depression is significantly associated with the risk of heart disease. It may also exacerbate the condition in patients who already have heart disease, leading to adverse cardiovascular events.^4,5 For instance, a previous study reported that patients with severe depression have an approximately 14.7% higher risk of developing heart disease.⁶ Furthermore, depression is closely associated with reduced survival in patients with heart disease. Patients’ mental health status significantly affects their physical health and quality of life.⁷

Despite various interventions targeting heart disease and depression, several challenges persist in clinical practice, including low identification rates, poor intervention efficacy, and low patient adherence. A previous study reported that the prevalence of depression among hospitalized heart failure patients reaches 46.8%, with moderate to severe depression accounting for 11.6%. Clinical screening rates, however, are below 30.0%.⁸ These challenges make it difficult for clinicians to formulate effective treatment plans for patients with both heart disease and depression. Early identification of and intervention for depression in patients with heart disease are therefore particularly important, especially in the Chinese middle-aged and elderly population.⁹

Previous studies of heart disease and depression have several limitations. Few have examined the factors affecting depressive symptoms in Chinese middle-aged and elderly patients, many focus on single variables, and many rely on subjective clinical assessments. ML algorithms, by contrast, offer powerful feature extraction and pattern recognition capabilities. They can integrate multidimensional data such as demographics, clinical symptoms, and laboratory indicators, which can improve the accuracy and stability of predictive models. These algorithms have been successfully applied to predict the prognosis and complications of various diseases.¹⁰ Employing ML to construct predictive models for depression risk in heart disease patients may therefore offer a new approach in this field.^11,12 CHARLS is a nationally representative longitudinal survey covering 28 provinces. It includes rich demographic, health status, lifestyle, and laboratory data, making it a high-quality data source for predictive models aimed at the Chinese middle-aged and elderly population.¹³ Health disparities in depression care are well recognized: rural populations and socioeconomically disadvantaged groups bear a higher burden of undiagnosed mental health conditions. These inequalities are particularly relevant for heart disease patients, in whom comorbid depression often goes unrecognized.^14,15

This study aims to use the CHARLS 2015 baseline data to integrate multiple potential predictors and to construct and compare various ML models. It also aims to identify the core predictors of depression risk in middle-aged and elderly heart disease patients. Finally, it aims to develop a simplified, efficient predictive model and a user-friendly tool to support mental health care for these patients.

2. Methods

2.1. Data source

The data for this study were derived from the CHARLS 2015 national baseline data. CHARLS was approved by the Ethics Review Committee of Peking University (Ethics Approval No. IRB00001052-11015). CHARLS is a nationally representative longitudinal survey covering 150 districts and counties in 28 provinces of China. The initial sample included 21,112 respondents aged 45 years and older. The survey covers a wide range of content: basic information, economic conditions, family structure, disease history, general health, lifestyle, physical function impairment, cognition, insurance information, retirement security, blood pressure, lung function, grip strength, complete blood count, and biochemical indicators.

After obtaining official authorization to access and use the CHARLS 2015 dataset, we downloaded the data and screened participants using three criteria:

1. self-reported diagnosis of heart disease;
2. completion of the short form of the Center for Epidemiologic Studies Depression Scale (CES-D);
3. no missing data for smoking, drinking, or white blood cell count.

A total of 947 heart disease patients were included, of whom 422 (44.6%) were classified as having depressive symptoms (CES-D ≥ 10 points).

To assess whether the sample size was sufficient for developing a reliable prediction model, the events per variable (EPV) criterion was applied. An EPV greater than 10 was considered adequate to prevent overfitting and ensure stable model estimates.¹⁶

2.2. Information collection and definition

2.2.1. Research variables

This study extracted 58 potential predictive variables from the CHARLS database. These variables cover demographic characteristics, health status, lifestyle, and blood test indicators. All predictive variables are defined as follows:

Five demographic variables include gender, age, education level, marital status, and address. Six physical measurement variables include height, weight, body mass index (BMI), obesity, waist circumference, and grip strength. Twenty-two health status, health behavior, and cognitive ability variables include smoking, drinking, activities of daily living (ADL), instrumental activities of daily living (IADL), self-health status, hypertension, dyslipidemia, diabetes mellitus, cancer, liver diseases, chronic lung diseases, stroke, kidney diseases, digestive diseases, arthritis rheumatism, asthma, exercise, sleep time, social interaction, pain, disability, and hope for the future. Two healthcare and health insurance variables include insurance and hospital visits within the past month. One retirement status variable was included: retirement. Twenty-two blood test-related variables include white blood cell count (WBC); hemoglobin; hematocrit (HCT); mean corpuscular volume (MCV); platelet count (PLT); triglyceride (TG); creatinine; urea; high-density lipoprotein (HDL); low-density lipoprotein (LDL); total cholesterol; glucose; uric acid (UA); cystatin C (CYSC); C-reactive protein (CRP); glycated hemoglobin (HbA1c); triglyceride glucose index; urea to creatinine ratio; uric acid to creatinine ratio; TG to HDL ratio; LDL to HDL ratio; and glucose to glycated hemoglobin ratio. All blood test indicators (e.g., triglycerides, blood glucose, and glycated hemoglobin) are based on fasting venous blood samples, with units standardized to SI or conventional clinical units.

All categorical variables in this study are binary. The codings are as follows:

- Gender: male or female.
- Education level: “high school or above” or “below high school”.
- Marital status: married or single.
- Address: urban or rural.
- Self-health status: good or poor.
- Hospital visits within the past month: present or absent.
- Smoking: smoking once or more per week.
- Drinking: drinking once or more per week.
- Regular physical activity: engaging in vigorous or moderate activity once or more per week.

ADL and IADL are assessed using the ADL scale, which includes 6 items for basic activities of daily living (BADL) and 6 items for IADL. Item scoring is based on the Functional Independence Measure (FIM), with each response scored as 7.0, 6.0, 4.0, or 1.5 points. The item scores are summed to give the ADL score.¹⁷ A total score below 72 indicates complete or conditional dependence, while a total score of 72 or above indicates complete or conditional independence. All other categorical variables are classified as present or absent according to self-report.
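The ADL scoring rule above can be sketched as a small function. This is a minimal sketch that assumes a single combined 12-item total, as the text describes. The four point values and the cutoff of 72 come from the text. The response-category labels in `FIM_POINTS` are hypothetical placeholders, because the mapping from CHARLS answers to points is not spelled out here.

```python
"""Sketch of FIM-style ADL/IADL scoring with the 72-point cutoff."""
from typing import Sequence

# Point values (7.0, 6.0, 4.0, 1.5) are from the text; the category labels
# are HYPOTHETICAL placeholders for the CHARLS response options.
FIM_POINTS = {
    "no_difficulty": 7.0,
    "difficulty_but_can_do": 6.0,
    "needs_help": 4.0,
    "cannot_do": 1.5,
}

N_ITEMS = 12    # 6 BADL items + 6 IADL items
CUTOFF = 72.0   # total < 72 -> complete or conditional dependence


def adl_total(responses: Sequence[str]) -> float:
    """Sum the FIM-style points over all 12 items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(FIM_POINTS[r] for r in responses)


def is_dependent(responses: Sequence[str]) -> bool:
    """True for complete or conditional dependence (total below 72)."""
    return adl_total(responses) < CUTOFF


if __name__ == "__main__":
    independent = ["no_difficulty"] * 12                    # 12 x 7.0 = 84.0
    dependent = ["cannot_do"] * 3 + ["no_difficulty"] * 9   # 4.5 + 63.0 = 67.5
    print(adl_total(independent), is_dependent(independent))  # 84.0 False
    print(adl_total(dependent), is_dependent(dependent))      # 67.5 True
```

With 12 items the maximum is 84. Under these point values, a respondent falls below 72 only after losing more than 12 points across items.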

2.2.2. Depression assessment

Depression was designated as the outcome variable. CHARLS assesses depressive symptoms with the 10-item CES-D scale. The Chinese version of this scale has demonstrated good reliability and validity among middle-aged and elderly people in China.¹⁸ Total scores range from 0 to 30. Following the widely used cutoff, this study defined a total score ≥ 10 as clinically significant depressive symptoms.
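The outcome definition can be sketched as follows. The text gives the 0 to 30 range and the cutoff of 10. Under the usual CES-D-10 convention, the two positively worded items ("hopeful about the future" and "happy") are reverse-scored. Their positions in `REVERSE_SCORED` are an assumption to be checked against the CHARLS codebook.

```python
"""Sketch of CES-D-10 scoring and the >= 10 cutoff used for the outcome."""
from typing import Sequence

N_ITEMS = 10
CUTOFF = 10
# ASSUMPTION: 0-based positions of the two positively worded items, which are
# reverse-scored under the usual CES-D-10 convention. Verify in the codebook.
REVERSE_SCORED = frozenset({4, 7})


def cesd10_total(item_scores: Sequence[int], reverse_items=REVERSE_SCORED) -> int:
    """Total score (0..30) from ten item responses coded 0..3."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(item_scores)}")
    total = 0
    for i, score in enumerate(item_scores):
        if not 0 <= score <= 3:
            raise ValueError(f"item {i} out of range: {score}")
        # Positive items count (3 - score) so that higher always means worse.
        total += (3 - score) if i in reverse_items else score
    return total


def has_depressive_symptoms(item_scores: Sequence[int], reverse_items=REVERSE_SCORED) -> bool:
    """Outcome label: True when the total reaches the cutoff of 10."""
    return cesd10_total(item_scores, reverse_items) >= CUTOFF


if __name__ == "__main__":
    print(cesd10_total([1] * 10), has_depressive_symptoms([1] * 10))  # 12 True
```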

2.3. Variable selection and model construction

After the predefined inclusion and exclusion criteria were applied, missing data were imputed using the random forest (RF) method. The imputed data were then randomly split in a 7:3 ratio into a training cohort (n=663) and a validation cohort (n=284). All variable selection procedures were performed exclusively within the training cohort to prevent data leakage. Selection proceeded in five steps:

1. Univariate analysis was conducted on the 58 variables. The t-test or Mann-Whitney U test was used for continuous variables and the chi-square test for categorical variables, with P<0.05 considered statistically significant.
2. Lasso regression was performed on the variables retained by univariate analysis using the R glmnet package. The lambda value that minimized the cross-validation error was selected by 10-fold cross-validation.
3. Recursive feature elimination (RFE) was used to rank variable importance, and the top 10 variables were retained.
4. Feature importance was assessed using RF variable importance.
5. Feature importance was assessed using decision tree (DT) variable importance.

Variables that appeared in at least three of these five selection methods were included in the preliminary set of predictors. For pairs of highly correlated variables (Pearson correlation coefficient > 0.65), only one was retained. The retained variable was chosen based on its known biological association with depression and its clinical accessibility.
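The "at least three of five methods" rule and the correlation screen can be sketched as below. This is a minimal sketch. The method outputs in the example are hypothetical. The screen compares the absolute value of Pearson's r with 0.65, which is an assumption, since the text does not say whether strong negative correlations were also screened.

```python
"""Sketch of the 'at least 3 of 5 methods' vote and the correlation screen."""
from collections import Counter
from itertools import combinations
from math import sqrt


def consensus_select(method_results, min_votes=3):
    """Keep variables chosen by at least `min_votes` selection methods.

    method_results maps a method name to the list of variables it selected.
    """
    votes = Counter(v for chosen in method_results.values() for v in set(chosen))
    return sorted(v for v, n in votes.items() if n >= min_votes)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.65):
    """List (var_a, var_b, r) for pairs whose |r| exceeds the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:  # ASSUMPTION: absolute value is screened
            flagged.append((a, b, r))
    return flagged


if __name__ == "__main__":
    # HYPOTHETICAL selections, for illustration only.
    selections = {
        "univariate": ["pain", "address", "sleep_time", "grip_max", "hope", "bmi"],
        "lasso": ["pain", "address", "sleep_time", "hope"],
        "rfe": ["pain", "sleep_time", "grip_max", "bmi"],
        "rf": ["pain", "grip_max", "address"],
        "dt": ["pain", "sleep_time"],
    }
    print(consensus_select(selections))  # ['address', 'grip_max', 'pain', 'sleep_time']
```

The correlation screen only flags pairs. In the study, the choice of which variable in a pair to keep was made on clinical grounds, which code cannot reproduce.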

In the model construction phase, eleven ML models were constructed: Naive Bayes (NB), K-nearest neighbors (KNN), logistic regression (LR), random forest (RF), decision tree (DC), artificial neural network (ANN), support vector machine (SVM), gradient boosting (GB), light gradient boosting machine (LGBM), extreme gradient boosting (XGB), and adaptive boosting (AdaBoost). Each model was trained in the training cohort and evaluated in both the training and validation cohorts.

2.4. Model evaluation, selection, and simplification

Model performance was evaluated in both the training and validation cohorts. The main evaluation metrics included accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1 score, and AUC. Five-fold cross-validation ROC curves were also generated to compare the stability of different algorithms. Calibration curves were used to assess the consistency between predicted values and observed values. Decision curve analysis (DCA) was utilized to evaluate the clinical utility of the model at different clinical risk thresholds.
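The threshold-based metrics and the AUC listed above can be computed as in the sketch below. It uses a 0.5 probability threshold, which is an assumption, because the paper does not state the threshold it used.

The reported F1 values track accuracy closely. The 1-feature rows in Table 3 show PPV and sensitivity of 0 alongside an F1 of 0.395. A support-weighted average of the per-class F1 scores reproduces that figure. For this reason, the sketch reports that variant alongside the usual positive-class F1. This is an inference from the tables, not something the authors state.

```python
"""Sketch of threshold-based classification metrics and a rank-based AUC."""


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, PPV, NPV, sensitivity, specificity and F1 at one threshold.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def ratio(num, den):
        return num / den if den else 0.0

    def f1(prec, rec):
        return ratio(2 * prec * rec, prec + rec)

    n = len(y_true)
    sens, spec = ratio(tp, tp + fn), ratio(tn, tn + fp)
    ppv, npv = ratio(tp, tp + fp), ratio(tn, tn + fn)
    # Per-class F1, then a support-weighted average (ASSUMED reporting variant).
    f1_pos, f1_neg = f1(ppv, sens), f1(npv, spec)
    f1_weighted = ((tp + fn) * f1_pos + (tn + fp) * f1_neg) / n
    return {
        "accuracy": (tp + tn) / n, "ppv": ppv, "npv": npv,
        "sensitivity": sens, "specificity": spec,
        "f1_positive": f1_pos, "f1_weighted": f1_weighted,
    }


def auc_rank(y_true, y_prob):
    """AUC = P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in cohort size. That is fine for a few hundred patients, and it matches the Mann-Whitney interpretation of the AUC.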

After determining the optimal algorithm, we explored model simplification to enhance clinical usability. Variables were ranked by SHAP importance. We then applied backward elimination, removing the least important variable at each step and recording the change in AUC in the validation cohort. This process was repeated until only one variable remained. The simplest adequate model was then selected by examining how diagnostic performance changed as variables were removed.
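The elimination loop can be sketched as follows. This sketch assumes a fixed SHAP ranking computed once on the full model; the text does not say whether the ranking was recomputed after each removal. The `evaluate` callback is a hypothetical stand-in for "refit LR on these features and return its validation AUC".

```python
"""Sketch of SHAP-ranked backward elimination with a fixed ranking."""
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> List[Tuple[int, Tuple[str, ...], float]]:
    """Drop the least important feature one at a time, recording performance.

    ranked_features: most important first (e.g. by mean |SHAP|).
    evaluate: HYPOTHETICAL callback that refits the model on the given
        features and returns its validation AUC.
    """
    path = []
    current = list(ranked_features)
    while current:
        path.append((len(current), tuple(current), evaluate(current)))
        current = current[:-1]  # remove the lowest-ranked remaining feature
    return path


def toy_evaluate(features: List[str]) -> float:
    """Toy stand-in for refitting; NOT the study's numbers."""
    return round(0.70 + 0.02 * len(features), 3)


if __name__ == "__main__":
    for k, feats, auc in backward_elimination(
        ["pain", "address", "sleep_time", "grip_max"], toy_evaluate
    ):
        print(k, feats, auc)
```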

2.5. Model interpretation and application

Calibration curves and DCA were used to evaluate the model’s calibration and clinical utility. SHAP values were applied to quantify the importance of each predictor and to visualize its association with depression risk in middle-aged and elderly heart disease patients. Variables were ranked by the magnitude of their SHAP values, with a higher rank indicating a greater impact on the predicted outcome. For an individual patient, a positive SHAP value indicates that the variable’s value raises the predicted depression risk. A negative SHAP value indicates that it lowers the predicted risk. To facilitate clinical translation and large-scale screening, we developed a web-based risk calculator based on the final simplified LR model. The calculator was implemented using the Streamlit framework (Python 3.9) and deployed on a cloud platform.
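The paper does not say which SHAP estimator was applied to the LR model. For a linear logistic model with features treated as independent, SHAP values on the log-odds scale have a closed form: phi_j = b_j (x_j - mean_j). These values sum to the patient's log-odds minus the log-odds at the feature means. The sketch below illustrates that case. The intercept, coefficients, and means are hypothetical, because the fitted values are not reported.

```python
"""Closed-form SHAP values for a logistic regression (log-odds scale)."""

# HYPOTHETICAL intercept, coefficients and background (training-set) means;
# the fitted values of the final model are not reported in the paper.
INTERCEPT = -0.4
COEFS = {"pain": 1.5, "rural": 0.8, "sleep_hours": -0.15, "grip_max_kg": -0.03}
BACKGROUND_MEANS = {"pain": 0.46, "rural": 0.75, "sleep_hours": 6.2, "grip_max_kg": 27.0}


def log_odds(x):
    """Linear predictor b0 + sum_j b_j * x_j."""
    return INTERCEPT + sum(COEFS[k] * x[k] for k in COEFS)


def linear_shap(x):
    """phi_j = b_j * (x_j - mean_j), assuming independent features.

    The values add up to log_odds(x) - log_odds(BACKGROUND_MEANS).
    """
    return {k: COEFS[k] * (x[k] - BACKGROUND_MEANS[k]) for k in COEFS}


def mean_abs_shap(rows):
    """Global ranking by mean |phi_j| across patients, largest first."""
    totals = dict.fromkeys(COEFS, 0.0)
    for x in rows:
        for k, phi in linear_shap(x).items():
            totals[k] += abs(phi)
    return sorted(((k, v / len(rows)) for k, v in totals.items()),
                  key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    patient = {"pain": 1, "rural": 1, "sleep_hours": 5.0, "grip_max_kg": 22.0}
    print(linear_shap(patient))
```

Here "shorter sleep raises risk" appears as a positive SHAP value for a patient sleeping less than the mean, because the sleep coefficient is negative.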

2.6. Statistical analysis

Continuous variables were first assessed for normality using the Shapiro-Wilk test. Normally distributed data with homogeneous variance were expressed as mean ± standard deviation and compared between groups with the independent samples t-test. Non-normal or heteroscedastic data were expressed as median (25th, 75th percentiles) and compared with the Mann-Whitney U test. Categorical variables were expressed as frequency (percentage) and compared with the chi-square test. ML models were built using MyLab+ i-Research. All statistical analyses were conducted using R 4.3.3, with a two-sided P-value < 0.05 considered statistically significant.
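For the 2×2 comparisons in Table 1, the chi-square statistic and its 1-degree-of-freedom p-value can be computed with the standard library alone. The paper does not state whether a continuity correction was applied, so `yates` is an option here. R's `chisq.test` applies the correction to 2×2 tables by default. Without the correction, this sketch reproduces the marital status p-value of 0.141 reported in Table 1.

```python
"""Pearson chi-square test for a 2x2 table using only the standard library."""
from math import erfc, sqrt


def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square test (df = 1) for the table [[a, b], [c, d]].

    Rows are the two categories of a variable; columns are the
    non-depression and depression groups. Returns (statistic, p_value).
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by n/2, not below zero.
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (r1 * r2 * c1 * c2)
    # For 1 degree of freedom, P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


if __name__ == "__main__":
    # Marital status from Table 1: married 449 / 346, single 76 / 76.
    _, p = chi2_2x2(449, 346, 76, 76)
    print(round(p, 3))  # 0.141
    # Pain from Table 1: no pain 384 / 131, pain 141 / 291.
    print(chi2_2x2(384, 131, 141, 291)[1] < 0.001)  # True
```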

3. Results

3.1. Demographic and clinical characteristics

After the inclusion and exclusion criteria were applied, a total of 947 heart disease patients were included in the analysis (Figure 1): 422 in the depression group (44.6%) and 525 in the non-depression group (55.4%). As shown in Table 1, 28 variables differed significantly between the depression and non-depression groups.

Figure 1.

Flow chart of participant screening.

Table 1.

Comparison of data between non-depression group and depression group in middle-aged and elderly patients with heart disease.

Characteristics	Non-depression (n=525)	Depression (n=422)	p-value
Age	63.00 (55.00, 69.00)	61.00 (54.00, 68.00)	0.023
Gender (n%)			<0.001
Male	120 (22.9%)	41 (9.7%)
Female	405 (77.1%)	381 (90.3%)
Education level (n%)			0.069
High school and higher	443 (84.4%)	337 (79.9%)
Below high school	82 (15.6%)	85 (20.1%)
Marital status (n%)			0.141
Married	449 (85.5%)	346 (82.0%)
Single	76 (14.5%)	76 (18.0%)
Address (n%)			<0.001
Rural	350 (66.7%)	365 (86.5%)
Urban	175 (33.3%)	57 (13.5%)
Height (cm)	155.20 (151.22, 161.17)	154.05 (149.50, 158.30)	<0.001
Weight (kg)	61.30 (54.90, 69.23)	59.75 (51.90, 68.08)	0.005
BMI (kg/m²)	25.20 (22.71, 27.62)	24.82 (22.49, 26.74)	0.305
Obesity (n%)			0.218
Yes	330 (62.9%)	245 (58.1%)
No	188 (37.1%)	165 (41.9%)
Waist (cm)	89.30 (82.95, 97.35)	88.20 (81.00, 96.60)	0.127
Grip (kg)	27.00 (22.50, 33.40)	24.50 (20.00, 30.00)	<0.001
Smoking (n%)			0.535
No	490 (93.3%)	398 (94.3%)
Yes	35 (6.7%)	24 (5.7%)
Drinking (n%)			0.002
No	405 (77.1%)	360 (85.3%)
Yes	120 (22.9%)	62 (14.7%)
ADL (n%)			<0.001
No	425 (97.9%)	376 (93.1%)
Yes	9 (2.1%)	28 (6.9%)
IADL (n%)			<0.001
No	486 (93.6%)	346 (83.2%)
Yes	33 (6.4%)	70 (16.8%)
Self-health status (n%)			<0.001
Fair	450 (85.7%)	396 (93.8%)
Good	75 (14.3%)	26 (6.2%)
Hypertension (n%)			0.682
No	305 (59.5%)	248 (60.8%)
Yes	208 (40.5%)	160 (39.2%)
Dyslipidemia (n%)			0.271
Yes	130 (24.9%)	118 (28.0%)
No	393 (75.1%)	303 (72.0%)
Diabetes mellitus (n%)			0.952
No	417 (79.7%)	335 (79.6%)
Yes	106 (20.3%)	86 (20.4%)
Cancer (n%)			0.316
No	509 (97.0%)	404 (95.7%)
Yes	16 (3.0%)	18 (4.3%)
Liver diseases (n%)			0.497
No	243 (91.0%)	120 (88.9%)
Yes	24 (9.0%)	15 (11.1%)
Chronic lung diseases (n%)			<0.001
No	453 (86.3%)	322 (76.3%)
Yes	72 (13.7%)	100 (23.7%)
Stroke (n%)			0.466
No	507 (96.6%)	411 (97.4%)
Yes	18 (3.4%)	11 (2.6%)
Kidney diseases (n%)			<0.001
No	477 (90.9%)	334 (79.1%)
Yes	48 (9.1%)	88 (20.9%)
Digestive diseases (n%)			<0.001
No	350 (66.7%)	211 (50.0%)
Yes	175 (33.3%)	211 (50.0%)
Arthritis rheumatism (n%)			<0.001
No	307 (58.5%)	152 (36.0%)
Yes	218 (41.5%)	270 (64.0%)
Asthma (n%)			<0.001
No	503 (95.8%)	381 (90.3%)
Yes	22 (4.2%)	41 (9.7%)
Exercise (n%)			0.037
No	404 (77.0%)	348 (82.5%)
Yes	121 (23.0%)	74 (17.5%)
Sleep time (h)	6.00 (5.00, 8.00)	5.25 (4.00, 7.00)	<0.001
Social (n%)			<0.001
No	182 (34.7%)	195 (46.2%)
Yes	343 (65.3%)	227 (53.8%)
Pain (n%)			<0.001
No	384 (73.1%)	131 (31.0%)
Yes	141 (26.9%)	291 (69.0%)
Disability (n%)			0.109
No	401 (76.4%)	303 (71.8%)
Yes	124 (23.6%)	119 (28.2%)
Hope (n%)			<0.001
No	134 (25.5%)	193 (45.7%)
Yes	391 (74.5%)	229 (54.3%)
Insurance (n%)			0.441
No	40 (7.6%)	38 (9.0%)
Yes	485 (92.4%)	384 (91.0%)
Hospital (n%)			0.002
No	383 (73.0%)	268 (63.5%)
Yes	142 (27.0%)	154 (36.5%)
Retire (n%)			<0.001
No	370 (70.5%)	381 (90.3%)
Yes	155 (29.5%)	41 (9.7%)
WBC (×10⁹/L)	5.70 (4.70, 6.70)	5.67 (4.70, 6.80)	0.710
Hemoglobin (g/dL)	13.40 (12.50, 14.50)	13.20 (12.23, 14.18)	0.012
Hct (%)	40.60 (37.80, 44.00)	40.15 (37.43, 43.00)	0.018
MCV (fL)	91.50 (88.00, 94.90)	91.35 (88.00, 96.40)	0.890
PLT (×10⁹/L)	209.00 (172.00, 243.00)	207.00 (163.00, 254.00)	0.895
Triglyceride (mg/dL)	130.97 (92.04, 177.88)	127.43 (93.81, 178.76)	0.750
Creatinine (mg/dL)	0.72 (0.63, 0.85)	0.71 (0.63, 0.80)	0.027
Urea (mg/dL)	14.57 (12.33, 17.65)	14.57 (12.33, 18.21)	0.555
HDL (mg/dL)	49.42 (42.09, 56.37)	49.21 (43.63, 57.53)	0.294
LDL (mg/dL)	103.86 (83.40, 122.59)	104.25 (88.03, 125.48)	0.275
Cholesterol (mg/dL)	187.64 (164.09, 212.74)	190.35 (166.80, 214.29)	0.183
Glucose (mg/dL)	95.50 (88.29, 106.31)	95.50 (88.29, 104.50)	0.104
Uric acid (mg/dL)	4.80 (4.00, 5.70)	4.50 (3.80, 5.30)	<0.001
Cystatin C (mg/L)	0.86 (0.73, 0.98)	0.84 (0.74, 0.96)	0.338
CRP (mg/L)	1.60 (0.80, 2.80)	1.50 (0.80, 2.80)	0.772
HbA1c (%)	6.00 (5.60, 6.30)	5.90 (5.60, 6.20)	0.076
TyG Index	8.76 (8.36, 9.14)	8.72 (8.36, 9.14)	0.509
Urea/Crea	19.85 (16.81, 23.56)	20.73 (16.80, 25.16)	0.015
UA/Crea	6.54 (5.53, 7.56)	6.35 (5.32, 7.46)	0.102
TG/HDL	2.64 (1.73, 4.00)	2.55 (1.51, 3.889)	0.558
LDL/HDL	2.05 (1.68, 2.50)	2.09 (1.72, 2.54)	0.472
Glu/HbA1C	0.91 (0.84, 0.98)	0.90 (0.84, 0.98)	0.260

BMI, body mass index; ADL, activities of daily living; IADL, instrumental activities of daily living; WBC, white blood cell; Hct, hematocrit; MCV, mean corpuscular volume; PLT, platelet count; HDL, high-density lipoprotein; LDL, low-density lipoprotein; CRP, C-reactive protein; HbA1c, glycated hemoglobin; TyG Index, triglyceride-glucose index; Urea/Crea, urea to creatinine ratio; UA/Crea, uric acid to creatinine ratio; TG, triglyceride; Glu, glucose. Values in bold indicate P<0.05.

3.2. Variable selection

Five selection steps were performed exclusively within the training cohort: univariate analysis, Lasso regression, RFE screening, and RF and DT feature importance ranking. Nine variables appeared in at least three of the five methods and were selected for the model construction phase (Figure 2):

- address (rural/urban);
- grip (kg);
- arthritis rheumatism (yes/no);
- ADL (yes/no);
- IADL (yes/no);
- sleep time (hours);
- pain (yes/no);
- Hope (yes/no);
- Retire (yes/no).

Figure 2.

Variable selection for the construction of machine learning models: (a) Venn diagram of the variable screening process using univariate analysis, Lasso, RFE, RF, and DT in the training cohort; (b) Pearson correlation analysis.

3.3. Model construction and performance

The predictive performance of the 11 ML algorithms based on the nine selected variables was evaluated in both the training and validation cohorts (Table 2). XGB exhibited severe overfitting, with a training AUC of 0.993 but a validation AUC of only 0.703, and was therefore excluded. GB, KNN, DC, RF, and NB were also excluded because of relatively low or unstable validation performance. The remaining candidate algorithms were LR, SVM, ANN, LGBM, and AdaBoost.
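The overfitting argument can be made concrete with the training and validation AUCs reported in Table 2. This snippet only redoes arithmetic on the published values. XGB shows a train-minus-validation gap of 0.290, GB shows 0.154, and LR shows 0.017. A large gap is one heuristic among several. The authors also weighed validation AUC, cross-validation stability, and interpretability.

```python
"""Train-minus-validation AUC gaps computed from the values in Table 2."""

TRAIN_AUC = {"NB": 0.797, "KNN": 0.819, "LR": 0.809, "RF": 0.830, "DC": 0.821,
             "ANN": 0.844, "SVM": 0.810, "GB": 0.914, "LGBM": 0.833,
             "XGB": 0.993, "AdaBoost": 0.815}
VALID_AUC = {"NB": 0.773, "KNN": 0.729, "LR": 0.792, "RF": 0.784, "DC": 0.745,
             "ANN": 0.774, "SVM": 0.796, "GB": 0.760, "LGBM": 0.785,
             "XGB": 0.703, "AdaBoost": 0.773}


def auc_gaps(train, valid):
    """(model, gap) pairs sorted from the largest to the smallest gap."""
    gaps = {m: round(train[m] - valid[m], 3) for m in train}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for model, gap in auc_gaps(TRAIN_AUC, VALID_AUC):
        print(f"{model:9s} {gap:.3f}")
```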

Table 2.

Performance of the eleven machine learning algorithms for predicting depression in the training and validation cohorts.

Model	Accuracy	PPV	NPV	F1_score	Sensitivity	Specificity	AUC (95% CI)
Training
NB	0.743	0.706	0.774	0.744	0.725	0.757	0.797 (0.764-0.831)
KNN	0.743	0.771	0.729	0.738	0.603	0.856	0.819 (0.788-0.848)
LR	0.752	0.715	0.784	0.753	0.739	0.763	0.809 (0.777-0.840)
RF	0.763	0.746	0.775	0.762	0.708	0.807	0.830 (0.800-0.859)
DC	0.749	0.714	0.778	0.749	0.729	0.766	0.821 (0.791-0.849)
ANN	0.785	0.765	0.802	0.785	0.749	0.815	0.844 (0.812-0.873)
SVM	0.748	0.739	0.754	0.746	0.671	0.809	0.810 (0.776-0.843)
GB	0.844	0.824	0.861	0.844	0.827	0.858	0.914 (0.892-0.935)
LGBM	0.760	0.748	0.768	0.759	0.695	0.812	0.833 (0.801-0.862)
XGB	0.962	0.969	0.957	0.962	0.946	0.975	0.993 (0.989-0.997)
AdaBoost	0.749	0.722	0.771	0.749	0.712	0.779	0.815 (0.783-0.846)
Validation
NB	0.709	0.641	0.791	0.709	0.787	0.646	0.773 (0.716-0.826)
KNN	0.660	0.610	0.705	0.660	0.654	0.665	0.729 (0.669-0.786)
LR	0.726	0.652	0.823	0.726	0.827	0.646	0.792 (0.741-0.845)
RF	0.712	0.632	0.833	0.710	0.850	0.601	0.784 (0.734-0.837)
DC	0.674	0.667	0.678	0.668	0.535	0.785	0.745 (0.690-0.799)
ANN	0.733	0.671	0.801	0.734	0.787	0.690	0.774 (0.725-0.828)
SVM	0.730	0.651	0.840	0.729	0.850	0.633	0.796 (0.744-0.846)
GB	0.716	0.692	0.733	0.715	0.654	0.766	0.760 (0.700-0.816)
LGBM	0.726	0.667	0.790	0.727	0.772	0.690	0.785 (0.729-0.840)
XGB	0.677	0.697	0.668	0.666	0.488	0.829	0.703 (0.641-0.763)
AdaBoost	0.726	0.696	0.750	0.726	0.685	0.759	0.773 (0.718-0.829)

PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; CI, confidence interval; NB, Naive Bayes; KNN, K-nearest neighbor; LR, logistic regression; RF, random forest; DC, decision tree; ANN, artificial neural networks; SVM, support vector machine; GB, gradient boosting; LGBM, light gradient boosting machine; XGB, extreme gradient boosting; AdaBoost, adaptive boosting.

Further evaluation used validation cohort ROC curves and 5-fold cross-validation ROC curves. LR achieved a validation AUC of 0.792 (95% CI: 0.741–0.845), and its cross-validation curves clustered tightly without notable fluctuation (Figure 3(a) and (b)). SVM achieved a slightly higher validation AUC (0.796). However, its reliance on kernel functions and hyperparameter tuning limits interpretability, making it less suitable for clinical application. ANN showed a relatively low validation AUC with large cross-validation fluctuations. LGBM and AdaBoost were both inferior to LR in AUC and stability. Consequently, SVM, ANN, LGBM, and AdaBoost were excluded, and LR was selected as the final algorithm.

Figure 3.

Model construction and evaluation. (a) ROC curves of the 11 algorithms in the validation cohort; (b) ROC curves of the 11 algorithms from 5-fold cross-validation; (c) Calibration curve of the LR algorithm in the validation cohort; (d) Decision curves of the LR algorithm in the validation cohort.

Finally, the calibration curve showed good agreement between the LR model's predicted and observed risk. Decision curve analysis (DCA) showed a positive net clinical benefit. Together these support the model's suitability as a clinical prediction model (Figure 3(c) and (d)).

3.4. Model simplification

To develop a simpler model for clinical application, we ranked the variables in the LR model by importance and sequentially removed the least important ones. As shown in Table 3, when the number of variables decreased from 8 to 7, 6, 5, and 4, the AUCs in the validation cohort were 0.792, 0.788, 0.785, 0.787, and 0.788, respectively. The 4-variable combination (pain, address, sleep time, and grip-max) achieved an AUC of 0.788, comparable to the AUC of 0.792 for the full 9-variable model. Considering stability, interpretability, and predictive performance, we retained the 4-variable LR model as the final recommended model.
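The authors chose the 4-variable model by weighing stability, interpretability, and performance, not by a stated numeric rule. A tolerance rule applied to the Table 3 validation AUCs is one way to make such a choice reproducible. The tolerance is an assumption, and the answer depends on it: 0.005 selects 4 features, while 0.01 selects 3.

```python
"""Smallest feature count whose validation AUC stays within a tolerance."""

# Validation AUCs from Table 3 (number of features -> AUC).
VALID_AUC_BY_K = {8: 0.792, 7: 0.788, 6: 0.785, 5: 0.787,
                  4: 0.788, 3: 0.786, 2: 0.767, 1: 0.714}


def smallest_within_tolerance(auc_by_k, reference_auc, tol):
    """Fewest features with AUC >= reference_auc - tol.

    A tiny epsilon guards against floating-point error at the boundary.
    The tolerance itself is an ASSUMED rule, not the authors' stated method.
    """
    eligible = [k for k, auc in auc_by_k.items()
                if auc >= reference_auc - tol - 1e-12]
    return min(eligible)


if __name__ == "__main__":
    # 0.792 is the full 9-variable LR validation AUC from Table 2.
    for tol in (0.002, 0.005, 0.01):
        print(tol, smallest_within_tolerance(VALID_AUC_BY_K, 0.792, tol))
```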

Table 3.

Performance of the LR algorithm with different numbers of features.

Features	Accuracy	PPV	NPV	F1_score	Sensitivity	Specificity	AUC (95%CI)
Training cohort
8 features	0.740	0.727	0.749	0.739	0.668	0.798	0.794 (0.758-0.827)
7 features	0.740	0.737	0.742	0.738	0.647	0.815	0.791 (0.756-0.824)
6 features	0.734	0.703	0.759	0.734	0.698	0.763	0.785 (0.749-0.822)
5 features	0.721	0.698	0.737	0.720	0.658	0.771	0.779 (0.743-0.813)
4 features	0.725	0.733	0.721	0.721	0.603	0.823	0.776 (0.741-0.809)
3 features	0.724	0.712	0.731	0.722	0.637	0.793	0.767 (0.729-0.803)
2 features	0.721	0.687	0.747	0.720	0.685	0.749	0.744 (0.705-0.780)
1 feature	0.554	0.000	0.554	0.395	0.000	1.000	0.709 (0.676-0.743)
Validation cohort
8 features	0.740	0.696	0.780	0.741	0.740	0.741	0.792 (0.739-0.844)
7 features	0.733	0.678	0.789	0.734	0.764	0.709	0.788 (0.735-0.836)
6 features	0.716	0.642	0.813	0.715	0.819	0.633	0.785 (0.735-0.835)
5 features	0.737	0.671	0.812	0.737	0.803	0.684	0.787 (0.730-0.841)
4 features	0.744	0.680	0.815	0.744	0.803	0.696	0.788 (0.733-0.840)
3 features	0.733	0.714	0.747	0.732	0.669	0.785	0.786 (0.730-0.839)
2 features	0.726	0.709	0.738	0.725	0.654	0.785	0.767 (0.712-0.822)
1 feature	0.554	0.000	0.554	0.395	0.000	1.000	0.714 (0.662-0.762)

PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; CI, confidence interval.

The final simplified model contained 4 predictors. The full sample (n=947) had 422 depression events and the training cohort (n=663) had 296. The resulting EPVs were 105.5 and 74.0, respectively, both well above the recommended minimum of 10. This indicates that the sample size was adequate for model development and internal validation.
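The EPV arithmetic above can be checked directly. This sketch counts the less frequent outcome class as the events; here that is depression in both the full sample and the training cohort. EPV is sometimes computed over all candidate predictors rather than only those in the final model. The function therefore takes the predictor count as an argument, so either convention can be checked.

```python
"""Events per variable (EPV) for a prediction model."""


def events_per_variable(n_cases, n_total, n_predictors):
    """EPV, counting the less frequent outcome class as the 'events'."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    events = min(n_cases, n_total - n_cases)
    return events / n_predictors


if __name__ == "__main__":
    print(events_per_variable(422, 947, 4))  # full sample: 105.5
    print(events_per_variable(296, 663, 4))  # training cohort: 74.0
```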

3.5. Model interpretability and application

The calibration curve and DCA for the final 4-variable LR model (pain, address, sleep time, and grip-max) are presented in Figure 4(a) and (b), respectively. The calibration curve demonstrated good agreement between predicted depression risk and observed outcomes, indicating that the model is well-calibrated. The DCA showed that the 4-variable LR model provides positive net clinical benefit across a wide range of reasonable threshold probabilities, supporting its clinical utility for depression risk screening in middle-aged and elderly heart disease patients.
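The decision curve rests on the standard net-benefit quantity. At a threshold probability pt, patients with predicted risk of at least pt are treated as high risk. The net benefit is then TP/n - (FP/n) x pt/(1 - pt), which is compared with "treat all" and "treat none", whose net benefit is always 0. The sketch below computes these curves on toy data. It illustrates the definition and is not the software the authors used.

```python
"""Net benefit for decision curve analysis (treat if risk >= threshold)."""


def net_benefit(y_true, y_prob, threshold):
    """TP/n - FP/n * pt / (1 - pt) for the model at threshold pt."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that treats everyone; 'treat none' is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """(threshold, model net benefit, treat-all net benefit) per threshold."""
    return [(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
            for pt in thresholds]


if __name__ == "__main__":
    y, p = [1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]  # toy data
    for row in decision_curve(y, p, [0.2, 0.3, 0.5]):
        print(row)
```

A model has clinical value at a given threshold when its curve lies above both reference strategies.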

Figure 4.

The evaluation, interpretability and application of the 4-variable LR model. (a) The calibration curve for the final 4-variable LR model; (b) The decision curves for the final 4-variable LR model; (c) SHAP plot of the 4-variable LR model; (d) Screenshot of the web-based calculator.

SHAP values were used to interpret the final 4-variable LR model (Figure 4(c)). Pain was the most important predictor of depression risk. The SHAP summary plot indicates that the presence of pain substantially increases predicted risk. Rural residence and shorter sleep time were also associated with higher predicted risk, and lower grip strength (grip-max) likewise emerged as a risk factor.

Based on these findings, we developed a user-friendly web-based calculator for clinical application. The calculator is accessible at https://bgmyndhqg6cnnlssqkr9op.streamlit.app/ (Figure 4(d)).
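The core of such a calculator is the logistic function applied to the 4-variable linear predictor. The sketch below uses hypothetical coefficients chosen only to match the reported directions of effect: pain and rural residence raise risk, while longer sleep and higher grip strength lower it. The fitted intercept and coefficients are not reported here, so the outputs are illustrative, not clinical estimates.

```python
"""Core of a 4-variable logistic risk calculator (HYPOTHETICAL coefficients)."""
from math import exp

# HYPOTHETICAL values matching only the reported directions of effect;
# substitute the fitted intercept and coefficients of the final model.
COEFS = {"intercept": 0.6, "pain": 1.4, "rural": 0.9,
         "sleep_hours": -0.12, "grip_max_kg": -0.03}


def depression_risk(pain: bool, rural: bool, sleep_hours: float,
                    grip_max_kg: float, coefs=COEFS) -> float:
    """Predicted probability of CES-D >= 10 from the four predictors."""
    if not 0 <= sleep_hours <= 24:
        raise ValueError("sleep_hours must be between 0 and 24")
    if grip_max_kg < 0:
        raise ValueError("grip_max_kg must be non-negative")
    z = (coefs["intercept"] + coefs["pain"] * pain + coefs["rural"] * rural
         + coefs["sleep_hours"] * sleep_hours + coefs["grip_max_kg"] * grip_max_kg)
    return 1.0 / (1.0 + exp(-z))


if __name__ == "__main__":
    print(round(depression_risk(True, True, 5.0, 24.5), 3))
```

In a Streamlit app, a function like this would typically be wired to input widgets such as `st.selectbox` and `st.number_input`, with the result displayed via `st.metric` or `st.write`.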

4. Discussion

The comorbidity of cardiovascular disease and depressive disorders poses a significant challenge for global public health and clinical practice.¹⁹ Based on nationally representative CHARLS data, this study integrated multidimensional variables for the first time and constructed 11 ML models for predicting depression risk in middle-aged and elderly patients with heart disease. Variable selection was performed strictly within the training cohort to prevent data leakage. Together with a more inclusive missing data criterion, this yielded a larger and more representative sample (n=947). Through variable selection and model simplification, we ultimately obtained a high-performing and clinically practical 4-variable LR model. The findings identify key factors influencing depression risk in this population and provide a web-based calculator for early depression risk screening in clinical settings.

SHAP value analysis showed that pain is the most important predictor of depression risk in middle-aged and elderly heart disease patients. Notably, 69.0% of patients in the depression group reported frequent pain, compared with only 26.9% in the non-depression group (Table 1). This result is consistent with existing research showing a close bidirectional association between chronic pain and depression. Persistent physical discomfort induces psychological stress, leading to depressive symptoms such as low mood and loss of interest. Depression, in turn, can lower pain thresholds and reduce pain-coping abilities, worsening the pain experience.²⁰ Chronic pain, such as chest pain and joint pain, is common among heart disease patients. Clinicians should therefore attend to patients’ psychological state while managing their pain and screen for depression risk in a timely manner. Rural residence is another key risk factor: the proportion of rural residents in the depression group (86.5%) was significantly higher than in the non-depression group (66.7%) (Table 1). This may be related to the relative scarcity of medical resources in rural areas, lower household economic status, and weaker social support networks.²¹ This study also found that patients in the depression group had shorter sleep time, consistent with the bidirectional relationship between sleep disturbance and depression. Insufficient sleep can disrupt neurotransmitter systems such as serotonin and dopamine and trigger depressed mood. Depressive symptoms, in turn, can cause difficulty falling asleep and poorer sleep quality through anxiety and rumination.²² Lower grip strength (grip-max) was also identified as a risk factor for depression in this study.
Grip strength is a well-established indicator of overall muscle function and physical frailty.²³ Previous research has shown that reduced grip strength is associated with increased risk of depression in older adults, possibly due to decreased physical activity, reduced social participation, and underlying inflammatory processes.²⁴ This finding suggests that assessing grip strength, a simple and inexpensive measurement, could aid depression risk screening among heart disease patients. Consequently, clinical attention should be paid to the psychological needs of patients with comorbidities.

This study is based on the CHARLS database, which provides high-quality data from a nationally representative sample obtained through stratified, multi-stage proportional sampling. This design strengthens the basis for generalizing the findings to the broader population of heart disease patients in China. Furthermore, compared with traditional predictive modelling approaches, the ML workflow in this study has several strengths. It followed established standards for prediction model development and combined five variable selection methods (univariate analysis, Lasso regression, recursive feature elimination, RF and DT), which improves the robustness of the selected variables. Based on performance in the validation cohort, we selected the more robust LR algorithm and used SHAP analysis to interpret it in depth. The LR model achieved an AUC of 0.792 (95% CI: 0.741-0.845), and the simplified 4-variable model achieved a comparable AUC of 0.788 (95% CI: 0.733-0.840). This simplification offers a feasible solution for clinical practice, and the web-based risk calculator further enhances the model's practicality: clinicians and researchers can obtain a depression risk assessment by entering the patient's four core clinical indicators, providing an efficient tool for large-scale screening and epidemiological investigations. Targeted intervention programs, such as cognitive behavioral therapy and psychological counseling for high-risk patients, could be developed based on this predictive model, and their effectiveness could then be evaluated to provide more comprehensive support for the mental health management of middle-aged and elderly heart disease patients.
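The consensus rule applied during variable selection (as described in the Methods, a variable was retained if at least three of the five methods selected it) amounts to a simple vote count. The sketch below is illustrative only: it assumes each method's output has already been reduced to a set of variable names, and the per-method sets in the example are hypothetical, not the study's actual selections.

```python
from collections import Counter


def consensus_select(selections: dict[str, set[str]], min_votes: int = 3) -> list[str]:
    """Return variables chosen by at least `min_votes` selection methods.

    `selections` maps a method name (e.g. "lasso") to the set of variables
    that method retained; each method contributes at most one vote per variable.
    """
    votes: Counter[str] = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # set() guards against duplicates within one method
    # Sort so the output order is deterministic and reproducible.
    return sorted(var for var, n in votes.items() if n >= min_votes)


if __name__ == "__main__":
    # Hypothetical per-method outputs, for illustration only.
    example = {
        "univariate": {"pain", "address", "sleep_time", "grip_max", "ADL"},
        "lasso": {"pain", "address", "sleep_time", "IADL"},
        "rfe": {"pain", "grip_max", "sleep_time"},
        "rf_importance": {"pain", "address", "grip_max"},
        "dt_importance": {"pain", "ADL"},
    }
    print(consensus_select(example))
```

Requiring agreement across methods with different inductive biases (filter, penalized, wrapper and tree-based) is what makes the retained set less sensitive to the quirks of any single method; `min_votes` is exposed as a parameter so the threshold can be varied in a sensitivity analysis.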

This study has several limitations. First, although relaxing the missing data criteria increased the sample size, the sample was still drawn from a single CHARLS wave, and the lack of external validation limits the generalizability and clinical applicability of the model. Second, some potentially important variables, such as social support, income, and coping strategies, were not included in the analysis, which may have limited the model's predictive accuracy. Future research could expand the sample size, incorporate longitudinal follow-up data to validate the model's long-term predictive performance, integrate multi-omics data to improve predictive accuracy, and perform multi-center external validation to improve generalizability.

For clinical implementation, the web-based calculator requires all four predictors (pain, address, sleep time, grip-max) as inputs. If any predictor value is missing or of poor quality, the model should not be used until valid data are obtained; imputation is not recommended for clinical decision-making. The calculator is designed for healthcare professionals (physicians or nurses) and requires no statistical expertise, as users simply enter the four values and receive an automatic risk prediction. The tool is not intended for patient self-use without clinical supervision.
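The input rule above (all four predictors required, no imputation) can be sketched as a guard placed in front of a standard logistic regression score. This is a minimal sketch under stated assumptions, not the calculator's actual implementation: the function and error names are hypothetical, the 0/1 encodings of pain and address are assumed, and the coefficients in the usage example are invented placeholders rather than the fitted values of the published model.

```python
import math
from typing import Mapping, Optional

# The four predictors of the simplified model; encodings below are ASSUMED.
REQUIRED = ("pain", "address", "sleep_time", "grip_max")


class IncompleteInputError(ValueError):
    """Hypothetical error raised when a required predictor is missing or invalid."""


def _is_valid(value: object) -> bool:
    """True only for values convertible to a finite float (rejects None, NaN, inf)."""
    if value is None:
        return False
    try:
        return math.isfinite(float(value))  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return False


def predict_risk(inputs: Mapping[str, Optional[float]],
                 intercept: float,
                 coefs: Mapping[str, float]) -> float:
    """Return a predicted probability from a logistic model, refusing to impute."""
    bad = [k for k in REQUIRED if not _is_valid(inputs.get(k))]
    if bad:
        # Mirror the rule in the text: do not score until valid data are obtained.
        raise IncompleteInputError(f"missing or invalid predictors: {bad}")
    # Linear predictor: intercept + sum(coefficient * value).
    z = intercept + sum(coefs[k] * float(inputs[k]) for k in REQUIRED)
    # Numerically stable logistic (sigmoid) transform.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # PLACEHOLDER coefficients for illustration only (not the study's values).
    demo_coefs = {"pain": 1.2, "address": 0.8, "sleep_time": -0.15, "grip_max": -0.03}
    patient = {"pain": 1, "address": 1, "sleep_time": 5.0, "grip_max": 24.5}
    print(f"predicted risk: {predict_risk(patient, -0.5, demo_coefs):.3f}")
    try:
        predict_risk({**patient, "grip_max": None}, -0.5, demo_coefs)
    except IncompleteInputError as err:
        print("not scored:", err)
```

Raising an error rather than returning a default probability keeps an incomplete record from silently producing a risk estimate, which matches the recommendation that imputation should not be used for clinical decision-making.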

5. Conclusion

This study constructed and simplified an LR model for predicting depression risk in middle-aged and elderly heart disease patients based on CHARLS data. The core predictors are pain, address, sleep time, and grip-max, and the model demonstrated good predictive performance and clinical practicality. Wider adoption of this model and the accompanying web calculator could aid the early identification of patients at high risk of depression in clinical settings, facilitate timely intervention, help break the vicious cycle of comorbid heart disease and depression, and improve overall patient prognosis.

Footnotes

Acknowledgements

The authors thank the CHARLS participants and staff for their contributions to data collection. The authors are grateful to the i-Research consulting team of Roche for their help in improving the model.

ORCID iD

Guangzhen Fu

Ethical considerations

The CHARLS data collection was approved by the Ethics Review Committee of Peking University (Ethics Approval No. IRB00001052-11015), and informed consent was obtained from all participants. This study is a secondary analysis of public data, and therefore no additional ethical approval was required.

Authors’ contributions

Guangzhen Fu and Yuhan Shen: Writing–original draft, Writing–review & editing, Methodology, Conceptualization.

Jingjing Yang and Yang Li: Validation, Software, Methodology.

Tuanjie Huang and Liqiu Yang: Writing–review & editing, Validation.

Xianchao Yang and Junwei Zhao: Writing–original draft, Writing–review & editing, Visualization, Validation, Software, Methodology, Conceptualization.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data used in this study were sourced from CHARLS (2015). The data is publicly accessible and can be downloaded from the CHARLS website: .

Appendix

References

1. Baratta, Moscucci, Lospinuso, et al. Lipid-Lowering Therapy and Cardiovascular Prevention in Elderly. Drugs 2025; 85: 801–812. https://doi.org/10.1007/s40265-025-02182-0

2. Zhang, Hua, Liu. [Joint association of depression symptoms and 10-year risk of ischemic cardiovascular disease with the cardiovascular disease in middle-aged and elderly people in China]. Beijing Da Xue Xue Bao Yi Xue Ban 2023; 55: 465–470. https://doi.org/10.19723/j.issn.1671-167X.2023.03.012

3. Krittanawong, Maitra, Qadeer, et al. Association of Depression and Cardiovascular Disease. Am J Med 2023; 136: 881–895. https://doi.org/10.1016/j.amjmed.2023.04.036

4. Zhai, Shi, et al. Depression and coronary heart disease: mechanisms, interventions, and treatments. Front Psychiatry 2024; 15: 1328048. https://doi.org/10.3389/fpsyt.2024.1328048

5. Chen, Liao, Chattopadhyay, et al. Exploring the genetic correlation of cardiovascular diseases and mood disorders in the UK Biobank. Epidemiol Psychiatr Sci 2023; 32: e31. https://doi.org/10.1017/s2045796023000252

6. Chen, You, et al. Causal association between major depressive disorder and coronary heart disease: a two-sample bidirectional mendelian randomization study. BMC Med Genomics 2023; 16: 183. https://doi.org/10.1186/s12920-023-01625-5

7. Zhou, Wang, et al. Cardiovascular disease and depression: a narrative review. Front Cardiovasc Med 2023; 10: 1274595. https://doi.org/10.3389/fcvm.2023.1274595

8. Duan, Huang, Zhang, et al. Role of depressive symptoms in the prognosis of heart failure and its potential clinical predictors. ESC Heart Fail 2022; 9: 2676–2685. https://doi.org/10.1002/ehf2.13993

9. Fagiolini, González-Pinto, Miskowiak, et al. Role of trazodone in treatment of major depressive disorder: an update. Ann Gen Psychiatry 2023; 22: 32. https://doi.org/10.1186/s12991-023-00465-y

10. Chen, Liu, et al. Entry point of machine learning in axial spondyloarthritis. RMD Open 2024; 10: e003832. https://doi.org/10.1136/rmdopen-2023-003832

11. Hou, et al. Development and external validation of a risk prediction model for depression in patients with coronary heart disease. J Affect Disord 2024; 367: 137–147. https://doi.org/10.1016/j.jad.2024.08.218

12. Wang, et al. Development and validation of a prediction model for coronary heart disease risk in depressed patients aged 20 years and older using machine learning algorithms. Front Cardiovasc Med 2024; 11: 1504957. https://doi.org/10.3389/fcvm.2024.1504957

13. Yang, Zhang. A Multicohort Machine Learning Framework to Predict Mortality in Elderly Patients With Heart Disease: Insights From HARLS, SHARE, and HRS. Cardiovasc Ther 2026; 2026: 8040700. https://doi.org/10.1155/cdr/8040700

14. Zhang, Han, et al. Predictive potential of somatic symptoms for the identification of subthreshold depression and major depressive disorder in primary care settings. Front Psychiatry 2023; 14: 999047. https://doi.org/10.3389/fpsyt.2023.999047

15. Moitra, Santomauro, Collins, et al. The global gap in treatment coverage for major depressive disorder in 84 countries from 2000-2019: A systematic review and Bayesian meta-regression analysis. PLoS Med 2022; 19: e1003901. https://doi.org/10.1371/journal.pmed.1003901

16. Riley, Ensor, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020; 368: m441. https://doi.org/10.1136/bmj.m441

17. Zhang, Wang, et al. [Study on activities of daily living disability in community-dwelling older adults in China]. Zhonghua Liu Xing Bing Xue Za Zhi 2019; 40: 266–271. https://doi.org/10.3760/cma.j.issn.0254-6450.2019.03.003

18. Tian, Meng, Qiu. Childhood adversities and mid-late depressive symptoms over the life course: Evidence from the China health and retirement longitudinal study. J Affect Disord 2019; 245: 668–678. https://doi.org/10.1016/j.jad.2018.11.028

19. Shu, Xie, Gao, et al. Association of depressive symptoms with non-fatal cardiovascular disease in middle-aged and elderly patients with hypertension: a cohort study from China. BMJ Open 2025; 15: e087905. https://doi.org/10.1136/bmjopen-2024-087905

20. Werneck, Stubbs. Bidirectional relationship between chronic pain and depressive symptoms in middle-aged and older adults. Gen Hosp Psychiatry 2024; 89: 49–54. https://doi.org/10.1016/j.genhosppsych.2024.05.007

21. Zeng. Family socioeconomic status and adolescent depression in urban and rural China: A trajectory analysis. SSM Popul Health 2024; 25: 101627. https://doi.org/10.1016/j.ssmph.2024.101627

22. Lee, Baek, Shin, et al. Alleviation of Immobilization Stress or Fecal Microbiota-Induced Insomnia and Depression-like Behaviors in Mice by Lactobacillus plantarum and Its Supplement. Nutrients 2024; 16: 3711. https://doi.org/10.3390/nu16213711

23. Ribeiro, Berndt, Mielke, et al. Factors associated with handgrip strength across the life course: A systematic review. J Cachexia Sarcopenia Muscle 2024; 15: 2270–2280. https://doi.org/10.1002/jcsm.13586

24. Marín-Jiménez, Bizzozero-Peroni, Molina-Garcia, et al. Clinical importance of simple muscular fitness tests to predict long-term health conditions: a systematic review and meta-analysis of 94 cohort studies. Br J Sports Med 2026; 60: 465–483. https://doi.org/10.1136/bjsports-2024-109173