Mortality risk prediction of the electrocardiogram as an informative indicator of cardiovascular diseases

Abstract

Background

The electrocardiogram (ECG) may be the most widely used test in the management of cardiovascular disease (CVD). Although many applications of artificial intelligence (AI)-enabled ECG have been developed, an integrating indicator for CVD risk stratification has not been investigated. Since mortality may be the most important global outcome, this study aimed to develop a survival deep learning model (DLM) to establish a critical ECG value and explore its associations with various CVD events.

Methods

We trained a DLM with 451,950 12-lead resting ECGs obtained from 210,552 patients, for whom 23,592 events occurred. The internal validation set included 27,808 patients with one ECG for each patient. The external validations were performed in a community hospital with 33,047 patients and two transnational data sets with 233,647 and 1631 ECGs. We distinguished the cause of mortality and additionally investigated CVD-related outcomes, including new-onset acute myocardial infarction (AMI), stroke (STK), and heart failure (HF).

Results

The DLM achieved C-indices of 0.858/0.836 in internal/external validation sets by using ECG over a 10-year period. The high-mortality-risk group identified by the proposed DLM presented a hazard ratio (HR) of 14.16 (95% confidence interval (CI): 11.33–17.70) compared to the low-risk group in the internal validation and presented a higher risk of cardiovascular (CV) mortality (HR: 18.50, 95% CI: 9.82–34.84), non-CV mortality (HR: 13.68, 95% CI: 10.76–17.38), AMI (HR: 4.01, 95% CI: 2.24–7.17), STK (HR: 2.15, 95% CI: 1.70–2.72), and HF (HR: 6.66, 95% CI: 4.54–9.77), which was consistent in an independent community hospital. The transnational validation also revealed HRs of 4.91 (95% CI: 2.63–9.16) and 2.29 (95% CI: 2.15–2.44) for all-cause mortality in the SaMi-Trop and Clinical Outcomes in Digital Electrocardiography 15% (CODE15) cohorts.

Conclusions

The mortality risk by AI-enabled ECG may be applied in passive electronic-health-record-based CVD risk screening, which may identify more asymptomatic and unaware high-risk patients.

Keywords

Artificial intelligence, electrocardiogram, deep learning, mortality, cardiovascular disease, risk stratification, electronic health record

Introduction

Cardiovascular disease (CVD) may be the most important public health issue and is the leading cause of death worldwide.¹ If CVD is identified in a patient, CVD-related deaths may be prevented by extensive health management through effective diet plans, lifestyle interventions, and drug interventions.² A great deal of research on CVD risk estimation has been performed over the past decade, including the Framingham risk scores,^3,4 the QRISK score,⁵ the Europe Systematic Coronary Risk Evaluation (SCORE),⁶ the Assessing Cardiovascular Risk using Scottish Intercollegiate Guidelines Network (ASSIGN) score,⁷ the Prospective Cardiovascular Master (PROCAM) equations,⁸ and the CUORE cohort study formula.⁹ These CVD risk prediction models were used to screen for patients with CVD and high-risk groups to facilitate CVD control.^10,11

Currently, background disease screening and alarm systems based on electronic health records (EHRs) have become popular.^12,13 While active decision support interventions can be embedded into EHRs, passive notification can also be embedded without the actions of clinicians. These concepts can be used to actively prompt clinicians to notice asymptomatic CVD cases.¹⁰ Nevertheless, there are still numerous challenges associated with developing CVD risk estimation models. Blood test results are currently the major component driving these risk stratification calculators,¹⁴ but they are often missing because blood sampling is invasive. These risk scores were available for fewer than 30% of the patients in an EHR-based cardiovascular (CV) screening.¹⁵ An accurate passive CVD risk stratification system using less information is an unmet need in clinical practice.

With deep learning model (DLM)-enhanced medical interpretations, previous studies have found plentiful patient information from chest X-ray,¹⁶ fundoscopy,¹⁷ and electrocardiogram (ECG)¹⁸ beyond expert knowledge. Since ECG may be the most popular examination in clinical practice, the DLM-enabled ECG system may provide an opportunity to stratify high-risk groups of CVD with a single examination. A number of applications of artificial intelligence (AI) have been developed to detect cardiac diseases using large annotated ECG data sets.^19,20 However, an initial risk stratification tool in primary care may need an integrating indicator for CVD risk stratification. Currently, ECG age might be the best indicator since it is associated with extensive CVD outcomes.^21,22

Mortality may be the most important global outcome in CVD-related prognosis. In a previous study, a DLM automatically predicted 1-year mortality directly from 12-lead ECG voltage-time data.²³ Therefore, we hypothesized that the mortality risk estimated by DLM-enabled ECG may be a better integrating indicator than ECG age. In this study, we explored the associations between ECG mortality risk and a wide range of CVD-related outcomes to establish an accurate passive CVD risk stratification system. Moreover, although a previous study investigated the 25-year outcomes of models trained with 1 year of deaths,²³ the more standard method may be to directly use survival analysis for long-term effects. This study therefore also developed a survival DLM to address this issue and compared its performance with the risk score estimated by a 1-year mortality model.

Method

Data source and population

The institutional ethics committee of the Tri-Service General Hospital (C202105049) reviewed and approved this study, and we retrospectively developed and evaluated a DLM internally and externally. Since we retrospectively used de-identified data collected and encrypted from the hospital to the data controller, an informed consent waiver was granted for this study. The ECGs were collected from two hospitals, an academic medical center in Neihu District (Hospital A) and a community hospital in Zhongzheng District (Hospital B), from 1 January 2010 to 30 April 2021. Patients aged less than 20 years old were excluded.

Figure 1 shows the assignment of samples in this study. There were 291,778 patients with at least one ECG in Hospital A. The 210,552 patients admitted to Hospital A after 1 January 2017 were used in the development set, which included 451,950 ECG records for DLM training. A total of 20,371 patients were assigned to the tuning set between 1 January 2016 and 31 December 2016. These patients provided 89,302 ECGs for guiding the training process and determining the optimal operating point for subsequent usage. Finally, 27,808 patients before 31 December 2015 were assigned to an internal validation set, which contained only the first ECG of each patient and was used for the accuracy test and follow-up analysis. We also included 33,047 patients from Hospital B using the same inclusion criteria as the internal validation set to verify the generalizability of the DLM. We also used two data sets, the Clinical Outcomes in Digital Electrocardiography 15% (CODE15) cohort (with 233,647 participants) and the SaMi-Trop cohort (with 1631 participants), to perform international validation. SaMi-Trop is a National Institutes of Health-funded prospective cohort of patients with chronic Chagas cardiomyopathy designed to evaluate whether a clinical prediction rule based on ECG and other biomarkers can be useful in clinical practice.^24,25 The CODE15 is a subset of the CODE cohort,²⁶ which was developed with the database of digital ECG exams of the TeleHealth Network of Minas Gerais (TNMG), Brazil, linked to the public databases of the Mortality and Hospitalization Information Systems.^27,28 These data sets provided ECG age as the baseline marker against which our mortality risk assessment model was compared. Moreover, the ECGs in these transnational cohorts were divided into normal and abnormal according to the recommendations of the American Heart Association, American College of Cardiology Foundation, and Heart Rhythm Society.²⁹

Figure 1.

Development, tuning, internal validation, and external validation set generation and ECG labeling of survival information in a private data set. Schematic of the data set creation and analysis strategy, which was devised to ensure a robust and reliable data set for training, validating, and testing of the network. Once a patient's data were placed in one of the data sets, that individual's data were used only in that set, avoiding “cross-contamination” among the training, validation, and test data sets. The details of the flow chart and how each of the data sets was used are described in the methods.

Data collection

The 12-lead ECGs were recorded at a sampling frequency of 500 Hz for 10 s, and we directly used the raw ECG traces to train the DLMs. In addition, 31 diagnostic pattern classes and 8 continuous ECG measurements were extracted from the quantitative measurements and abnormal findings of the ECG based on the standard phrases in the Philips system described previously (Table S1).³⁰ The patterns included abnormal T wave, atrial fibrillation, atrial flutter, atrial premature complex, complete AV block, complete left bundle branch block, complete right bundle branch block, first degree AV block, incomplete left bundle branch block, incomplete right bundle branch block, ischemia/infarction, junctional rhythm, left anterior fascicular block, left atrial enlargement, left axis deviation, left posterior fascicular block, left ventricular hypertrophy, low QRS voltage, pacemaker rhythm, prolonged QT interval, right atrial enlargement, right ventricular hypertrophy, second degree AV block, sinus bradycardia, sinus pause, sinus rhythm, sinus tachycardia, supraventricular tachycardia, ventricular premature complex, ventricular tachycardia, and Wolff–Parkinson–White syndrome. Data missing from ECG measurements were imputed using multiple imputation methods.³¹ The disease histories were based on the corresponding International Classification of Diseases, Ninth Revision and Tenth Revision (ICD-9 and ICD-10, respectively) codes described previously.³² The primary outcome of this study was all-cause mortality, with follow-up time calculated from the index date of the ECG. The status (dead/alive) of each patient was defined from the electronic medical records, which were updated by hospital staff as needed. Moreover, data for living patients were censored at the patient's last known hospital encounter to limit bias from incomplete records. We also performed a secondary analysis on extensive CVD outcomes, namely CV mortality, non-CV mortality, new-onset acute myocardial infarction (AMI), new-onset stroke (STK), and new-onset heart failure (HF). We defined a new-onset event as a record of the corresponding ICD codes for AMI, STK, or HF. Patients with such a record before the index date of the ECG were defined as having the corresponding disease history and were excluded from the analysis of that outcome.
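The censoring rule described above can be sketched as follows. This is a minimal illustration only: the function name `survival_label`, its arguments, and the conversion of days to years with 365.25 are assumptions introduced here, not details reported by the study.

```python
from datetime import date
from typing import Optional, Tuple


def survival_label(index_date: date,
                   death_date: Optional[date],
                   last_alive_date: Optional[date]) -> Tuple[float, int]:
    """Return (follow-up time in years, event flag) for one ECG.

    Hypothetical helper: a recorded death gives an event (flag 1) timed
    from the ECG index date; otherwise the record is censored (flag 0)
    at the last known hospital encounter while alive.
    """
    if death_date is not None:
        return (death_date - index_date).days / 365.25, 1
    if last_alive_date is None:
        raise ValueError("a living patient needs a last known encounter date")
    return (last_alive_date - index_date).days / 365.25, 0


# Example: one death about a year after the ECG, one censored record.
died = survival_label(date(2015, 1, 1), date(2016, 1, 1), None)
censored = survival_label(date(2015, 1, 1), None, date(2015, 7, 1))
```

Each ECG then carries a (time, event) pair, which is the form of label that a Cox-type survival model consumes.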

Implementation of the DLM

The major architecture of the proposed survival DLM is summarized in Figure 2. Each ECG was captured in the standard 12-lead format, resulting in a sequence of 5000 samples per lead (500 Hz for 10 s), from which a 5000 × 12 matrix was created. The input format of this architecture was a 4096 × 12 matrix. During the training process, a segment of 4096 samples was randomly selected and cropped as input. We developed a DLM to conduct survival analysis based on the Cox proportional hazard model. As the baseline benchmark of mortality prediction by ECG, we also trained a DLM on 1-year mortality information using only 153,207 patients with 252,355 ECGs, because surviving patients with less than 1 year of follow-up were excluded. In this DLM with binary output, the cases are the patients who died within 1 year, and the controls are the patients who lived more than 1 year.
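The random cropping and the binary 1-year labeling described above can be sketched as below. This is an illustrative sketch under stated assumptions: the function names `random_crop` and `one_year_label` are hypothetical, the ECG is represented as a plain list of 5000 rows of 12 lead values, and the handling of a follow-up of exactly 1 year is a choice made here rather than one reported in the text.

```python
import random
from typing import List, Optional


def random_crop(ecg: List[List[float]], length: int = 4096,
                rng=random) -> List[List[float]]:
    """Pick a random contiguous window of `length` samples (rows)."""
    n = len(ecg)
    if n < length:
        raise ValueError("recording is shorter than the crop length")
    start = rng.randrange(n - length + 1)  # inclusive range of valid starts
    return ecg[start:start + length]


def one_year_label(time_years: float, event: int) -> Optional[int]:
    """Binary benchmark label: 1 = died within 1 year, 0 = lived more
    than 1 year, None = alive with under 1 year of follow-up (excluded)."""
    if event == 1 and time_years <= 1.0:
        return 1
    if time_years > 1.0:
        return 0
    return None


# Synthetic 5000 x 12 recording whose values encode the sample index.
ecg = [[float(i)] * 12 for i in range(5000)]
crop = random_crop(ecg, rng=random.Random(0))
```

The `None` branch makes the cost of the binary formulation visible: those ECGs are dropped from the 1-year model, whereas a survival model can still use them as censored observations.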

Figure 2.

The model architecture of the deep survival neural network for ECG analysis. The deep neural network was constructed from a series of residual modules and pool modules, which provided a lead-specific feature map to obtain an integrated prediction via an attention module. The number on the right side is the tensor shape of the output.

The output of the proposed survival DLM was a continuous value $h(x)$, which was the output of the last fully connected layer for each ECG. The loss function was also based on the Cox partial likelihood function. The Cox proportional hazard model is a statistical method for time-to-event analysis. In the Cox model, the risk that an event, such as death in medical research, happens to an individual at a specific time is estimated from censored data by the hazard function and the baseline hazard. Suppose that data on a set of subjects have been collected. Each subject has an observed covariate vector $x$, including baseline variables such as height, sex, or age. The Cox model assumes that the hazard rate for subject $m$ at time $t$ can be estimated by:

$$\lambda(t \mid x_{m}) = \lambda_{0}(t) \cdot e^{h(x_{m})}$$

where $\lambda_{0}(t)$ is a baseline hazard function at time $t$ and $h(x_{m})$ is a risk function that describes the chance of an event occurring for subject $m$. The function $h(x_{m})$ can be simply estimated by the linear combination of covariates by the expression:

$$h(x_{m}) = X_{m} B$$

Furthermore, $X_{m}$ is a $1 \times i$ vector of observed covariates for subject $m$, and $B$ is an $i \times 1$ vector of the parameters that can be estimated from the partial likelihood for the corresponding covariates. These two vectors are defined as:

$$X_{m} = \begin{bmatrix} x_{m,1} & x_{m,2} & \dots & x_{m,i} \end{bmatrix}, \quad B = \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{i} \end{bmatrix}$$
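The following short derivation is an illustrative addition (the subject labels $a$ and $b$ are hypothetical). It shows why the baseline hazard can be set aside when subjects are compared: for two subjects evaluated at the same time $t$, the baseline hazard cancels, so the hazard ratio depends only on the difference between their risk function values:

$$\frac{\lambda(t \mid x_{a})}{\lambda(t \mid x_{b})} = \frac{\lambda_{0}(t) \cdot e^{h(x_{a})}}{\lambda_{0}(t) \cdot e^{h(x_{b})}} = e^{h(x_{a}) - h(x_{b})}$$

For example, a difference of $h(x_{a}) - h(x_{b}) = \ln 2 \approx 0.693$ corresponds to a hazard ratio of 2, whatever the value of $\lambda_{0}(t)$.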

At each time at which an event is observed, the risk of the individual with the event (that is, the value of the $h(x)$ function) should be large relative to all the individuals still at risk, including the individual with the event. Without considering the baseline hazard function, the partial likelihood function of the Cox model is defined as:

$$L_{c}(\beta) = \prod_{k : E_{k} = 1} \frac{e^{h(x \mid k)}}{\sum_{j \in R(t_{k})} e^{h(x \mid j)}}$$

where $E$ represents the event status, $k$ indexes the subjects, and $E_{k} = 1$ denotes a subject with failure. The risk set $R(t_{k})$ includes all the at-risk individuals $1, 2, 3, 4, \dots, j - 1, j$ at failure time $t_{k}$, where $t_{k}$ is the failure time for subject $k$. The partial likelihood is thus the product, over the individuals with failure, of $e^{h(x \mid k)}$ divided by the sum over the corresponding risk set, and it is largest when each individual with failure has the maximum value of $e^{h(x \mid k)}$ at the corresponding observed failure time. The parameter $\beta$ of the risk function $h(x)$ can be estimated by the maximum likelihood estimator.^33,34 Additionally, we can optimize the risk function by minimizing the negative log of the partial likelihood with gradient descent, a widely used optimization algorithm for neural networks. The negative log of the partial likelihood function, averaged over the $N_{E=1}$ subjects with failure, can be expressed as follows and can serve as the objective loss function for optimization in neural networks:

$$\mathrm{loss}(\beta) = -\frac{1}{N_{E=1}} \sum_{k : E_{k} = 1} \left( h(x \mid k) - \log \left( \sum_{j \in R(t_{k})} e^{h(x \mid j)} \right) \right)$$

According to this loss function, we developed a DLM with a linear output $h(x)$, which was the output of the last fully connected layer for each ECG.
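The loss above can be evaluated directly from a set of risk outputs, follow-up times, and event flags. The sketch below is a plain-Python illustration, not the training code of the study: the function name is hypothetical, the risk set is taken to be every subject whose observed time is at least the failure time (so tied times all fall in the risk set), and the log-sum-exp shift is a numerical-stability choice added here.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(risk: Sequence[float],
                                   time: Sequence[float],
                                   event: Sequence[int]) -> float:
    """Negative log Cox partial likelihood, averaged over events."""
    n_events = sum(1 for e in event if e == 1)
    if n_events == 0:
        return 0.0  # no failures: the product over events is empty
    total = 0.0
    for k in range(len(risk)):
        if event[k] != 1:
            continue  # censored subjects only appear inside risk sets
        # Risk set R(t_k): everyone still under observation at t_k.
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[k]]
        shift = max(at_risk)  # stabilizes the exponentials
        log_sum = shift + math.log(sum(math.exp(r - shift) for r in at_risk))
        total += risk[k] - log_sum
    return -total / n_events


times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
flat = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], times, events)
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

With equal risks the three events contribute $\log 3$, $\log 2$, and $\log 1$, so `flat` equals $\log(6)/3$; assigning higher risk to earlier failures (`good`) lowers the loss, and the reverse ordering (`bad`) raises it.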

We trained these DLMs with a batch size of 32 and used an initial learning rate of 0.001 using an Adam optimizer with standard parameters (β₁ = 0.9 and β₂ = 0.999). The learning rate was decayed by a factor of 10 each time the loss of the validation cohort plateaued after an epoch. To prevent the networks from overfitting, early stopping was performed by saving the network after every epoch and choosing the saved DLMs with the lowest loss on the validation cohort. The only regularization method for avoiding overfitting was L2 regularization with a coefficient of 10⁻⁴ in this study.
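The interplay between learning-rate decay and checkpoint selection described above can be sketched without any deep learning framework. The example below is an assumption-laden illustration: "plateaued" is interpreted here as an epoch whose validation loss does not improve on the best loss so far, and the function name and the loss values are hypothetical.

```python
from typing import List, Sequence, Tuple


def run_schedule(val_losses: Sequence[float],
                 initial_lr: float = 1e-3,
                 decay_factor: float = 0.1) -> Tuple[int, List[float]]:
    """Return (epoch of the checkpoint to keep, learning rate after each epoch)."""
    lr = initial_lr
    best_loss = float("inf")
    best_epoch = -1
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: this epoch's saved model is kept.
            best_loss, best_epoch = loss, epoch
        else:
            # No improvement: decay the learning rate by a factor of 10.
            lr *= decay_factor
        lr_history.append(lr)
    return best_epoch, lr_history


# Hypothetical validation losses for five epochs.
best_epoch, lr_history = run_schedule([0.90, 0.70, 0.72, 0.65, 0.66])
```

In this toy run the learning rate drops after epochs 2 and 4 (zero-indexed), and the model saved at epoch 3 would be selected because it has the lowest validation loss.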

Statistical analysis

We presented the characteristics of the different sets as the means and standard deviations, numbers of patients, or percentages. The performance of the two DLMs was compared by the receiver operating characteristic (ROC) curve for 1-year mortality analysis, and the area under the curve (AUC) was also presented. To evaluate the additional predictive contribution from ECG risk, we used the concordance index, also called the C-index, to present the global performance. We additionally added sex and age to enhance the DLM performance by multivariable Cox proportional hazard models, and the adjusted hazard ratio (HR) and 95% confidence interval (95% CI) were presented. The statistical analysis was carried out using the software environment R version 3.4.4. We used a significance level of p < 0.05 throughout the analysis.
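For readers unfamiliar with the C-index, the sketch below shows one common way to compute it (Harrell's formulation) in plain Python. It is an illustration only: the study's analysis was carried out in R, the function name is hypothetical, and the conventions chosen here (pairs are comparable when the subject with the shorter time had an event, and tied risk scores count as half) are assumptions.

```python
from typing import Sequence


def concordance_index(risk: Sequence[float],
                      time: Sequence[float],
                      event: Sequence[int]) -> float:
    """Fraction of comparable pairs in which the subject who failed
    earlier was assigned the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(risk)
    for i in range(n):
        if event[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if time[j] > time[i]:  # subject j outlived subject i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


perfect = concordance_index([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [1, 1, 1])
reversed_order = concordance_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1, 1, 1])
mixed = concordance_index([1.0, 3.0, 2.0], [1.0, 2.0, 3.0], [1, 1, 0])
```

A value of 1 means the risk ordering matches the order of failures exactly, 0.5 corresponds to random ordering, and 0 means the ordering is fully reversed.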

Results

Table 1 shows the distribution of baseline demographic characteristics, disease histories, and follow-up information in the development set, tuning set, internal validation set, and external validation set. The demographic characteristics and follow-up information in the SaMi-Trop and CODE15 cohorts are also presented.

Table 1.

Baseline characteristics.

	Private Data Sets				Open Data Sets
	Development (n = 451,950)	Tuning (n = 89,302)	Internal Validation (n = 27,808)	External Validation (n = 33,047)	CODE15 (n = 233,647)	SaMi-Trop (n = 1631)
Data source
ED	101,982 (24.2%)	42,970 (48.6%)	6512 (31.2%)	14,279 (43.5%)
IPD	182,450 (43.2%)	22,939 (25.9%)	3276 (15.7%)	6115 (18.6%)
OPD	37,974 (9.0%)	11,004 (12.4%)	8319 (39.8%)	8974 (27.4%)
Unknown	99,858 (23.6%)	11,586 (13.1%)	2792 (13.4%)	3421 (10.4%)
Demography
Sex (male)	240,678 (53.3%)	44,904 (50.3%)	14,724 (52.9%)	16,740 (50.7%)	94,736 (40.5%)	534 (32.7%)
Age (years)	57.7 ± 18.3	63.5 ± 17.5	53.1 ± 18.2	57.9 ± 20.6	50.4 ± 19.8	59.4 ± 12.8
BMI (kg/m²)	24.4 ± 4.2	24.3 ± 4.3	24.3 ± 4.1	24.2 ± 4.2
Disease history
DM	92,506 (20.5%)	28,300 (31.7%)	4454 (16.0%)	7797 (23.6%)
HTN	147,675 (32.7%)	45,296 (50.7%)	8684 (31.2%)	12,980 (39.3%)
HLP	129,366 (28.6%)	37,712 (42.2%)	7601 (27.3%)	11,583 (35.1%)
CKD	75,754 (16.8%)	30,717 (34.4%)	3200 (11.5%)	5281 (16.0%)
AMI	12,726 (2.8%)	5464 (6.1%)	422 (1.5%)	472 (1.4%)
STK	44,429 (9.8%)	16,039 (18.0%)	2373 (8.5%)	3946 (11.9%)
CAD	90,851 (20.1%)	30,057 (33.7%)	4781 (17.2%)	6437 (19.5%)
HF	30,555 (6.8%)	13,339 (14.9%)	1557 (5.6%)	2307 (7.0%)
COPD	1544 (6.4%)	305 (16.5%)	207 (4.4%)	56 (12.5%)
Normal ECGs					92,138 (39.4%)	286 (17.5%)
Follow-up information
Follow-up (years), median (IQR)	0.8 (0.1–2.8)	1.8 (0.4–3.3)	1.9 (0.1–4.9)	0.7 (0.1–3.0)	3.5 (2.1–5.2)	2.1 (2.0–2.2)
All-cause mortality, n (%)	23,592 (5.2%)	6642 (7.4%)	817 (2.9%)	1173 (3.5%)	8341 (3.6%)	104 (6.4%)
All-cause mortality, n (%) (1 year)	15,766 (6.5%)	3647 (11.1%)	446 (3.9%)	753 (4.2%)	2812 (24.2%)	46 (93.9%)
All-cause mortality, n (%) (3 years)	20,899 (6.0%)	5562 (8.8%)	692 (4.1%)	1008 (4.1%)	6173 (6.4%)	104 (6.4%)

ED: emergency department; IPD: inpatient department; OPD: outpatient department; BMI: body mass index; DM: diabetes mellitus; HTN: hypertension; HLP: hyperlipidemia; CKD: chronic kidney disease; AMI: acute myocardial infarction; STK: stroke; CAD: coronary artery disease; HF: heart failure; COPD: chronic obstructive pulmonary disease; IQR: interquartile range.

The comparison between our survival DLM and the previous binary DLM on 1-year mortality risk is shown in Figure 3(a). The AUC of ECG risk predicted by the survival DLM was 0.894 (95% CI: 0.879–0.909), which was significantly (p = 0.005) higher than the AUC of 0.876 (95% CI: 0.858–0.893) predicted by the binary DLM in the internal validation set. A similar result was shown in the external validation. Figure 3(b) shows the long-term mortality predictive performance in different input settings. In both the internal and external validation sets, our survival DLM outperformed the binary DLM with C-indices of 0.858 and 0.836, respectively. Moreover, the survival DLM performed better when combined with demographic characteristics, reaching C-indices of 0.874 and 0.856, which were significantly higher than the C-indices obtained using only sex and age in both validation sets. With the operating points of Figure 3(c), patients were divided into low-risk, intermediate-low-risk, intermediate-high-risk, and high-risk groups, and the proportion of each group in the internal and external validation sets is shown in Figure 3(d) for subsequent analysis. The proportions differed between the tuning set and the validation sets because high-risk patients contributed more ECGs to the tuning set, whereas only one ECG per patient was used in the validation sets. Moreover, Figure 3(e) compares the AUCs of the two models at varying lengths of follow-up. Because the binary DLM classified the subjects who died within 3 days, 1 month, 6 months, and 1 year into the same death group, it did not use time-to-event information, and the long-term time-to-event data indicate that the ECG-risk model outperforms ECG risk (1 year). Figure 3(f) summarizes the C-index in patients with or without disease histories, and the ECG-risk score performed better in patients without diabetes mellitus (DM), chronic kidney disease (CKD), and HF.

Figure 3.

Summary of model performance for predicting all-cause mortality and the risk stratifications in the internal and external validation sets. (a) ROC curves of the risk prediction based on a deep survival neural network trained by long-term mortality risk (ECG risk) and a deep neural network trained by 1-year mortality risk (ECG risk (1 year)). The cases are the patients who died within 1 year, and the controls are the patients who lived more than 1 year. (b) The C-index for the indicated input data, including (i) age and sex alone, (ii) ECG-risk score based on ECG voltage-time traces alone, (iii) ECG-risk (1 year) score based on ECG voltage-time traces alone, and (iv) ECG-risk score with age and sex. (c) The ECG-risk score is transformed to the percentile scale in risk curve analysis, and we selected an HR of 1 as the operating point to distinguish the low-risk group and intermediate-low-risk group in the tuning set, followed by HRs of 2 and 4 to further stratify the intermediate-high-risk group and high-risk group. (d) According to the operating points decided previously, the patients in the internal and external validation sets were classified into low-risk, intermediate-low-risk, intermediate-high-risk, and high-risk groups for subsequent analyses. (e) Comparing the difference in the AUC between ECG risk and ECG risk (1 year) with varying lengths of follow-up time. (f) Stratified analysis for the C-index of ECG risk on long-term all-cause mortality. *p < 0.05; **p < 0.01; ***p < 0.001.
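The assignment of a patient to one of the four risk groups from three operating points can be sketched as below. The cut-point values used here are hypothetical placeholders on a percentile scale; the source states only that the operating points correspond to HRs of 1, 2, and 4 in the tuning set and does not report their numeric values. The function name and the rule that a score equal to a cut-point falls in the higher group are also assumptions.

```python
from bisect import bisect_right
from typing import Sequence

GROUPS = ("low-risk", "intermediate-low-risk",
          "intermediate-high-risk", "high-risk")


def risk_group(score: float, cutpoints: Sequence[float]) -> str:
    """Map a risk score to a group using three ascending cut-points."""
    if len(cutpoints) != 3 or list(cutpoints) != sorted(cutpoints):
        raise ValueError("expected three ascending cut-points")
    # bisect_right counts how many cut-points are <= score (0 to 3).
    return GROUPS[bisect_right(cutpoints, score)]


# Hypothetical percentile cut-points, for illustration only.
CUTPOINTS = (50.0, 75.0, 90.0)
labels = [risk_group(s, CUTPOINTS) for s in (10.0, 50.0, 80.0, 95.0)]
```

Because the cut-points are fixed once on the tuning set, the same mapping can be applied unchanged to any other cohort, which is how the group proportions can differ between data sets.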

Figure 4 compares the mortality of patients in the low-risk, intermediate-low-risk, intermediate-high-risk, and high-risk groups. The incidences of all-cause mortality were 28.0% at 3 years and 38.9% at 9 years in the high-risk group in the internal validation set, which were significantly higher than the incidences in the low-risk group (1.1% and 2.7%), with an adjusted HR of 14.16 (95% CI: 11.33–17.70). A clear dose–response relationship was also observed, with HRs rising from the intermediate-low-risk group (3.15, 95% CI: 2.55–3.89) and the intermediate-high-risk group (6.81, 95% CI: 5.49–8.45) to the high-risk group. Additionally, we conducted a mortality risk analysis on ECGs from different departments in the various risk stratifications. Supplemental Figure S1 depicts a comparable trend for patients who visited different departments. This trend was also present in the external validation set. Interestingly, the risk difference was present not only in CV mortality but also in non-CV mortality. The high-risk group presented HRs of 13.68/9.97 compared to the low-risk group for non-CV mortality in the internal/external validations, which indicates that our survival DLM extracted non-CV-related information from the ECG. Figure 5 shows a similar trend for new-onset AMI, new-onset STK, and new-onset HF. For new-onset HF in the internal validation, the HRs of the intermediate-low-risk group, intermediate-high-risk group, and high-risk group were 1.93 (95% CI: 1.26–2.96), 2.37 (95% CI: 1.57–3.58), and 6.66 (95% CI: 4.54–9.77), respectively, and the risk in the first 3 years increased from 0.4% in the low-risk group to 1.2%, 1.7%, and 7.6% in these groups. A higher risk of new-onset AMI was also identified in the high-risk group (1.8% at 3 years) than in the low-risk group (0.8% at 3 years), corresponding to an HR of 4.01 (95% CI: 2.24–7.17) in the internal validation. Risk stratification was also observed for new-onset STK. Similar analyses in the external validation set showed consistent results, which supports the application value of the ECG-risk score for extensive clinical outcomes.

Figure 4.

Kaplan–Meier curves for each risk stratification on all-cause mortality, cardiovascular (CV) mortality, and non-CV mortality. The analyses are conducted both in internal and external validation sets. The table shows the at-risk population and cumulative risk for the given time intervals in each risk stratification.

Figure 5.

Kaplan–Meier curves for each risk stratification on new-onset AMI, STK, and HF. The analyses are conducted both in internal and external validation sets. Patients with a history of the corresponding disease were excluded from the analyses. The table shows the at-risk population and cumulative risk for the given time intervals in each risk stratification.
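The cumulative risks shown with the Kaplan–Meier curves in Figures 4 and 5 come from the product-limit estimator, which can be sketched in a few lines. This is a generic textbook illustration, not the study's code; the function name and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(time: Sequence[float],
                 event: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (event time, estimated survival probability) pairs."""
    event_times = sorted({t for t, e in zip(time, event) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in time if u >= t)  # still under observation
        failures = sum(1 for u, e in zip(time, event) if u == t and e == 1)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort: five subjects, two of them censored (event flag 0).
curve = kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0], [1, 0, 1, 1, 0])
cumulative_risk = [(t, 1.0 - s) for t, s in curve]
```

Censored subjects leave the at-risk count without counting as failures, which is why the estimator remains usable when follow-up lengths differ; the cumulative risk at a given time is one minus the survival estimate.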

Figure 6(a) shows the distributions of ECG morphology in each risk group. The high-risk group identified by the proposed survival DLM presented less sinus rhythm, more atrial fibrillation, more ventricular premature complex, higher heart rate, longer QT interval, and abnormal T wave axis than the intermediate-high group, followed by the intermediate-low group and low-risk group. The above abnormal morphologies were associated with all-cause mortality, CV mortality, non-CV mortality, new-onset AMI, new-onset STK, and new-onset HF in the internal and external validation sets (Figure 6(b)). These results revealed some of the reasons why our survival ECG-risk score was associated with extensive outcomes.

Figure 6.

ECG morphology analysis of risk stratifications on adverse outcomes. (a) Distribution of ECG morphology in each risk stratification. Bars represent the mean or prevalence where appropriate and corresponding 95% confidence intervals (*p < 0.05; **p < 0.01; ***p < 0.001). (b) Risk analysis of selected ECG morphologies on adverse outcomes. Red, gray, and blue bars denote significantly positive, nonsignificant, and negative associations, respectively, with the corresponding outcomes.

Figure 7 shows the performance of our ECG risk estimated by the survival DLM in two transnational cohorts, CODE15 and SaMi-Trop, compared to the previous indicator of mortality, ECG age. The initial at-risk populations in CODE15 and SaMi-Trop were 233,647 and 1631 patients, with median follow-up times of 3.5 years (interquartile range (IQR): 2.1–5.2) and 2.1 years (IQR: 2.0–2.2), respectively. Figure 7(a) shows the comparison between ECG age and ECG risk. There was no significant difference in the C-index between ECG age and ECG risk in SaMi-Trop, and ECG age performed significantly better in CODE15. However, ECG risk with age and sex provided the highest C-index in both transnational cohorts, which was significantly better than the integration of sex, age, and ECG age. We used the same cut-points as in our data sets to divide the transnational patients into low-risk, intermediate-low-risk, intermediate-high-risk, and high-risk groups, and the proportion of each group in the SaMi-Trop and CODE15 cohorts is shown in Figure 7(b). The proportion of the higher-risk groups was higher in abnormal ECGs than in normal ECGs, which demonstrated the relationship between ECG risk and physician opinion. However, there were still more than 6% high-risk patients in the normal group and more than 26% low-risk patients in the abnormal group. Figure 7(c) shows the proportion of each risk group at different ages. Due to the different age distributions, the overall proportion of each risk group was different in the four data sets. However, the proportion of the higher-risk groups increased consistently with age in the four data sets. Figure 7(d) shows the significant risk stratification performance of our ECG risk in SaMi-Trop and CODE15, which had HRs of 4.91 (95% CI: 2.63–9.16) and 2.29 (95% CI: 2.15–2.44), respectively, in the high-risk group compared to the low-risk group for all-cause mortality.

Figure 7.

Summary of model performance in the SaMi-Trop and CODE15 data sets. (a) The C-index for the indicated input data, including: (i) ECG age provided by a previous study, (ii) ECG-risk score based on our survival model, (iii) age and sex alone, (iv) ECG age with age and sex, and (v) ECG-risk score with age and sex. “Normal” refers to the ECGs labeled normal by the original interpreting physician at the time of ECG acquisition, and “abnormal” refers to any ECGs not identified as normal. (b) The proportion of low-risk, intermediate-low-risk, intermediate-high-risk, and high-risk groups in the SaMi-Trop and CODE15 sets. (c) The distribution of age and proportion of risk groups in the internal validation, external validation, SaMi-Trop, and CODE15 data sets. (d) Kaplan–Meier curves for each risk stratification on all-cause mortality in the SaMi-Trop and CODE15 data sets. The table shows the at-risk population and cumulative risk for the given time intervals in each risk stratification.

We further compared the mortality prediction performance between our ECG-risk score and the gap between ECG age and chronological age in SaMi-Trop and CODE15, stratified by normal and abnormal ECGs (Figure 8). After stratification by our ECG-risk score, the mortality rates were 6.5% at 1 year and 18.9% at 3 years in the high-risk group in the abnormal ECGs of SaMi-Trop, which were significantly higher than the mortality rates in the low-risk group (2.0% and 3.1%), with a crude HR of 5.91 (95% CI: 3.06–11.39). However, the gap between ECG age and chronological age had nonsignificant crude HRs, which demonstrated the limitation of ECG age. Although significant HRs were shown for the gap between ECG age and chronological age after adjusting for sex and age, our ECG-risk score still presented better risk stratification performance, which was consistent in all transnational subgroup analyses. Moreover, our ECG-risk score performed risk stratification in both normal and abnormal ECGs.

Figure 8.

The mortality prediction performance comparison between our ECG-risk score and the gap between ECG age and chronological age in the SaMi-Trop and CODE15 data sets. All analyses were stratified as “normal” and “abnormal” ECGs. “Normal” refers to the ECGs labeled normal by the original interpreting physician at the time of ECG acquisition, and “abnormal” refers to any ECGs not identified as normal. The HRs summarize the Cox regression models adjusted by different selections of variables. The table shows the at-risk population and cumulative risk for the given time intervals in each risk stratification. We did not perform this analysis in the normal ECGs of SaMi-Trop because most patients had ECG abnormalities related to Chagas disease.

Discussion

This study demonstrated the superiority of the proposed survival DLM over the traditional binary approach for mortality analysis. With the powerful feature extraction ability of the DLM, the proposed ECG-risk score trained on mortality was associated with extensive clinical outcomes and with traditional ECG morphologies. Compared to the previous integrating indicator, the gap between ECG age and chronological age, our indicator was easier to explain and provided information beyond demographic characteristics. The proposed ECG-risk score may be extensively applied in primary care through EHR-based CV screening to manage asymptomatic and unaware high-risk people.

A previous study established a DLM using ECG voltage-time traces alone to predict 1-year mortality with an AUC of 0.855, but this model excluded surviving patients with less than a year of follow-up and placed all deaths in the same group regardless of when they occurred.²³ To address these limitations, we developed a survival DLM that performs better than the previous binary DLM, even in predicting 1-year mortality. The survival DLM utilizes all ECGs in the study and accounts for the severity of events by considering the time-to-event data. The proposed model may become the standard for predicting long-term time-to-event outcomes, as the Cox proportional hazards model outperforms logistic regression in this task.³⁵ The survival DLM also has potential for future event analyses beyond ECG and all-cause mortality.
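To make the contrast with a binary label concrete, the following is a minimal sketch of a Cox negative partial log-likelihood, the kind of objective a survival model can be trained on. It is not the authors' released implementation: the function name is hypothetical, ties are handled in the simple Breslow style, and plain NumPy stands in for whatever deep learning framework is actually used.

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk_score, time, event):
    """Average negative Cox partial log-likelihood (Breslow-style ties).

    risk_score : model output per ECG (higher means higher predicted hazard)
    time       : follow-up time per ECG
    event      : 1 if death was observed, 0 if censored
    """
    risk_score = np.asarray(risk_score, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # at_risk[i, j] is True when subject j is still under observation at time[i].
    at_risk = time[None, :] >= time[:, None]

    # Log of the summed exp(score) over each risk set, computed stably.
    shift = risk_score.max()
    log_risk_set = shift + np.log((at_risk * np.exp(risk_score - shift)).sum(axis=1))

    # Only observed deaths contribute a term; censored subjects still
    # appear in the risk sets of earlier deaths, so no record is discarded.
    log_lik = (risk_score - log_risk_set)[event]
    return -log_lik.sum() / max(int(event.sum()), 1)

# Hypothetical toy data: the third subject is censored at t = 3.
time = [1.0, 2.0, 3.0, 4.0]
event = [1, 1, 0, 1]

loss_good = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0, -1.0], time, event)
loss_bad = cox_neg_partial_log_likelihood([-1.0, 0.0, 1.0, 2.0], time, event)
loss_flat = cox_neg_partial_log_likelihood([0.0, 0.0, 0.0, 0.0], time, event)
print(loss_good, loss_bad, loss_flat)
```

Scores that rank earlier deaths as higher risk give a lower loss than the reversed ranking, and the censored subject is used rather than excluded, which is the property the paragraph above attributes to survival modeling.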

The ECG waveform consists mainly of P, Q, R, S, and T waves, which combine into diverse morphologies that describe electrical activity, heart architecture, and heart–lung–torso geometry.³⁶ A previous study developed a morphology combination score (MCS) based on asymmetric, flattened, and/or notched T waves, and a high MCS presented adjusted HRs of 1.75 (95% CI: 1.62–1.89) and 1.61 (95% CI: 1.43–1.92) for mortality in women and men, respectively.³⁷ The highest quartile of QRS duration was also associated with higher CV mortality than the lowest quartile (HR: 1.3, 95% CI: 1.01–1.7).³⁸ Our DLM integrated whole-ECG information, including but not limited to atrial fibrillation, ventricular premature complex, tachycardia, prolonged QT interval, and abnormal T wave axis, to estimate mortality risk, and it performed better owing to its powerful feature extraction capability. Although physicians already use the ECG to recognize abnormal signals and decide on further intervention,³⁹ our DLM can provide further risk stratification in both normal and abnormal patients, which is consistent with previous studies.²³ Moreover, since the above abnormal morphologies are widely associated with CVDs,⁴⁰ the ECG mortality risk integrating these signals was also associated with future CVDs and may be used as an integrating indicator for CVD risk stratification.

Stratifying individuals by CVD risk scores can help high-risk groups reduce CVD risk factors and receive preventive medication.⁴¹ Therefore, estimating an individual's CVD risk is crucial for primary prevention,⁴² and using a single examination, such as ECG, may be superior to risk factor combinations in passive electronic-health-record-based screening.¹⁵ ECG age has been proposed as an integrating indicator associated with extensive CVD outcomes,²¹,²² particularly in mortality prediction.²¹,²²,⁴³,⁴⁴ However, ECG age estimation is limited by underestimation in elderly individuals and overestimation in young people. In contrast, the proposed ECG mortality risk was easier to interpret and predicted mortality better, even after age and sex were considered, because ECG age duplicates information already carried by chronological age. Additionally, the ECG mortality risk was associated with extensive CVDs in our study and may become the basis for CVD control in future passive screening. The proposed ECG mortality risk score may help stratify risk in asymptomatic and unaware CVD patients.

Many guidelines for primary prevention recommend the estimated risk of CVD as the best guide for intervention decisions.⁴⁵⁻⁴⁹ Currently, British,⁵⁰ European,⁴⁹ American,⁴⁷ and New Zealand⁵¹ guidelines use 3- to 9-year interval screening protocols to identify high-risk groups for CVD. However, few people actively undergo screening under the current protocols. ECG is an inexpensive and widely available examination that can become a popular test in routine clinical practice.⁵² Passive CVD risk estimation by a single ECG may identify many asymptomatic and unaware CVD patients,⁵³ and early detection can improve physicians’ clinical insight to reduce the risk of future events.⁵² Passive electronic-health-record-based screening may have almost no cost in hospitals and insurance institutions.⁵⁴ Therefore, the new generation of CVD screening systems may combine passive ECG-risk detection and existing active CVD examination to provide a more comprehensive risk assessment.

This study has several limitations. First, the DLM was established and validated only retrospectively, and its actual clinical impact has not been evaluated in practice. Future studies should embed the ECG-risk score in a passive EHR-based notification system and evaluate its clinical acceptability and the effects of intervention in high-risk patients. Second, the black-box nature of the DLM limits its transparency, which may hinder clinicians from designing suitable interventions for high-risk patients. Third, the ECG-risk score performs poorly in people with a history of disease, although the major benefit of this system may be the passive detection of asymptomatic and unaware CVD patients. Finally, performance decreased in the transnational cohorts; we have released our survival DLM and recommend that researchers establish their own DLM at each site for future applications.

Conclusions

Our study demonstrated that ECG mortality risk was extensively correlated with CVD-related events and may be better than ECG age as an integrating indicator for passive EHR-based screening. A survival DLM may extract more of the available time-to-event information to increase accuracy and may be applied in future survival analyses. Given mature medical information systems and the simplicity of the 12-lead ECG test, our ECG risk may be applied in a passive EHR-based CVD risk screening program. Asymptomatic and unaware CVD patients may benefit from this screening in hospitals and the community and can receive immediate intervention to prevent future progression.

Supplemental Material

Supplemental material (sj-docx-1-dhj-10.1177_20552076231187247 and sj-docx-2-dhj-10.1177_20552076231187247) for Mortality risk prediction of the electrocardiogram as an informative indicator of cardiovascular diseases by Dung-Jang Tsai, Yu-Sheng Lou, Chin-Sheng Lin, Wen-Hui Fang, Chia-Cheng Lee, Ching-Liang Ho, Chih-Hung Wang and Chin Lin in DIGITAL HEALTH

Footnotes

Availability of Data and Materials

The private database in this study is not publicly available due to privacy and security concerns. The data may be shared with a third party for reasonable requests upon execution of a data sharing agreement; such requests should be addressed to the corresponding author CL. The SaMi-Trop cohort was made openly available (https://doi.org/10.5281/zenodo.4905618). The CODE15 cohort was also made openly available.

Contributorship

All authors participated in designing the study, generating hypotheses, interpreting the data, and critically reviewing the paper. DJT and CL wrote the first draft, and CSL, CLH, and CHW contributed substantially to the writing of subsequent versions. Statistical analyses were designed and conducted by DJT with support from CL, YSL, and WHF. All authors had full access to all the data in the study and accept responsibility for the decision to submit for publication. DJT, CL, and CCL verified all the data used in this study. The corresponding author (CL) attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. CL had final responsibility for the decision to submit for publication.

Code Availability

The code for the survival DLM training is available at the GitHub repository.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by funding from the Ministry of Science and Technology, Taiwan (grant numbers MOST110-2314-B-016-010-MY3 to CL and MOST111-2321-B-016-003 to CHW), the Tri-Service General Hospital, Taiwan (grant number TSGH-B-111020 to CLH), and the Cheng Hsin General Hospital, Taiwan (grant number CHNDMC-111–07 to CL).

Guarantor

Chin Lin.

Informed Consent

This study was approved by the Institutional Review Board of Tri-Service General Hospital, Taipei, Taiwan (IRB No. C202105049). Since we retrospectively used de-identified data collected and encrypted from the hospital to the data controller, an informed consent waiver was granted for this study.

ORCID iDs

Wen-Hui Fang

Chin Lin

Supplemental Material

Supplemental material for this article is available online.

References

1. Abubakar, Tillmann, Banerjee. Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 2015; 385: 117–171.

2. McGill Jr, McMahan, Gidding. Preventing heart disease in the 21st century: implications of the pathobiological determinants of atherosclerosis in youth (PDAY) study. Circulation 2008; 117: 1216–1227.

3. D’Agostino Sr, Vasan, Pencina, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008; 117: 743–753.

4. Lloyd-Jones, Wilson PWF, Larson, et al. Framingham risk score and prediction of lifetime risk for coronary heart disease. Am J Cardiol 2004; 94: 20–24.

5. Hippisley-Cox, Coupland, Vinogradova, et al. Derivation and validation of QRISK, a new cardiovascular disease risk score for the United Kingdom: prospective open cohort study. Br Med J 2007; 335: 136.

6. Conroy. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003; 24: 987–1003.

7. Woodward, Brindle, Tunstall-Pedoe. Adding social deprivation and family history to cardiovascular risk assessment: the ASSIGN score from the Scottish Heart Health Extended Cohort (SHHEC). Heart 2005; 93: 172–176.

8. Assmann, Cullen, Schulte. Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the prospective cardiovascular Munster (PROCAM) study. Circulation 2002; 105: 310–315.

9. Ferrario, Chiodini, Chambless, et al. Prediction of coronary events in a low incidence population. Assessing accuracy of the CUORE cohort study prediction equation. Int J Epidemiol 2005; 34: 413–421.

10. Wells, Riddell, Kerr, et al. Cohort profile: the PREDICT cardiovascular disease cohort in New Zealand primary care (PREDICT-CVD 19). Int J Epidemiol 2017; 46: 22–22.

11. Wilson, D’Agostino, Levy, et al. Prediction of coronary heart disease using risk factor categories. Circulation 1998; 97: 1837–1847.

12. Patel, Volpp, Asch. Nudge units to improve the delivery of health care. N Engl J Med 2018; 378: 214–216.

13. Patel, Volpp. Leveraging insights from behavioral economics to increase the value of health-care service provision. J Gen Intern Med 2012; 27: 1544–1547.

14. Goff Jr, Lloyd-Jones, Bennett, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 2014; 129: S49–S73.

15. Hira, Kennedy, Nambi, et al. Frequency and practice-level variation in inappropriate aspirin use for the primary prevention of cardiovascular disease: insights from the National Cardiovascular Disease Registry's Practice Innovation and Clinical Excellence registry. J Am Coll Cardiol 2015; 65: 111–121.

16. Çallı, Sogancioglu, van Ginneken, et al. Deep learning for chest X-ray analysis: a survey. Med Image Anal 2021; 72: 102125.

17. Shekar, Satpute, Gupta. Review on diabetic retinopathy with deep learning methods. Journal of Medical Imaging 2021; 8: 060901.

18. Somani, Russak, Richter, et al. Deep learning and the electrocardiogram: review of the current state-of-the-art. EP Europace 2021; 23: 1179–1191.

19. Siontis, Noseworthy, Attia, et al. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol 2021; 18: 465–478.

20. Attia, Harmon, Behr, et al. Application of artificial intelligence to the electrocardiogram. Eur Heart J 2021; 42: 4717–4730.

21. Chang C-H, Lin C-S, Luo Y-S, et al. Electrocardiogram-Based Heart Age Estimation by a Deep Learning Model Provides More Information on the Incidence of Cardiovascular Disorders. Front Cardiovasc Med 2022; 9: 754909.

22. Lima, Ribeiro, Paixão GMM, et al. Deep neural network-estimated electrocardiographic age as a mortality predictor. Nat Commun 2021; 12: 5117.

23. Raghunath, Ulloa Cerna, Jing, et al. Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network. Nat Med 2020; 26: 886–891.

24. Di Lorenzo Oliveira, Nunes MCP, Colosimo, et al. Risk score for predicting 2-year mortality in patients with Chagas cardiomyopathy from endemic areas: SaMi-Trop cohort study. J Am Heart Assoc 2020; 9: e014176.

25. Cardoso, Sabino, Oliveira CDL, et al. Longitudinal study of patients with chronic Chagas cardiomyopathy in Brazil (SaMi-Trop project): a cohort profile. BMJ open 2016; 6: e011181.

26. Ribeiro, Paixão GMM, Gomes, et al. Tele-electrocardiography and bigdata: the CODE (Clinical Outcomes in Digital Electrocardiography) study. J Electrocardiol 2019; 57: S75–S78.

27. Ribeiro, Alkmim, Cardoso, et al. Implementation of a telecardiology system in the state of Minas Gerais: the Minas Telecardio project. Arq Bras Cardiol 2010; 95: 70–78.

28. Alkmim, Minelli Figueira, Soriano Marcolino, et al. Improving patient access to specialized health care: the Telehealth Network of Minas Gerais, Brazil. Bull W H O 2012; 90: 373–378.

29. Mason, Hancock, Gettes. Recommendations for the standardization and interpretation of the electrocardiogram: Part II: Electrocardiography diagnostic statement list: A scientific statement from the American Heart Association Electrocardiography and Arrhythmias Committee, Council on Clinical Cardiology; the American College of Cardiology Foundation; and the Heart Rhythm Society: Endorsed by the International Society for Computerized Electrocardiology. Circulation 2007; 115: 1325–1332.

30. Lin C-S, Lee Y-T, Fang W-H, et al. Deep learning algorithm for management of diabetes mellitus via electrocardiogram-based glycated hemoglobin (ECG-HbA1c): a retrospective cohort study. J Pers Med 2021; 11: 725.

31. Van Buuren, Groothuis-Oudshoorn. Mice: multivariate imputation by chained equations in R. J Stat Softw 2011; 45: 1–67.

32. Chang, Lin, Luo, et al. Electrocardiogram-based heart age estimation by a deep learning model provides more information on the incidence of cardiovascular disorders. Front Cardiovasc Med 2022; 9: 754909.

33. Qin, Ning, Liu, et al. Maximum likelihood estimations and EM algorithms with length-biased data. J Am Stat Assoc 2011; 106: 1434–1449.

34. Efron. The efficiency of Cox’s likelihood function for censored data. J Am Stat Assoc 1977; 72: 557–565.

35. Longato, Vettoretti, Di Camillo. A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. J Biomed Inform 2020; 108: 103496.

36. Hoekema, Uijen, van Oosterom. Geometrical aspects of the interindividual variability of multilead ECG recordings. IEEE Trans Biomed Eng 2001; 48: 551–559.

37. Isaksen, Ghouse, Graff, et al. Electrocardiographic T-wave morphology and risk of mortality. Int J Cardiol 2021; 328: 199–205.

38. Badheka, Singh, Patel, et al. QRS Duration on electrocardiography and cardiovascular mortality (from the National Health and Nutrition Examination Survey—III). Am J Cardiol 2013; 112: 671–677.

39. Rao, Manikanta, et al. Distinguishing normal and abnormal ECG signal. Indian Journal of Science and Technology 2016; 9: 1–5.

40. De Luna, Batchvarov, Malik. The morphology of the electrocardiogram. The ESC Textbook of Cardiovascular Medicine Blackwell Publishing 2006; 35.

41. Karmali, Persell, Perel, et al. Risk scoring for the primary prevention of cardiovascular disease. Cochrane Database Syst Rev 2017; 3: CD006887.

42. Piepoli, Hoes, Agewall, et al. 2016 European guidelines on cardiovascular disease prevention in clinical practice. Kardiologia Polska (Pol Heart J) 2016; 74: 821–936.

43. Ladejobi, Medina-Inojosa, Shelly Cohen, et al. The 12-lead electrocardiogram as a biomarker of biological age. Eur Heart J - Digit Health 2021; 2: 379–389.

44. Hirota, Suzuki, Arita, et al. Prediction of biological age and all-cause mortality by 12-lead electrocardiogram in patients without structural heart disease. BMC Geriatr 2021; 21: 1–8.

45. World Health Organization. Prevention of cardiovascular disease: guidelines for assessment and management of total cardiovascular risk. World Health Organization, 2007.

46. Board JBS. Joint British Societies’ consensus recommendations for the prevention of cardiovascular disease (JBS3). Heart 2014; 100: ii1–ii67.

47. Goff Jr, Robinson, Lichtenstein, et al. American Heart Association Task Force on Practice G. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association task force on practice guidelines. Circulation 2014; 129: S49–S73.

48. Khanji, Bicalho VVS, van Waardhuizen, et al. Cardiovascular risk assessment: a systematic review of guidelines. Ann Intern Med 2016; 165: 713–722.

49. Piepoli, Hoes, Agewall, et al. Guidelines: Editor's choice: 2016 European guidelines on cardiovascular disease prevention in clinical practice: The Sixth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice (constituted by representatives of 10 societies and by invited experts) developed with the special contribution of the European Association for Cardiovascular Prevention & Rehabilitation (EACPR). Eur Heart J 2016; 37: 2315.

50. Lindbohm, Sipilä, Mars, et al. 5-year versus risk-category-specific screening intervals for cardiovascular disease prevention: a cohort study. The Lancet Public Health 2019; 4: e189–e199.

51. Crooke. New Zealand Cardiovascular guidelines: Best practice evidence-based guideline: The assessment and management of cardiovascular risk December 2003. Clin Biochem Rev 2007; 28: 19–29.

52. National Heart, Lung, and Blood Institute. Assessing cardiovascular risk: systematic evidence review from the risk assessment work group. 2013.

53. Bhatia, Bouck, Ivers, et al. Electrocardiograms in low-risk patients undergoing an annual health examination. JAMA Intern Med 2017; 177: 1326–1333.

54. Mozaffarian, Benjamin, et al. Heart disease and stroke statistics—2015 update: a report from the American Heart Association. Circulation 2015; 131: e29–e322.
