
Abstract

Background

Paravalvular leakage (PVL) and conduction disturbances (CDs) are important complications after transcatheter aortic valve replacement (TAVR). However, existing risk prediction models predominantly adopt single-complication modeling strategies, overlooking the interrelatedness of these complications.

Objectives

We aimed to develop a multi-label prediction model based on deep learning to predict immediate PVL and new-onset CDs post-TAVR simultaneously.

Methods

The study retrospectively included 966 patients who underwent first-time TAVR for aortic stenosis between April 2012 and July 2023 from the Sichuan University TAVR Registry. We developed a deep learning-based model, using the optimization algorithm Muex, that takes 79 features as input and outputs labels for PVL and new-onset CDs immediately after TAVR. The Muex model was validated using the bootstrap method, evaluated by area under the receiver operating characteristic curve (AUROC) and calibration curves, interpreted with Shapley Additive Explanations, and subsequently compared with a neural network model and two traditional multi-label classification models.

Results

The dataset included 771 training and 195 testing patients, with 6.63% exhibiting more than mild PVL and 39.6% developing new-onset CDs. The Muex model outperformed the neural network, label powerset, and multi-label k-nearest neighbor models in both discrimination (micro-average AUROC: 0.739 vs. 0.705 vs. 0.504 vs. 0.514) and calibration (integrated calibration index [ICI]: 0.012 vs. 0.116 vs. 0.046 vs. 0.051), demonstrating strong performance in predicting both complications simultaneously.

Conclusion

The study demonstrated that the Muex model is feasible for simultaneously predicting PVL and CDs post-TAVR, excelling in both performance and interpretability, while identifying high-risk patients and inferring patient-specific risk factors to facilitate informed clinical decision-making.

Trial registration

ClinicalTrials.gov, NCT04415047.

Graphical abstract

Overview of study design and the architecture of Muex model.

Keywords

Multi-label model, deep learning, transcatheter aortic valve replacement, conduction disturbance, paravalvular leakage

Introduction

Transcatheter aortic valve replacement (TAVR) is now widely recognized as a viable treatment for patients with severe aortic stenosis who are at intermediate to high or prohibitive surgical risk.^1,2 However, paravalvular leakage (PVL) and conduction disturbances (CDs) (i.e. high-degree atrioventricular block (HAVB) requiring permanent pacemaker implantation (PPMI) and new-onset left bundle-branch block (LBBB)) are two main complications post-TAVR and are associated with an increased risk of late mortality and rehospitalizations.^3–9 PVL is common post-TAVR, with mild cases occurring in up to 40% and moderate or greater in up to 10% in contemporary studies.⁴ CDs occur in 31%–45% of patients post-TAVR, depending on the valve type.¹⁰ New-onset LBBB occurs in 85%–94% of cases during the periprocedural period.¹¹ Therefore, accurate prediction of PVL and CDs following TAVR is essential for optimizing both therapeutic strategies and prognostic assessment.

A number of previous studies have shown that PVL and CDs post-TAVR are associated with clinical risk factors.^{3,5,10,12–23} PVL results from incomplete circumferential apposition of the prosthesis with the annulus, whereas new-onset CDs are associated with the anatomical proximity of the conduction system to the aortic root and left ventricular outflow tract (LVOT). Due to the overlapping yet inconsistent and intrinsic associations between preprocedural patient characteristics and procedural predictors, these two major complications involve distinct and even conflicting pathophysiological and anatomical mechanisms. As a result, their clinical occurrences often follow a complex, seesaw-like pattern, making it historically challenging to develop a model that can simultaneously predict both.

The rapid development of machine learning (ML), especially artificial intelligence (AI) technology, has promoted the innovation of medical tasks. Neural networks, with advantages in multimodal data processing and multi-node output characteristics, provide a feasible solution for high-dimensional multi-source data processing and multi-label disease prediction.^24–26 ML models have garnered remarkable results in predicting complications post-TAVR, like in-hospital mortality (area under the receiver operating characteristic curve [AUROC]: 0.89–0.95).²⁷ Nonetheless, existing research predominantly focuses on identifying risk factors for single complications, with limited investigation into the interplay between different complications using a unified model. Unified modeling requires integrating clinical, laboratory, imaging, and procedural indicators while employing algorithms that address complex inter-complication relationships. The lack of systematic data collection and methodological support presents significant technical limitations, impeding progress and hindering the clinical translation of comprehensive complication management.

The main objective of this study is to develop a multi-label deep neural network (DNN) model to simultaneously predict immediate PVL and new-onset CDs post-TAVR using multi-source data. Specifically, we proposed an innovative loss function to empower the predictive DNN model to better predict the incidence of both complications, outperforming the basic neural network model and two traditional multi-label classification (MLC) models. We assume that this complication-specific risk assessment and individualized interpretation will provide an intuitive and thorough understanding of PVL and CDs risk post-TAVR, which is anticipated to be beneficial in decision-making in the therapeutic scenarios.

Methods

Data collection and preparation

The study population included patients who underwent first-time TAVR for aortic stenosis between April 2012 and July 2023, retrospectively collected from the West China Hospital of Sichuan University TAVR Registry. This registry was designed to sequentially enroll all aortic stenosis patients undergoing TAVR at West China Hospital. This trial is registered with ClinicalTrials.gov, NCT04415047, and is ongoing. The study was conducted in accordance with the Declaration of Helsinki, approved by the Ethics Committee on Biomedical Research of West China Hospital, and conformed to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD + AI) checklist (Multimedia Appendix-Table S1).²⁸

All patients underwent a thorough baseline clinical evaluation based on guidelines for the management of valvular heart disease,²⁹ including a comprehensive assessment of cardiovascular risk factors and the calculation of risk scores, specifically the Society of Thoracic Surgeons (STS) score and New York Heart Association (NYHA) classification. Eligibility for TAVR was evaluated in all patients by the multidisciplinary heart team. The valves were selected based on availability at the time of treatment. Data were meticulously collected using case record forms (CRFs) and an electronic data capture system, with a dedicated team responsible for data validation. All preprocedural variables were evaluated within one week before the procedure. All clinical data, as well as follow-up outcomes of patients, are integrated and managed via the self-developed Valvular Heart Disease Intelligent Management Platform of our team, which underpins the long-term comprehensive management of these patients.

The primary endpoints of interest were defined as more than mild PVL and new-onset CDs (including LBBB or second- and third-degree atrioventricular block [AVB]), occurring immediately post-TAVR. Both endpoints were classified according to the Valve Academic Research Consortium-2 criteria. The adjudication of PVL was conducted by the multidisciplinary heart team using transesophageal echocardiography (TEE), while CDs were assessed through an immediate 12-lead electrocardiogram (ECG). Exclusion criteria were (1) non-first-time TAVR; (2) isolated aortic regurgitation; (3) a preoperative pacemaker or preexisting AVB-II, AVB-III, or LBBB; and (4) more than 20% missing data. The remaining patients were then randomly divided into training and testing sets at an 8:2 ratio (Figure 1).

Figure 1.

Flowchart of study population. TAVR: transcatheter aortic valve replacement; AVB: atrioventricular block; LBBB: left bundle branch block.

Feature coding

Variables with more than 15% of missingness were excluded from the analysis. Categorical and ordinal variables were assigned discrete values, while continuous variables retained original values. We used different imputation methods for different types of missing values, striving to preserve the integrity of the original data while ensuring that the dataset was formatted correctly for successful input into the model. Specifically, we set missing values of continuous attributes to −1. For categorical attributes, we used zero-fill methods and started encoding from 1. Following data coding and imputation, each patient's characteristics were represented as a multidimensional vector. Using XGBoost, 79 features were selected from 144 candidate variables for prediction model construction, with features exhibiting a feature importance score of 0 (calculated via the gain method) excluded. To address the significant data imbalance, we employed the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN), with k set to 3. This hybrid resampling approach integrates SMOTE to generate synthetic samples for the minority class and ENN to remove noise from the majority class, thereby enhancing the model's predictive robustness. Increasingly recognized in clinical informatics, this technique served a dual purpose in our study: first, mitigating the primary imbalance between the occurrence and non-occurrence of complications; and second, harmonizing the disparate incidence rates across specific complication types to ensure unbiased model performance.³⁰ We utilized 79 features to analyze the performance of the Muex DNN Model from the classification experiment. 
The 79 features comprised 12 demographic and clinical features (height, STS score, creatinine, syncope, etc.), three preprocedural ECG features (atrial fibrillation and AVB), 20 preprocedural echocardiographic features (mitral regurgitation [MR], etc.), 33 preprocedural CT features (lobe type, etc.), one procedural feature (coronary protection), and 10 comorbidities and historical medical features (hypertension, chronic obstructive pulmonary disease [COPD], etc.).
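The missing-value coding described in this section can be illustrated with a minimal sketch. It assumes missing entries arrive as `None` or NaN and that categories are numbered in sorted order; the function names, the example values, and the ordering rule are hypothetical and not taken from the study code.

```python
import math

MISSING_CONTINUOUS = -1.0  # sentinel described in the text for continuous attributes
MISSING_CATEGORICAL = 0    # zero-fill; observed categories are coded from 1


def encode_continuous(values):
    """Keep observed values; replace None/NaN with the -1 sentinel."""
    out = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            out.append(MISSING_CONTINUOUS)
        else:
            out.append(float(v))
    return out


def encode_categorical(values):
    """Map observed categories to 1..K; missing entries become 0.

    Sorted order is an assumption made only for this illustration.
    """
    levels = sorted({v for v in values if v is not None})
    codes = {level: i + 1 for i, level in enumerate(levels)}
    return [MISSING_CATEGORICAL if v is None else codes[v] for v in values]


# Hypothetical values for two of the feature types named above
creatinine = encode_continuous([88.0, None, 102.5])
lobe_type = encode_categorical(["tricuspid", None, "bicuspid"])
```

Because the sentinel values lie outside the range of the observed codes, the network can in principle learn to treat missingness as its own signal, which is one plausible reading of the stated aim of preserving the original data.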

Deep learning model

In this study, we proposed a deep learning model, the Muex model, to simultaneously estimate the risk of PVL and new-onset CDs immediately post-TAVR. The Muex model is a feed-forward neural network architecture with multiple layers, including an input layer, two fully connected hidden layers, and an output layer. Both the input and hidden layers are followed by a ReLU activation function, which introduces nonlinearity to the network. The output layer consists of two nodes, one estimating the risk index of PVL and the other that of CDs. The network also incorporates a batch-normalization layer and a Tanh activation function in the activation layers to enhance performance.

We proposed an optimized binary cross-entropy loss (BCE Loss) function named Muex Loss to train our model, which acts as a performance metric and guides the learning process. BCE Loss refines models, improving both overall and sample-wise classification in multi-label tasks.³¹

We optimized BCE Loss by adopting a new adaptive weighting scheme. For a single positive label, we applied a “penalty weight,” and when both labels were positive, a double penalty, reducing the likelihood of both being predicted positive simultaneously. By applying this weight strategy to BCE Loss based on label comparisons, we improve feature discrimination and expand the classification margin, enhancing the model's ability to capture data patterns. The optimized BCE Loss can be expressed as follows:

$$\mathrm{loss}(x, y) = \alpha \left\{ -\frac{1}{2} \sum_{i=1}^{2} \left[ y_{i} \log x_{i} + (1 - y_{i}) \log (1 - x_{i}) \right] \right\} \tag{1}$$

$$\alpha = y_{1} + y_{2} + 1 \tag{2}$$

where equation (1) represents Muex Loss; x_i and y_i are the model output and the corresponding label for the i-th complication, respectively; α is the “mutual exclusion coefficient,” calculated as shown in equation (2); and y_1 and y_2 are the label values of the two mutually exclusive events (Multimedia Appendix-Methods).
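Equations (1) and (2) can be written out directly for a single patient. The sketch below is a plain-Python illustration, not the study's PyTorch code; the clipping constant `eps` is an added assumption to avoid taking the logarithm of zero.

```python
import math


def muex_loss(x, y, eps=1e-7):
    """Weighted BCE for one patient, following equations (1) and (2).

    x: model outputs (x1, x2) in (0, 1) for the two complications
    y: labels (y1, y2), each 0 or 1
    eps: clipping constant (assumption, not part of the equations)
    """
    alpha = y[0] + y[1] + 1  # equation (2): 1, 2, or 3
    total = 0.0
    for xi, yi in zip(x, y):
        xi = min(max(xi, eps), 1.0 - eps)
        total += yi * math.log(xi) + (1 - yi) * math.log(1.0 - xi)
    return alpha * (-0.5 * total)  # equation (1)


# With uninformative outputs (0.5, 0.5) the unweighted BCE is log 2,
# so the three label patterns isolate the effect of alpha.
none_positive = muex_loss((0.5, 0.5), (0, 0))  # alpha = 1
one_positive = muex_loss((0.5, 0.5), (1, 0))   # alpha = 2
both_positive = muex_loss((0.5, 0.5), (1, 1))  # alpha = 3
```

At identical outputs the loss scales as 1 : 2 : 3 across the three cases, which is the single and double penalty described in the text.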

Model validation and evaluation

We trained three additional models: a neural network model and two MLC models (label powerset [LP] and multi-label k-nearest neighbor [MLKNN]) to conduct a comparative study, aiming to explore the effectiveness of DNN in prediction and the value of the model in uncovering correlations between outcomes.
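As the Discussion notes, LP redefines a multi-label problem as a single-label multi-class problem. For two binary outcomes this yields four classes, which the following sketch makes concrete; the function names and the particular class numbering are illustrative assumptions.

```python
def to_powerset_class(pvl, cd):
    """Encode the label pair (PVL, CDs) as one class index in 0..3."""
    return 2 * pvl + cd


def from_powerset_class(k):
    """Decode a class index back to the (PVL, CDs) label pair."""
    return k // 2, k % 2


# The four possible label combinations and their class indices
mapping = {(p, c): to_powerset_class(p, c) for p in (0, 1) for c in (0, 1)}
```

A multi-class classifier trained on these indices treats each combination as unrelated to the others, so a rare combination has few training examples of its own.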

All models were validated using 1000 bootstrap resamples.³² Model evaluation was conducted using the following metrics: discriminative performance was assessed with receiver operating characteristic (ROC) curves; sensitivity and specificity were determined based on the cutoff point on the ROC curve; calibration was assessed using calibration curves. Additionally, we used SHapley Additive exPlanations (SHAP) to enhance model transparency and highlight feature importance.
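The evaluation steps above can be sketched in a few functions: a rank-based AUROC, a micro-average that pools both outputs, and a bootstrap confidence interval. The percentile interval and the skipping of single-class resamples are assumptions of this sketch; the text does not specify how the intervals were formed.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_auroc(label_rows, score_rows):
    """Micro-average: pool every (label, score) pair across both outputs."""
    flat_y = [y for row in label_rows for y in row]
    flat_s = [s for row in score_rows for s in row]
    return auroc(flat_y, flat_s)


def bootstrap_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for AUROC (resampling patients with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * (len(stats) - 1))]
    upper = stats[int((1 + level) / 2 * (len(stats) - 1))]
    return lower, upper
```

For a rare outcome such as more than mild PVL, some resamples of a small testing set may contain very few positives, which is one reason bootstrap intervals for that label can be wide.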

Statistics and analysis

Continuous variables are expressed as mean ± standard deviation (SD), and categorical variables are reported as counts and percentages. T-tests and chi-squared tests were used to evaluate differences between the groups for continuous and categorical variables, respectively. The features were standardized using Z-score normalization. ROC curves were drawn based on the true- and false-positive rates. The contribution ratio (CR), calculated as the SHAP value of a feature relative to the average SHAP values of all features, reflects the importance of one complication compared to others. Calibration curves were plotted as the predicted risk against the observed risk. We conducted a rigorous validation using the baseline DNN framework to independently predict PVL and CDs as separate single-label tasks. Furthermore, a sensitivity analysis was conducted by selectively ablating specific features from the training set, choosing features identified via SHAP analysis as having a differential importance between the two complications. The code was implemented in PyTorch 2.5, with analyses mainly conducted using scikit-learn.

Results

Patient characteristics

A total of 966 TAVR patients were retrospectively enrolled in the study, with 771 assigned to the training set (326 [42.3%] female; age: 73.0 [68.0; 78.0] years) and 195 to the testing set (80 [41.0%] female; age: 73.0 [69.0; 78.0] years). Demographic, echocardiographic, computerized tomographic (CT), procedural features, and comorbidities of patients stratified according to the dataset are shown in Table 1. Across all patients, postprocedural more than mild PVL occurred in 64 (6.63%) patients, and new-onset CDs were observed in 383 (39.6%) patients. No variables showed statistical differences between groups.

Table 1.

Baseline characteristics stratified according to the dataset.

	All	Training set	Testing set	p. overall
	N = 966	N = 771	N = 195
Baseline characteristics
Age, years	73.0 [68.0; 78.0]	73.0 [68.0; 78.0]	73.0 [68.0; 78.0]	0.453
Sex				0.813
Female	406 (42.0%)	326 (42.3%)	80 (41.0%)
Male	560 (58.0%)	445 (57.7%)	115 (59.0%)
BMI, kg/m²	22.9 [20.5; 25.2]	22.9 [20.6; 25.2]	22.9 [20.4; 25.1]	0.963
STS score	2.63 [1.80; 4.22]	2.59 [1.77; 4.18]	2.79 [2.10; 4.31]	0.308
NYHA class				0.266
Class I	9 (0.95%)	5 (0.66%)	4 (2.14%)
Class II	212 (22.4%)	174 (23.0%)	38 (20.3%)
Class III	587 (62.1%)	468 (61.7%)	119 (63.6%)
Class IV	137 (14.5%)	111 (14.6%)	26 (13.9%)
Hypertension	420 (43.5%)	343 (44.5%)	77 (39.5%)	0.239
Diabetes	171 (17.7%)	139 (18.0%)	32 (16.4%)	0.672
COPD	156 (16.1%)	122 (15.8%)	34 (17.4%)	0.662
Coronary artery disease	205 (21.2%)	168 (21.8%)	37 (19.0%)	0.447
Myocardial infarction	17 (1.76%)	11 (1.43%)	6 (3.08%)	0.128
PCI history	125 (12.9%)	100 (13.0%)	25 (12.8%)	1.000
Atrial fibrillation	145 (15.0%)	116 (15.0%)	29 (14.9%)	1.000
Peripheral vascular disease	111 (11.5%)	93 (12.1%)	18 (9.23%)	0.326
Cerebrovascular disease	129 (13.4%)	103 (13.4%)	26 (13.3%)	1.000
Renal insufficiency	64 (6.63%)	53 (6.87%)	11 (5.64%)	0.647
Renal insufficiency with dialysis	9 (0.93%)	6 (0.78%)	3 (1.54%)	0.396
Preprocedural echocardiography
Moderate or severe AR	615 (63.7%)	484 (62.8%)	131 (67.2%)	0.290
Moderate or severe MR	722 (74.7%)	576 (74.7%)	146 (74.9%)	1.000
Moderate or severe TR	667 (69.0%)	529 (68.6%)	138 (70.8%)	0.620
LVEF	61.0 [46.0; 67.0]	61.0 [46.0; 68.0]	60.0 [43.0; 67.0]	0.120
TEE gradient	52.0 [42.0; 67.0]	52.0 [41.0; 67.0]	51.0 [42.0; 66.8]	0.917
CT measurements
Lobe type				0.759
Bicuspid aortic valve	429 (44.4%)	340 (44.1%)	89 (45.6%)
Tricuspid aortic valve	537 (55.6%)	431 (55.9%)	106 (54.4%)
Annular perimeter	77.1 [71.0; 84.2]	77.1 [71.0; 84.2]	77.6 [71.3; 84.2]	0.641
SOV perimeter	111 [101; 121]	110 [101; 121]	112 [101; 124]	0.218
STJ diameter	31.9 [28.8; 35.1]	31.7 [28.7; 35.0]	32.1 [29.2; 35.4]	0.316
Left crown height	13.2 [11.2; 15.3]	13.2 [11.2; 15.3]	12.9 [11.2; 15.3]	0.608
Right crown height	15.2 [12.9; 17.7]	15.3 [12.8; 17.7]	15.0 [12.9; 17.7]	0.869
Calcification volume	61.8 [0.00; 469]	61.6 [0.00; 466]	83.6 [0.00; 491]	0.849
ECG characteristics
ICD				0.363
No	964 (99.8%)	770 (99.9%)	194 (99.5%)
Yes	2 (0.21%)	1 (0.13%)	1 (0.51%)
AVB-I	51 (5.31%)	46 (6.01%)	5 (2.56%)	0.083
AVB-II: No	961 (100%)	766 (100%)	195 (100%)	.
AVB-III: No	961 (100%)	766 (100%)	195 (100%)	.
LBBB: No	961 (100%)	766 (100%)	195 (100%)	.
RBBB				0.726
II	1 (0.10%)	1 (0.13%)	0 (0.00%)
III	31 (3.23%)	26 (3.39%)	5 (2.56%)
No	929 (96.7%)	739 (96.5%)	190 (97.4%)
LAH	23 (2.39%)	16 (2.09%)	7 (3.59%)	0.290
LPH: No	961 (100%)	766 (100%)	195 (100%)	.
Postprocedural complications
Postprocedural new-onset CDs				0.395
No	583 (60.4%)	471 (61.1%)	112 (57.4%)
Yes	383 (39.6%)	300 (38.9%)	83 (42.6%)
Postprocedural more than mild PVL				0.893
No	902 (93.4%)	719 (93.3%)	183 (93.8%)
Yes	64 (6.63%)	52 (6.74%)	12 (6.15%)

Note: Variables are expressed as frequency (%) or median (interquartile range).

BMI: body mass index; STS: Society of Thoracic Surgeons; NYHA: New York Heart Association; COPD: chronic obstructive pulmonary disease; MI: myocardial infarction; PCI: percutaneous coronary intervention; AR: aortic regurgitation; MR: mitral regurgitation; TR: tricuspid regurgitation; LVEF: left-ventricular ejection fraction; TEE: transesophageal echocardiography; SOV: sinus of Valsalva; ICD: implantable cardioverter-defibrillator; AVB: atrioventricular block; LBBB: left bundle branch block; RBBB: right bundle branch block; LAH: left anterior hemiblock; LPH: left posterior hemiblock; CDs: conduction disturbances; PVL: paravalvular leakage.

Muex DNN performance

The model achieved a micro-average accuracy of 0.740 (95% confidence interval [CI]: 0.684–0.777), an accuracy of 0.836 (95% CI: 0.733–0.895) for PVL, and an accuracy of 0.644 (95% CI: 0.554–0.721) for CDs, indicating that it was able to correctly classify instances with reasonable overall accuracy. The rate of subjects falsely predicted with both complications was 1.538% (n = 3). The composite endpoint yielded a sensitivity of 0.636 (95% CI: 0.513–0.733) and a corresponding specificity of 0.769 (95% CI: 0.707–0.814). In addition, the micro-average AUC was calculated to be 0.739 (95% CI: 0.723–0.831) with an AUC of 0.798 (95% CI: 0.714–0.901) for PVL and 0.675 (95% CI: 0.559–0.723) for CDs, indicating that the model has reasonable discriminative ability in distinguishing positive and negative examples (Figure 2 and Multimedia Appendix-Figure S1). In terms of calibration, the integrated calibration index (ICI) quantifies the average weighted difference between predicted and observed probabilities, while E50 represents the median absolute calibration error. The Muex model showed strong calibration performance, with an ICI of 0.012 and E50 of 0.005, indicating minimal deviation between predicted and observed probabilities and high reliability in risk estimation (Figure 3).
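The two calibration summaries defined above can be approximated in a short sketch. ICI and E50 are usually computed from a smoothed calibration curve; this illustration swaps the smoother for equal-width bins, so its values would differ from those reported here. The function name and bin count are assumptions.

```python
import statistics


def calibration_errors(labels, probs, n_bins=10):
    """Return (ICI, E50) using equal-width bins in place of a smoothed curve.

    For each patient, the observed risk is the event rate within the bin
    that contains the predicted probability.
    """
    def bin_of(p):
        return min(int(p * n_bins), n_bins - 1)

    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        bins[bin_of(p)].append(y)
    observed = [sum(b) / len(b) if b else None for b in bins]

    abs_err = [abs(p - observed[bin_of(p)]) for p in probs]
    # ICI: mean absolute error; E50: median absolute error
    return statistics.mean(abs_err), statistics.median(abs_err)
```

Averaging over patients rather than over bins weights each region of the curve by how many predictions fall there, which is the sense in which the ICI is a weighted difference.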

Figure 2.

Micro-average ROCs and AUCs of Muex Loss model, BCE Loss neural network model, MLKNN model, and LP model. AUC: area under the curve; ROC: receiver operating characteristics; LP: label powerset; MLKNN: multi-label k-nearest neighbor.

Figure 3.

Calibration curve for immediate incidence of PVL and CDs post-TAVR. The calibration curve demonstrates the agreement between predicted risk (x-axis) and observed risk (y-axis). For ICI and E50, lower values indicate better performance. PVL: paravalvular leakage; CDs: conduction disturbances; TAVR: transcatheter aortic valve replacement; ICI: integrated calibration index.

To interpret the Muex model, we applied SHAP to evaluate feature importance. The SHAP-based heatmaps are shown in Multimedia Appendix – Figures S2 and S3. The top ten features ranked by importance are shown in Figure 4, with preprocedural aortic valve structure and surrounding anatomical features demonstrating the highest predictive value. These include annular diameter, annular area, sinus of Valsalva (SOV) area, annular perimeter, Type 0 bicuspid aortic valve (BAV), SOV perimeter, ascending aorta (AAO) radius, and interventricular septum (IVS) thickness. The next most significant features are the hemodynamic and electrocardiographic conditions of the heart valves, including the absence of preoperative MR and the presence of atrial fibrillation detected by ECG. Ablating features highly influential for PVL (annular area and SOV diameter) resulted in only minimal overall AUC fluctuations (decreasing slightly to 0.729 and 0.735). Similarly, ablating features highly influential for CDs (LVEF and hypertension) resulted in only minimal overall AUC fluctuations (decreasing slightly to 0.733 and 0.738). The stability of the overall model metrics following these systematic removals suggests a low dependence on individual potential confounding factors, supporting the model's robustness (Multimedia Appendix – Table S1).

Figure 4.

Top 10 feature importance plot for the testing set. The SHAP feature importance plot shows the contribution of each feature to the model output, represented by the average of the SHAP values across all individuals in the dataset. SHAP: SHapley Additive exPlanations; ECG: electrocardiogram; non-MR: no preprocedural mitral regurgitation; IVS: interventricular septum; AAO: ascending aorta; SOV: sinus of Valsalva; BAV: bicuspid aortic valve.

Comparison of models

We compared the Muex model with the neural network, LP, and MLKNN models across the metrics in Table 2. The 79 features selected by XGBoost were used to build all four models. The results showed that the proposed Muex model outperformed the neural network, LP, and MLKNN models, with the differences from LP and MLKNN reaching statistical significance (Table 2). In terms of discrimination, Muex reached a higher micro-average AUROC (Muex: 0.739; neural network: 0.705; LP: 0.504; and MLKNN: 0.514), AUROC for PVL (Muex: 0.798; neural network: 0.789; LP: 0.482; and MLKNN: 0.539), and AUROC for CDs (Muex: 0.675; neural network: 0.603; LP: 0.520; and MLKNN: 0.499) than the other three models. The Muex model also exhibited the highest micro-average, as well as the greatest specificity for both PVL and CDs. Additionally, the neural network model performed best in micro-average and CDs sensitivity. In terms of calibration, Muex was better calibrated than the neural network, LP, and MLKNN models, with smaller ICI and E50 (Muex: ICI = 0.012, E50 = 0.005; neural network: ICI = 0.116, E50 = 0.628; LP: ICI = 0.046, E50 = 0.731; and MLKNN: ICI = 0.051, E50 = 0.500). As detailed in Multimedia Appendix Table S2 and Figure S4, the independent DNN models yielded AUROCs of 0.644 for CDs and 0.674 for PVL, respectively.

Table 2.

Performance of the four models on the testing set.

Outcome	Models	ACC	AUC	Sensitivity	Specificity	Loss	p-value^a
CDs	Muex	0.644 (0.554–0.721)	0.675 (0.559–0.723)	0.642 (0.556–0.773)	0.644 (0.531–0.725)	0.841	–
	Neural network	0.587 (0.619–0.720)	0.603 (0.638–0.766)	0.685 (0.568–0.773)	0.523 (0.612–0.726)	0.984	0.238
	LP	0.587 (0.520–0.661)	0.520 (0.520–0.661)	0.205 (0.104–0.294)	0.841 (0.771–0.907)	5.242	0.007
	MLKNN	0.519 (0.446–0.588)	0.499 (0.426–0.571)	0.400 (0.288–0.524)	0.598 (0.513–0.689)	8.023	0.004
PVL	Muex	0.836 (0.733–0.895)	0.798 (0.714–0.901)	0.572 (0.480–0.696)	0.847 (0.527–0.750)	0.495	–
	Neural network	0.751 (0.698–0.865)	0.789 (0.658–0.800)	0.571 (0.479–0.643)	0.758 (0.182–0.408)	0.531	0.648
	LP	0.926 (0.821–0.953)	0.482 (0.392–0.538)	0.964 (0.000–0.134)	0.734 (0.874–0.990)	7.344	<0.001
	MLKNN	0.774 (0.701–0.853)	0.539 (0.422–0.568)	0.285 (0.213–0.384)	0.794 (0.504–0.690)	2.598	<0.001
Composite endpoint^b	Muex	0.740 (0.684–0.777)	0.739 (0.723–0.831)	0.636 (0.513–0.733)	0.769 (0.707–0.814)	0.665	–
	Neural network	0.669 (0.619–0.720)	0.705 (0.638–0.766)	0.675 (0.568–0.773)	0.668 (0.612–0.726)	0.755	0.310
	LP	0.757 (0.712–0.799)	0.504 (0.488–0.575)	0.182 (0.101–0.268)	0.917 (0.882–0.947)	6.294	0.006
	MLKNN	0.647 (0.700–0.853)	0.514 (0.422–0.568)	0.390 (0.213–0.384)	0.718 (0.504–0.690)	5.311	0.008

ACC: accuracy; AUC: area under the curve; LP: label powerset; MLKNN: multi-label k-nearest neighbor; CDs: conduction disturbances; PVL: paravalvular leakage.

^a Compared with the Muex model on the testing set.

^b Micro-average results of composite endpoint.

Discussion

This study developed a DNN-based risk prediction model for PVL and new-onset CDs immediately following TAVR in patients treated from April 2012 to July 2023. The key findings were as follows: (1) for the first time, we successfully used the Muex DNN model to integrate multi-source clinical data, enabling effective multi-label prediction of PVL and CDs post-TAVR, which exhibit a complex seesaw-like pattern; (2) the Muex model incorporates novel algorithmic refinements to accommodate outcomes with potentially intricate correlations, yielding enhanced predictive accuracy and outperforming other multi-label modeling strategies, including neural network models and traditional MLC methods (LP and MLKNN); and (3) in preliminary investigations, the Muex model demonstrated robustness to diverse variable types and to methods for handling randomly missing values. To our knowledge, this is the first model to simultaneously predict PVL and new-onset CDs immediately post-TAVR using a DNN, and it has the potential to advance personalized decision-making support in the early post-TAVR phase and, ultimately, to improve long-term patient outcomes.

PVL and CDs remain critical issues following TAVR, negatively impacting mid- and long-term prognosis in prior studies.^6,20,21,33 However, these two complications in the previous studies do not share the same risk factors. In PVL, male sex,¹² non-diabetic status,⁵ anatomical features such as a BAV, a larger virtual raphe ring perimeter, LVOT eccentricity, calcification in the annulus, leaflets, and LVOT,^17–20 and procedural features including intentional supra-annular positioning of the bioprosthetic valve, self-expanding valves, valve undersizing^17,19,20,23 are associated with an increased risk. In CDs, male sex, age, baseline conduction defects (e.g. LBBB and prolonged QRS), coronary artery bypass grafting history, chronic lung disease, and the need for home oxygen,^3,13–16 anatomical features comprising membranous septal length, larger aortic annulus size and valve area, the ratio of the prosthesis to LVOT diameter, left ventricular end-diastolic diameter,^15,21,22 procedural features such as implantation depth on the septal side, over-expansion of the native aortic annulus, self-expanding valves, and valve oversizing^10,16,23 are important predictors. 
Additionally, the amount and the distribution of device landing zone calcium are important factors for both the degree of AR and the risk of PPMI post-TAVR.^12,13 A previous study developed a predictive model for PVL integrating anatomical and procedural variables with the CoreValve prosthesis, demonstrating improved PVL prediction (sensitivity = 68.7% and specificity = 88.1%).³⁴ Kiani's team developed the Emory risk score, incorporating factors including a clinical history of syncope, right bundle branch block, prolonged QRS, and valve oversizing, which demonstrated a strong association with PPMI (AUROC = 0.778).³⁵ Tsushima's team utilized a larger patient sample (n = 1390) and 14 ML-based classifiers to predict PPMI post-TAVR, achieving the highest AUROC of 0.82.³⁶ Existing studies have mostly used independent modeling methods to predict the risks of PVL and CDs post-TAVR, without a risk model simultaneously predicting both. This will constrain the one-stop clinical evaluation of valvular interventional procedures and introduce potential biases in the formulation of procedural strategies. Specifically, while a dedicated single-label classifier may yield marginally superior performance for individual outcomes, it neglects their intrinsic interrelationships. Our multi-label framework drives the model to learn robust shared representations, thereby enhancing predictive consistency and generalization, which represents an invaluable advantage for trade-off decisions in real-world TAVR clinical practice.

Our study has advanced the field by developing a DNN model that simultaneously predicts both PVL and CDs post-TAVR for the first time, achieving a notable average AUROC of 0.739. Moreover, compared to previous studies, we collected a comprehensive set of 144 features, including preprocedural characteristics, CT, ECG, echocardiography, and procedural data, with 79 features selected for model development. The feature importance analysis of the Muex model indicates that baseline characteristics (e.g. body mass index, cerebrovascular disease, COPD, creatinine, history of atrial fibrillation, and history of percutaneous coronary intervention), CT parameters (e.g. LVOT diameter, AAO radius, SOV diameter, and right and left coronary heights), preprocedural echocardiographic features (e.g. functional bicuspid valve and non-MR), coronary protection, and preprocedural atrial fibrillation detected by ECG all demonstrate comparable feature importance in predicting both PVL and CDs, which is consistent with previous studies. Besides, calcification volume, no preprocedural PVL, aortic diameter, effective orifice area, AS type, age, sex, weight, and height exhibit significantly greater feature importance in predicting PVL. In contrast, for CDs, left ventricular posterior wall thickness, IVS thickness, previous myocardial infarction, pulmonary artery pressure, annular area, annular eccentricity, annular diameter, and BAV with fusion of right and left coronary leaflets demonstrate significantly greater feature importance. Notably, our SHAP analysis identified the Right Atrial dimension as highly influential in the multi-label task, notwithstanding its exclusion from the top 25 predictors in the single-label counterpart. This suggests the Muex model captures latent correlations between outcomes that task-specific algorithms fail to detect, providing a more holistic representation of risk. 
The shared and divergent feature importance underscores the complexity of the interplay between PVL and CDs. Although the limited patient dataset yielded only acceptable predictive performance, these insights can still serve as a useful reference for clinicians, guiding decision-making and supporting personalized medicine in multiple clinical settings.

ML-based multi-label models are now applied in imaging- and ECG-based disease diagnosis,³⁷ adverse event prediction,³⁸,³⁹ and postprocedural complication prediction,⁴⁰ and have gained increasing attention in the cardiovascular field in recent years. Using problem transformation and algorithm adaptation methods, Jamthikar's team employed MLC models to predict three cardiovascular events, achieving an optimal AUC of 0.89.⁴¹ Meng's team developed a multi-label DNN model to predict cardiovascular events and organ-specific outcomes in patients with hypertensive disorders of pregnancy (AUROC = 0.878; average precision = 0.239).⁴² Some studies suggest that the interdependencies among complications are crucial in predictive modeling, with their contributions potentially outweighing those of preprocedural factors.²⁶,⁴³ Thus, multi-label ML, which accounts for the interplay between complications, may help clinicians balance patient risks in clinical decisions such as valve selection and procedural strategy in TAVR. The Muex DNN model, optimized with BCE loss through an enhanced penalty weight mechanism, outperformed the basic neural network and the two MLC models by better reflecting the fact that the two complications are not mutually exclusive. In a comparative analysis (Supplemental Table S2 and Supplemental Figure S4), independent single-label DNNs performed sub-optimally, with AUROCs of only 0.644 (CDs) and 0.674 (PVL) and a marked sensitivity-specificity disparity in PVL prediction. By heavily penalizing the double-positive (1, 1) state, our weighted loss function compels the Muex model to predict simultaneous complications cautiously. This multi-label approach mitigates single-task overfitting by imposing joint constraints on shared features.
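The idea of adding a penalty for the double-positive (1, 1) state to a BCE loss can be illustrated with a minimal sketch. The exact Muex loss is not given in this section, so the functional form below (per-label BCE plus a term proportional to the predicted joint probability whenever the true state is not (1, 1)), the name `joint_penalized_bce`, and the default `penalty_weight` are all assumptions for illustration, not the authors' implementation.

```python
import math


def bce(y: int, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one label, with clipping for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_penalized_bce(
    y_pvl: int,
    y_cd: int,
    p_pvl: float,
    p_cd: float,
    penalty_weight: float = 2.0,  # hypothetical value, not taken from the paper
) -> float:
    """Sum of per-label BCE plus an assumed penalty on predicted co-occurrence.

    The penalty is applied only when the true state is not (1, 1), so the model
    pays extra for confidently predicting both complications without support.
    """
    base = bce(y_pvl, p_pvl) + bce(y_cd, p_cd)
    truly_double_positive = y_pvl == 1 and y_cd == 1
    penalty = 0.0 if truly_double_positive else penalty_weight * p_pvl * p_cd
    return base + penalty


# A patient with new-onset CDs but no PVL: raising the PVL probability costs
# more under the joint penalty than under plain BCE.
loss_cautious = joint_penalized_bce(0, 1, p_pvl=0.2, p_cd=0.8)
loss_overcalled = joint_penalized_bce(0, 1, p_pvl=0.6, p_cd=0.8)
```

Under this assumed form, the two output heads are coupled through the product term, which is one simple way a loss can impose a joint constraint that independent single-label models lack.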
Such a mechanism enhances the identification of critical variables, including implantation depth, pre-dilation, and post-dilation strategies, which simultaneously influence the balance of risk, thereby providing a globally optimized assessment for intra-procedural decision-making. Compared with MLKNN, which transforms multi-label problems into multiple binary classification tasks, and LP, which redefines multi-label problems as single-label multi-class classification, deep learning techniques excel in capturing the hierarchical structure and interdependencies between labels.⁴⁴ Furthermore, accurate prediction requires the integration of multi-source data, and biased feature selection can undermine model accuracy.⁴⁵ Striking a balance between using information about other complications and minimizing the risk of attribute noise is also crucial.⁴⁶ AI, especially DNNs, offers a robust approach for integrating weak predictors, thereby enhancing prediction accuracy.⁴⁷ Additionally, the use of simplified and discretized preprocedural variables may improve the model's generalization potential and its transferability within pervasive healthcare systems. The clinical utility of this multi-label DNN model depends on its integration into established healthcare management systems, which will be crucial for realizing its translational impact. Initially, to facilitate rapid validation, the Muex model is targeted for integration into our proprietary Valvular Heart Disease (VHD) Intelligent Management Platform, an independently developed system for specialized cohort management. Already adopted across multiple centers, this platform can serve as a testbed to demonstrate the model's near-term scalability and clinical feasibility.
Concurrently, electronic health record (EHR)-embedded clinical decision support systems (CDSS) have shown high potential for multiple applications in healthcare and are becoming increasingly common. A primary advantage of CDSS is their capacity to promote clinician adherence to current practice guidelines. Studies demonstrate that CDSS are particularly effective in enhancing adherence to preoperative testing guidelines, resulting in significant cost savings.⁴⁸ For widespread clinical adoption, the model is designed to eventually function as an EHR-embedded CDSS tool. This integrated approach has the potential to streamline the clinical workflow.
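Returning to the comparison of MLC strategies earlier in this discussion, the two target encodings contrasted there can be made concrete with a small sketch. It shows only how the two-label target (PVL, CDs) is re-encoded: label powerset (LP) maps each label combination to one class, whereas the per-label view that the text attributes to MLKNN yields one binary target per complication. The function names and the toy labels are hypothetical, and no classifier is trained here.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, int]  # (PVL, CDs)


def label_powerset_encode(Y: List[Label]) -> Tuple[List[int], Dict[Label, int]]:
    """LP: treat each distinct label combination as one multi-class target."""
    class_of: Dict[Label, int] = {}
    encoded: List[int] = []
    for combo in Y:
        if combo not in class_of:
            class_of[combo] = len(class_of)  # assign ids in order of first appearance
        encoded.append(class_of[combo])
    return encoded, class_of


def per_label_split(Y: List[Label]) -> List[List[int]]:
    """Per-label view: one independent binary target per complication."""
    return [list(column) for column in zip(*Y)]


# Toy labels (invented) for five patients.
Y = [(0, 1), (0, 0), (1, 1), (0, 1), (1, 0)]
lp_classes, lp_mapping = label_powerset_encode(Y)
pvl_target, cd_target = per_label_split(Y)
```

With two labels LP has at most four classes, and a rare combination such as (1, 1) becomes its own sparsely populated class. The per-label split avoids that sparsity but discards the co-occurrence information altogether.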

This study has several limitations. First, as a single-center retrospective study, it is subject to potential biases in patient selection and confounding by indication, which warrants confirmation in multicenter, prospective studies. Second, the progression of PVL and CDs post-TAVR, potentially driven by tissue edema and inflammation, poses a challenge for future risk prediction.⁴⁹ Third, biases related to racial and socioeconomic disparities may persist on a global scale. Fourth, it remains uncertain whether the model's performance decreases significantly when applied to multi-source heterogeneous data of varying quality, such as echocardiograms and CT parameters, which requires further investigation. We also plan to conduct prospective studies to validate the clinical utility of this model.

Conclusion

In conclusion, we developed and validated an interpretable DNN model, Muex, that predicts the risk of PVL and CDs post-TAVR. This DNN model outperformed the basic neural network, LP, and MLKNN models in discrimination and calibration. Additionally, we provided insight into how the relationship between the two complications can be used to assist prediction. The model's identification of high-risk patients and its inference of the risk sources for specific complications have the potential to aid and improve complication management.

Supplemental Material

Supplemental material, sj-docx-1-dhj-10.1177_20552076261427502 for Development of a multi-label deep neural network model for predicting immediate paravalvular leakage and new-onset conduction disturbances after transcatheter aortic valve replacement: A retrospective cohort study by Rui-si Tang, Yi-ming Li, Yun Bao, Jian-da Zeng, Yi-jun Yao, Jian-yong Wang, Xin-yue Yang, Yan-jiani Xu, Xing-zhou Pu, Wei-li Jiang, Yu-Heng Jia, Yue Yin, Hongde Li, Weiya Li, Zhang Yi and Mao Chen in DIGITAL HEALTH

Supplemental material, sj-docx-2-dhj-10.1177_20552076261427502 for Development of a multi-label deep neural network model for predicting immediate paravalvular leakage and new-onset conduction disturbances after transcatheter aortic valve replacement: A retrospective cohort study by Rui-si Tang, Yi-ming Li, Yun Bao, Jian-da Zeng, Yi-jun Yao, Jian-yong Wang, Xin-yue Yang, Yan-jiani Xu, Xing-zhou Pu, Wei-li Jiang, Yu-Heng Jia, Yue Yin, Hongde Li, Weiya Li, Zhang Yi and Mao Chen in DIGITAL HEALTH

Footnotes

Acknowledgements

The authors thank all colleagues who supported the research and the TAVR patients and their families for their collaboration in this work.

ORCID iDs

Rui-si Tang

Yue Yin

Author contributions

Mao Chen and Zhang Yi designed this work. Rui-si Tang, Yi-ming Li, Yun Bao, Jian-da Zeng, and Yi-jun Yao developed the neural network architecture and performed most of the data analysis. Jian-da Zeng, Jian-yong Wang, and Wei-li Jiang helped complete the optimization and fine-tuning of the neural network. Rui-si Tang, Xin-yue Yang, Xing-zhou Pu, Yu-Heng Jia, Yan-jiani Xu, Yue Yin, Hongde Li, and Weiya Li performed patient organization and data collection. Rui-si Tang prepared all tables and figures. Rui-si Tang, Yi-ming Li, and Jian-da Zeng wrote the main manuscript. All authors reviewed the manuscript and approved the submitted version.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (U23A20395, 62306192, 82170375, and 62476185); 1.3.5 project for disciplines of excellence from West China Hospital of Sichuan University (ZYGD23021 and 23HXFH009); Natural Science Foundation of Sichuan Province, China (2023NSFSC1638 and 62476185); Chengdu Key Research and Development Support Program (2025-XT00-00014-GX).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data and code underlying this article will be shared on reasonable request to the corresponding author.

Supplemental material

Supplemental material for this article is available online.

References

1. Smith, Leon, Mack, et al. Transcatheter versus surgical aortic-valve replacement in high-risk patients. N Engl J Med 2011; 364: 2187–2198.

2. Leon, Smith, Mack, et al. Transcatheter aortic-valve implantation for aortic stenosis in patients who cannot undergo surgery. N Engl J Med 2010; 363: 1597–1607.

3. Fadahunsi, Olowoyeye, Ukaigwe, et al. Incidence, predictors, and outcomes of permanent pacemaker implantation following transcatheter aortic valve replacement. JACC: Cardiovasc Interventions 2016; 9: 2189–2199.

4. Rooijakkers MJP, Stens, van Wely, et al. Diastolic delta best predicts paravalvular regurgitation after transcatheter aortic valve replacement as assessed by cardiac magnetic resonance: the APPOSE trial. Eur Heart J Cardiovasc Imaging 2023; 24: 1072–1081.

5. Van Belle, Juthier, Susen, et al. Postprocedural aortic regurgitation in balloon-expandable and self-expandable transcatheter aortic valve replacement procedures: analysis of predictors and impact on long-term mortality: insights from the FRANCE2 registry. Circulation 2014; 129: 1415–1427.

6. Kodali, Pibarot, Douglas, et al. Paravalvular regurgitation after transcatheter aortic valve replacement with the Edwards Sapien valve in the PARTNER trial: characterizing patients and impact on outcomes. Eur Heart J 2015; 36: 449–456.

7. Sinning J-M, Vasa-Nicotera, Chin, et al. Evaluation and management of paravalvular aortic regurgitation after transcatheter aortic valve replacement. J Am Coll Cardiol 2013; 62: 11–20.

8. Auffret, Puri, Urena, et al. Conduction disturbances after transcatheter aortic valve replacement: current status and future perspectives. Circulation 2017; 136: 1049–1069.

9. Mack, Leon, Thourani, et al. Transcatheter aortic-valve replacement with a balloon-expandable valve in low-risk patients. N Engl J Med 2019; 380: 1695–1705.

10. Matsushita, Kanso, Ohana, et al. Periprocedural predictors of new-onset conduction abnormalities after transcatheter aortic valve replacement. Circ J 2020; 84: 1875–1883.

11. Kikuchi, Minamimoto, Matsushita, et al. Impact of new-onset right bundle-branch block after transcatheter aortic valve replacement on permanent pacemaker implantation. JAHA 2024; 13: e032777.

12. Hokken, Veulemans, Adrichem, et al. Sex-specific aortic valve calcifications in patients undergoing transcatheter aortic valve implantation. Eur Heart J Cardiovasc Imaging 2023; 24: 768–775.

13. Mauri, Deuschl, Frohn, et al. Predictors of paravalvular regurgitation and permanent pacemaker implantation after TAVR with a next-generation self-expanding device. Clin Res Cardiol 2018; 107: 688–697.

14. Nazif, Williams, Hahn, et al. Clinical implications of new-onset left bundle branch block after transcatheter aortic valve replacement: analysis of the PARTNER experience. Eur Heart J 2014; 35: 1599–1607.

15. Nazif, Dizon, Hahn, et al. Predictors and clinical outcomes of permanent pacemaker implantation after transcatheter aortic valve replacement. JACC: Cardiovasc Interventions 2015; 8: 60–69.

16. Husser, Pellegrini, Kessler, et al. Predictors of permanent pacemaker implantations and new-onset conduction abnormalities with the SAPIEN 3 balloon-expandable transcatheter heart valve. JACC: Cardiovasc Interventions 2016; 9: 244–254.

17. Yoon S-H, Ahn J-M, Ohno, et al. Predictors for paravalvular regurgitation after TAVR with the self-expanding prosthesis: quantitative measurement of MDCT analysis. JACC Cardiovasc Imaging 2016; 9: 1233–1234.

18. Hansson, Leipsic, Pugliese, et al. Aortic valve and left ventricular outflow tract calcium volume and distribution in transcatheter aortic valve replacement: influence on the risk of significant paravalvular regurgitation. J Cardiovasc Comput Tomogr 2018; 12: 290–297.

19. Zito, Buono, Scotti, et al. Incidence, predictors, and outcomes of paravalvular regurgitation after TAVR in Sievers type 1 bicuspid aortic valves. JACC: Cardiovasc Interventions 2024; 17: 1652–1663.

20. Tang GHL, Zaid, Schnittman, et al. Novel predictors of mild paravalvular aortic regurgitation in SAPIEN 3 transcatheter aortic valve implantation. EuroIntervention 2018; 14: 58–68.

21. Judson, Agrawal, Mahadevan. Conduction system abnormalities after transcatheter aortic valve replacement. Interventional Cardiology Clinics 2019; 8: 403–409.

22. Hamdan, Guetta, Klempfner, et al. Inverse relationship between membranous septal length and the risk of atrioventricular block in patients undergoing transcatheter aortic valve implantation. JACC: Cardiovasc Interventions 2015; 8: 1218–1228.

23. Abdel-Wahab, Mehilli, Frerker, et al. Comparison of balloon-expandable vs self-expandable valves in patients undergoing transcatheter aortic valve replacement: the CHOICE randomized clinical trial. JAMA 2014; 311: 1503.

24. Fritz, King, Abdelhack, et al. Effect of machine learning models on clinician prediction of postoperative complications: the perioperative ORACLE randomised clinical trial. Br J Anaesth 2024; 133: 1042–1050.

25. Ren, Loftus, Datta, et al. Performance of a machine learning algorithm using electronic health record data to predict postoperative complications and report on a mobile platform. JAMA Netw Open 2022; 5: e2211973.

26. Zhang, et al. Development and validation of an interpretable Markov-embedded multilabel model for predicting risks of multiple postoperative complications among surgical inpatients: a multicenter prospective cohort study. Int J Surg 2024; 110: 130–143.

27. Hernandez-Suarez, Kim, Villablanca, et al. Machine learning prediction models for in-hospital mortality after transcatheter aortic valve replacement. JACC: Cardiovasc Interventions 2019; 12: 1328–1338.

28. Collins, Moons KGM, Dhiman, et al. TRIPOD+ AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Br Med J 2024: e078378. 10.1136/bmj-2023-078378

29. Praz, Borger, Lanz, et al. 2025 ESC/EACTS guidelines for the management of valvular heart disease. Eur Heart J 2025: ehaf194. 10.1093/eurheartj/ehaf194

30. Muntasir Nishat, Faisal, Jahan Ratul, et al. A comprehensive investigation of the performances of different machine learning classifiers with SMOTE-ENN oversampling technique and hyperparameter optimization for imbalanced heart failure dataset. Sci Program 2022; 2022: 1–17.

31. Kobayashi. Two-way multi-label loss. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 7476–7485. IEEE. 10.1109/CVPR52729.2023.00722.

32. Zoubir, Iskandler. Bootstrap methods and applications. IEEE Signal Process Mag 2007; 24: 10–19.

33. Kodali, Williams, Smith, et al. Two-year outcomes after transcatheter or surgical aortic-valve replacement. N Engl J Med 2012; 366: 1686–1695.

34. Mostafa, Richardt, Abdel-Wahab. Clinical utility of a predictive model for paravalvular aortic regurgitation after transcatheter aortic valve implantation with a self-expandable prosthesis. Egyptian Heart J 2017; 69: 253–259.

35. Kiani, Kamioka, Black, et al. Development of a risk score to predict new pacemaker implantation after transcatheter aortic valve replacement. JACC: Cardiovasc Interventions 2019; 12: 2133–2142.

36. Tsushima, Al-Kindi, Nadeem, et al. Machine learning algorithms for prediction of permanent pacemaker implantation after transcatheter aortic valve replacement. Circ Arrhythm Electrophysiol 2021; 14: e008941.

37. Priya, Peter. A federated approach for detecting the chest diseases using DenseNet for multi-label classification. Complex Intell Syst 2022; 8: 3121–3129.

38. Mesinovic, Yang K-W. Multi-label neural model for prediction of myocardial infarction complications with resampling and explainability. In: 2022 IEEE-EMBS international conference on biomedical and health informatics (BHI), Ioannina, Greece, 2022, pp. 01–05. IEEE. 10.1109/BHI56158.2022.9926915.

39. El-Hasnony, Elzeki, Alshehri, et al. Multi-label active learning-based machine learning model for heart disease prediction. Sensors 2022; 22: 1184.

40. Hofer, Lee, Gabel, et al. Development and validation of a deep neural network model to predict postoperative mortality, acute kidney injury, and reintubation using a single feature set. NPJ Digit Med 2020; 3: 58.

41. Jamthikar, Gupta, Johri, et al. A machine learning framework for risk prediction of multi-label cardiovascular events based on focused carotid plaque B-mode ultrasound: a Canadian study. Comput Biol Med 2022; 140: 105102.

42. Meng M-L, Fuller, et al. Development and validation of a predictive model for maternal cardiovascular morbidity events in patients with hypertensive disorders of pregnancy. Anesth Analg 2024. 10.1213/ANE.0000000000007278

43. Shen, Zhang, et al. Construction and evaluation of networks among multiple postoperative complications. Comput Methods Programs Biomed 2023; 232: 107439.

44. Zhang, et al. Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets. J Data Inform Sci 2024; 9: 81–103.

45. LeCun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.

46. Montañes, Senge, Barranquero, et al. Dependent binary relevance models for multi-label classification. Pattern Recognit 2014; 47: 1494–1508.

47. Alzubaidi, Zhang, Humaidi, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 2021; 8: 53.

48. Gracia Martínez, Pfang, Morales Coca MÁ, et al. Implementing a closed loop clinical decision support system for sustainable preoperative care. NPJ Digit Med 2025; 8: 6.

49. Oestreich, Gurevich, Adabag, et al. Exposure to glucocorticoids prior to transcatheter aortic valve replacement is associated with reduced incidence of high-degree AV block and pacemaker. Cardiovasc Revasc Med 2019; 20: 328–331.