Abstract
Background
Paravalvular leakage (PVL) and conduction disturbances (CDs) are important complications after transcatheter aortic valve replacement (TAVR). Existing risk prediction models predominantly adopt single-complication modeling strategies and overlook the interrelatedness of these complications.
Objectives
We aimed to develop a multi-label prediction model based on deep learning to predict immediate PVL and new-onset CDs post-TAVR simultaneously.
Methods
The study retrospectively included 966 patients who underwent first-time TAVR for aortic stenosis between April 2012 and July 2023 from the Sichuan University TAVR Registry. We developed a deep learning-based model, trained with the Muex optimization algorithm on 79 features, with output labels for PVL and new-onset CDs immediately after TAVR. The Muex model was validated using the bootstrap method, evaluated by area under the receiver operating characteristic curve (AUROC) and calibration curves, interpreted with Shapley Additive Explanations, and compared with a neural network model and two traditional multi-label classification models.
Results
The dataset included 771 training and 195 testing patients, with 6.63% exhibiting more than mild PVL and 39.6% developing new-onset CDs. The Muex model outperformed the neural network, label powerset, and multi-label k-nearest neighbor models in both discrimination (micro-average AUROC: 0.739 vs. 0.705 vs. 0.504 vs. 0.514) and calibration (integrated calibration index [ICI]: 0.012 vs. 0.116 vs. 0.046 vs. 0.051), demonstrating strong performance in predicting both complications simultaneously.
Conclusion
The study demonstrated that the Muex model is feasible for simultaneously predicting PVL and CDs post-TAVR, excelling in both performance and interpretability, while identifying high-risk patients and inferring patient-specific risk factors to facilitate informed clinical decision-making.
Trial registration
ClinicalTrials.gov, NCT04415047.
Overview of study design and the architecture of Muex model.
Keywords
Introduction
Transcatheter aortic valve replacement (TAVR) is now widely recognized as a viable treatment for patients with severe aortic stenosis who are at intermediate to high or prohibitive surgical risk.1,2 However, paravalvular leakage (PVL) and conduction disturbances (CDs) (i.e. high-degree atrioventricular block (HAVB) requiring permanent pacemaker implantation (PPMI) and new-onset left bundle-branch block (LBBB)) are two main complications post-TAVR and are associated with an increased risk of late mortality and rehospitalizations.3–9 PVL is common post-TAVR, with mild cases occurring in up to 40% and moderate or greater in up to 10% in contemporary studies. 4 CDs occur in 31%–45% of patients post-TAVR, depending on the valve type. 10 New-onset LBBB occurs in 85%–94% of cases during the periprocedural period. 11 Therefore, accurate prediction of PVL and CDs following TAVR is essential for optimizing both therapeutic strategies and prognostic assessment.
Previous studies have shown that PVL and CDs post-TAVR are associated with clinical risk factors.3,5,10,12–23 PVL results from incomplete circumferential apposition of the prosthesis with the annulus, whereas new-onset CDs are associated with the anatomical proximity of the conduction system to the aortic root and left ventricular outflow tract (LVOT). Because their associations with preprocedural patient characteristics and procedural predictors overlap yet are inconsistent, these two major complications involve distinct and even conflicting pathophysiological and anatomical mechanisms. As a result, their clinical occurrences often follow a complex, seesaw-like pattern, which has historically made it challenging to develop a model that can predict both simultaneously.
The rapid development of machine learning (ML), especially artificial intelligence (AI) technology, has promoted the innovation of medical tasks. Neural networks, with advantages in multimodal data processing and multi-node output characteristics, provide a feasible solution for high-dimensional multi-source data processing and multi-label disease prediction.24–26 ML models have garnered remarkable results in predicting complications post-TAVR, like in-hospital mortality (area under the receiver operating characteristic curve [AUROC]: 0.89–0.95). 27 Nonetheless, existing research predominantly focuses on identifying risk factors for single complications, with limited investigation into the interplay between different complications using a unified model. Unified modeling requires integrating clinical, laboratory, imaging, and procedural indicators while employing algorithms that address complex inter-complication relationships. The lack of systematic data collection and methodological support presents significant technical limitations, impeding progress and hindering the clinical translation of comprehensive complication management.
The main objective of this study is to develop a multi-label deep neural network (DNN) model to simultaneously predict immediate PVL and new-onset CDs post-TAVR using multi-source data. Specifically, we proposed an innovative loss function to empower the predictive DNN model to better predict the incidence of both complications, outperforming the basic neural network model and two traditional multi-label classification (MLC) models. We assume that this complication-specific risk assessment and individualized interpretation will provide an intuitive and thorough understanding of PVL and CDs risk post-TAVR, which is anticipated to be beneficial in decision-making in the therapeutic scenarios.
Methods
Data collection and preparation
The study population included patients who underwent first-time TAVR for aortic stenosis between April 2012 and July 2023, retrospectively collected from the West China Hospital of Sichuan University TAVR Registry. This registry was designed to sequentially enroll all aortic stenosis patients undergoing TAVR at West China Hospital. It is registered with ClinicalTrials.gov (NCT04415047) and is ongoing. The study was conducted in accordance with the Declaration of Helsinki, approved by the Ethics Committee on Biomedical Research of West China Hospital, and conformed to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD + AI) checklist (Multimedia Appendix-Table S1).
All patients underwent a thorough baseline clinical evaluation based on guidelines for the management of valvular heart disease, 29 including a comprehensive assessment of cardiovascular risk factors and the calculation of risk scores, specifically the Society of Thoracic Surgeons (STS) score and New York Heart Association (NYHA) classification. Eligibility for TAVR was evaluated in all patients by the multidisciplinary heart team. The valves were selected based on availability at the time of treatment. Data were meticulously collected using case record forms (CRFs) and an electronic data capture system, with a dedicated team responsible for data validation. All preprocedural variables were evaluated within one week before the procedure. All clinical data, as well as follow-up outcomes of patients, are integrated and managed via the self-developed Valvular Heart Disease Intelligent Management Platform of our team, which underpins the long-term comprehensive management of these patients.
The primary endpoints of interest were defined as more than mild PVL and new-onset CDs (including LBBB or second- and third-degree atrioventricular block [AVB]) occurring immediately post-TAVR. Both endpoints were classified according to the Valve Academic Research Consortium-2 criteria. PVL was adjudicated by the multidisciplinary heart team using transesophageal echocardiography (TEE), while CDs were assessed through an immediate 12-lead electrocardiogram (ECG). Exclusion criteria were (1) non-first-time TAVR; (2) isolated aortic regurgitation; (3) a preoperative pacemaker or preexisting AVB-II, AVB-III, or LBBB; and (4) more than 20% missing data. We then randomly divided the patients into training and testing sets in an 8:2 ratio (Figure 1).

Flowchart of study population. TAVR: transcatheter aortic valve replacement; AVB: atrioventricular block; LBBB: left bundle branch block.
Feature coding
Variables with more than 15% missingness were excluded from the analysis. Categorical and ordinal variables were assigned discrete values, while continuous variables retained their original values. We used different imputation methods for different variable types, striving to preserve the integrity of the original data while ensuring that the dataset was formatted correctly for input into the model. Specifically, missing values of continuous attributes were set to −1. For categorical attributes, missing values were filled with 0 and observed categories were encoded starting from 1. Following coding and imputation, each patient's characteristics were represented as a multidimensional vector. Using XGBoost, 79 features were selected from 144 candidate variables for prediction model construction; features with a gain-based importance score of 0 were excluded. To address the significant data imbalance, we employed the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN), with k set to 3. This hybrid resampling approach uses SMOTE to generate synthetic samples for the minority class and ENN to remove noisy samples from the majority class, thereby enhancing the model's predictive robustness. Increasingly recognized in clinical informatics, this technique served a dual purpose in our study: first, mitigating the primary imbalance between the occurrence and non-occurrence of complications; and second, harmonizing the disparate incidence rates across specific complication types to reduce bias in model performance.30 These 79 features were used to evaluate the Muex DNN model in the classification experiments.
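The coding and imputation rules described above can be illustrated with a minimal sketch. The function names, the example records, and the choice to map unseen categories to 0 are illustrative assumptions, not details taken from the study code.

```python
import math

MISSING_CONTINUOUS = -1.0  # sentinel for missing continuous values
MISSING_CATEGORICAL = 0    # code reserved for missing categorical values


def encode_continuous(value):
    """Keep the original value; replace a missing one (None or NaN) with -1."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING_CONTINUOUS
    return float(value)


def build_category_codes(observed_values):
    """Assign integer codes starting from 1 to the observed categories."""
    categories = sorted({v for v in observed_values if v is not None})
    return {category: code for code, category in enumerate(categories, start=1)}


def encode_categorical(value, codes):
    """Map a category to its code; missing values become 0.
    Assumption: categories never seen when building the codes also map to 0."""
    return codes.get(value, MISSING_CATEGORICAL)


# Hypothetical records: (creatinine, valve morphology)
records = [(88.0, "tricuspid"), (None, "bicuspid"), (102.5, None)]
codes = build_category_codes(morph for _, morph in records)
vectors = [
    [encode_continuous(creat), encode_categorical(morph, codes)]
    for creat, morph in records
]
```

Reserving 0 and −1 as sentinels keeps every patient vector the same length, at the cost of asking the network to learn that these values mean "missing" rather than a real measurement.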
The selected features comprised 12 demographic and clinical features (height, STS score, creatinine, syncope, etc.), three preprocedural ECG features (atrial fibrillation and AVB), 20 preprocedural echocardiographic features (mitral regurgitation [MR], etc.), 33 preprocedural CT features (lobe type, etc.), one procedural feature (coronary protection), and 10 comorbidities and medical history features (hypertension, chronic obstructive pulmonary disease [COPD], etc.).
Deep learning model
In this study, we proposed a deep learning model, the Muex model, to simultaneously estimate the risk of PVL and new-onset CDs immediately post-TAVR. The Muex model is a feed-forward neural network comprising an input layer, two fully connected hidden layers, and an output layer. Both the input and hidden layers are followed by a ReLU activation function, which introduces nonlinearity to the network. The output layer consists of two nodes, one estimating the risk index of PVL and the other that of CDs. The network also incorporates a batch-normalization layer and a Tanh activation function in the activation layers to enhance performance.
We proposed an optimized binary cross-entropy loss (BCE Loss) function named Muex Loss to train our model, which acts as a performance metric and guides the learning process. BCE Loss refines models, improving both overall and sample-wise classification in multi-label tasks. 31
We modified BCE Loss by adopting an adaptive weighting scheme. For a single positive label, we applied a “penalty weight,” and when both labels were positive, a double penalty, reducing the likelihood of both being positive simultaneously. By applying this weighting strategy to BCE Loss based on label comparisons, we improve feature discrimination and expand the classification margin, enhancing the model's ability to capture data patterns. The optimized BCE Loss can be expressed as follows:
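The published formula for Muex Loss is not reproduced in this text. As a rough illustration of the idea only, the sketch below shows one possible reading of the description: ordinary BCE multiplied by a per-sample weight that grows with the number of labels the model currently predicts as positive. The weight values, the 0.5 threshold, and the function names are all assumptions and should not be taken as the authors' definition.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def penalty_weighted_bce(probs, targets, penalty=2.0, threshold=0.5):
    """Mean BCE over samples and labels with an assumed per-sample weight:
    1 if no label is predicted positive, `penalty` if one is,
    and 2 * `penalty` if both are (the "double penalty")."""
    total = 0.0
    for p_row, y_row in zip(probs, targets):
        n_pos = sum(p >= threshold for p in p_row)
        weight = 1.0 if n_pos == 0 else penalty * n_pos
        total += weight * sum(bce(p, y) for p, y in zip(p_row, y_row))
    return total / (len(probs) * len(probs[0]))


# Hypothetical (PVL, CDs) probabilities and targets for two patients
probs = [[0.9, 0.9], [0.1, 0.2]]
targets = [[1, 0], [0, 0]]
loss = penalty_weighted_bce(probs, targets)
```

Under this reading, a confident double-positive prediction that is wrong on either label costs four times the unweighted loss when `penalty=2.0`, which is one way to make the model predict simultaneous complications cautiously.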
Model validation and evaluation
We trained three additional models for comparison: a neural network model and two MLC models (label powerset [LP] and multi-label k-nearest neighbor [MLKNN]). The aim was to explore the effectiveness of the DNN in prediction and the value of the model in uncovering correlations between outcomes.
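For orientation, the label powerset transformation treats each distinct combination of labels as one class of a single multi-class problem. The following minimal sketch shows that mapping for the two labels considered here; the helper names and example labels are illustrative and are not taken from the study code.

```python
def fit_label_powerset(label_rows):
    """Map each distinct label combination to a class index."""
    combos = sorted({tuple(row) for row in label_rows})
    return {combo: idx for idx, combo in enumerate(combos)}


def to_classes(label_rows, mapping):
    """Convert multi-label rows into single multi-class targets."""
    return [mapping[tuple(row)] for row in label_rows]


def to_labels(class_ids, mapping):
    """Convert predicted class indices back into multi-label rows."""
    inverse = {idx: combo for combo, idx in mapping.items()}
    return [list(inverse[c]) for c in class_ids]


# (PVL, CDs) labels for five hypothetical patients
y = [[0, 0], [0, 1], [1, 0], [0, 1], [1, 1]]
mapping = fit_label_powerset(y)
classes = to_classes(y, mapping)
```

With two labels there are at most four classes, but rare combinations such as (1, 1) can leave a class with very few training examples.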
All models were validated using 1000 bootstrap resamples.32 Model evaluation used the following metrics: discriminative performance was assessed with receiver operating characteristic (ROC) curves; sensitivity and specificity were determined at the cutoff point on the ROC curve; and calibration was assessed using calibration curves. Additionally, we used SHapley Additive exPlanations (SHAP) to enhance model transparency and highlight feature importance.
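As a sketch of how a micro-average AUROC and a bootstrap interval can be computed, the example below pools every (patient, label) pair into one binary problem and applies a percentile bootstrap over patients. The percentile scheme, the function names, and the example predictions are assumptions; the exact bootstrap procedure used in the study may differ.

```python
import random


def auroc(scores, labels):
    """AUROC via pairwise comparison of positives and negatives; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when one class is absent
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_auroc(prob_rows, label_rows):
    """Pool all (patient, label) pairs before computing a single AUROC."""
    scores = [p for row in prob_rows for p in row]
    labels = [y for row in label_rows for y in row]
    return auroc(scores, labels)


def bootstrap_ci(prob_rows, label_rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(prob_rows)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = micro_auroc([prob_rows[i] for i in idx],
                            [label_rows[i] for i in idx])
        if value is not None:
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical (PVL, CDs) predictions and labels for four patients
probs = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [0.4, 0.3]]
labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
point = micro_auroc(probs, labels)
lower, upper = bootstrap_ci(probs, labels, n_boot=200)
```

Resampling whole patients rather than individual labels keeps the two outcomes of each patient together, which preserves their correlation within every bootstrap replicate.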
Statistics and analysis
Continuous variables are expressed as mean ± standard deviation (SD), and categorical variables are reported as counts and percentages. T-tests and chi-squared tests were used to evaluate differences between groups for continuous and categorical variables, respectively. Features were standardized using Z-score normalization. ROC curves were drawn from the true- and false-positive rates. The contribution ratio (CR), calculated as the SHAP value of a feature relative to the average SHAP value of all features, reflects the importance of a feature for one complication compared with the other. Calibration curves were plotted as predicted risk against observed risk. As an additional validation, we used the baseline DNN framework to predict PVL and CDs independently as separate single-label tasks. Furthermore, a sensitivity analysis was conducted by selectively ablating specific features from the training set, choosing features identified via SHAP analysis as having differential importance between the two complications. The code was implemented in PyTorch 2.5, with analyses mainly conducted using scikit-learn.
Results
Patient characteristics
A total of 966 TAVR patients were retrospectively enrolled in the study, with 771 assigned to the training set (326 [42.3%] female; age: 73.0 [68.0; 78.0] years) and 195 to the testing set (80 [42.0%] female; age: 73.0 [69.0; 78.0] years). Demographic, echocardiographic, computed tomography (CT), and procedural features and comorbidities of patients, stratified by dataset, are shown in Table 1. Across all patients, postprocedural more than mild PVL occurred in 64 (6.63%) patients, and new-onset CDs were observed in 383 (39.6%) patients. No variables differed significantly between the training and testing sets.
Baseline characteristics stratified according to the dataset.
Note: Variables are expressed as frequency (%) or median (interquartile range).
BMI: body mass index; STS: Society of Thoracic Surgeons; NYHA: New York Heart Association; COPD: chronic obstructive pulmonary disease; MI: myocardial infarction; PCI: percutaneous coronary intervention; AR: aortic regurgitation; MR: mitral regurgitation; TR: tricuspid regurgitation; LVEF: left-ventricular ejection fraction; TEE: transesophageal echocardiography; SOV: sinus of Valsalva; ICD: implantable cardioverter-defibrillator; AVB: atrioventricular block; LBBB: left bundle branch block; RBBB: right bundle branch block; LAH: left anterior hemiblock; LPH: left posterior hemiblock; CDs: conduction disturbances; PVL: paravalvular leakage.
Muex DNN performance
The model achieved a micro-average accuracy of 0.740 (95% confidence interval [CI]: 0.684–0.777), an accuracy of 0.836 (95% CI: 0.733–0.895) for PVL, and an accuracy of 0.644 (95% CI: 0.554–0.721) for CDs, indicating that it was able to correctly classify instances with reasonable overall accuracy. The rate of subjects falsely predicted with both complications was 1.538% (n = 3). The composite endpoint yielded a sensitivity of 0.636 (95% CI: 0.513–0.733) and a corresponding specificity of 0.976 (95% CI: 0.707–0.814). In addition, the micro-average AUC was calculated to be 0.739 (95% CI: 0.723–0.831) with an AUC of 0.798 (95% CI: 0.714–0.901) for PVL and 0.675 (95% CI: 0.559–0.723) for CDs, indicating that the model has reasonable discriminative ability in distinguishing positive and negative examples (Figure 2 and Multimedia Appendix-Figure S1). In terms of calibration, the integrated calibration index (ICI) quantifies the average weighted difference between predicted and observed probabilities, while E50 represents the median absolute calibration error. The Muex model showed strong calibration performance, with an ICI of 0.012 and E50 of 0.005, indicating minimal deviation between predicted and observed probabilities and high reliability in risk estimation (Figure 3).
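To make the two calibration summaries concrete, the sketch below computes ICI as the mean and E50 as the median of the absolute differences between each predicted probability and a smoothed estimate of the observed event rate. ICI is usually computed from a loess-smoothed calibration curve; a simple Gaussian kernel smoother is substituted here to keep the example dependency-free, and the bandwidth and function names are assumptions.

```python
import math
import statistics


def smoothed_observed(probs, outcomes, bandwidth=0.1):
    """Kernel-weighted (Nadaraya-Watson) observed event rate at each prediction."""
    smoothed = []
    for p in probs:
        weights = [math.exp(-0.5 * ((p - q) / bandwidth) ** 2) for q in probs]
        weighted_events = sum(w * y for w, y in zip(weights, outcomes))
        smoothed.append(weighted_events / sum(weights))
    return smoothed


def calibration_summary(probs, outcomes, bandwidth=0.1):
    """Return ICI (mean) and E50 (median) of |predicted - smoothed observed|."""
    observed = smoothed_observed(probs, outcomes, bandwidth)
    abs_err = [abs(p - o) for p, o in zip(probs, observed)]
    return {"ICI": statistics.fmean(abs_err), "E50": statistics.median(abs_err)}


# Hypothetical predicted risks and observed outcomes
predicted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
observed_events = [0, 0, 0, 1, 0, 1]
summary = calibration_summary(predicted, observed_events)
```

Both summaries are on the probability scale, so an ICI of 0.012 corresponds to an average gap of about 1.2 percentage points between predicted and smoothed observed risk.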

Micro-average ROCs and AUCs of Muex Loss model, BCE Loss neural network model, MLKNN model, and LP model. AUC: area under the curve; ROC: receiver operating characteristics; LP: label powerset; MLKNN: multi-label k-nearest neighbor.

Calibration curve for immediate incidence of PVL and CDs post-TAVR. The calibration curve demonstrates the agreement between predicted risk (x-axis) and observed risk (y-axis). For ICI and E50, lower values indicate better performance. PVL: paravalvular leakage; CDs: conduction disturbances; TAVR: transcatheter aortic valve replacement; ICI: integrated calibration index.
To interpret the Muex model, we applied SHAP to evaluate feature importance. The SHAP-based heatmaps are shown in Multimedia Appendix – Figures S2 and S3. The top ten features ranked by importance are shown in Figure 4, with preprocedural aortic valve structure and surrounding anatomical features demonstrating the highest predictive value. These include annular diameter, annular area, sinus of Valsalva (SOV) area, annular perimeter, Type 0 bicuspid aortic valve (BAV), SOV perimeter, ascending aorta (AAO) radius, and interventricular septum (IVS) thickness. The next most important features are hemodynamic and electrocardiographic characteristics, including the absence of preoperative MR and the presence of atrial fibrillation detected by ECG. Ablating features highly influential for PVL (annular area and SOV diameter) resulted in only minimal changes in overall AUC (decreasing slightly to 0.729 and 0.735). Similarly, ablating features highly influential for CDs (LVEF and hypertension) resulted in only minimal changes in overall AUC (decreasing slightly to 0.733 and 0.738). The stability of the overall model metrics following these systematic removals indicates a low dependence on individual potential confounding factors, supporting the model's robustness to such perturbations (Multimedia Appendix – Table S1).

Top 10 feature importance plot for the testing set. The SHAP feature importance plot shows the contribution of each feature to the model output, represented by the average of the SHAP values across all individuals in the dataset. SHAP: SHapley Additive exPlanations; ECG: electrocardiogram; non-MR: no preprocedural mitral regurgitation; IVS: interventricular septum; AAO: ascending aorta; SOV: sinus of Valsalva; BAV: bicuspid aortic valve.
Comparison of models
We compared the Muex model with the neural network, LP, and MLKNN models across the evaluation metrics (Table 2). The same 79 XGBoost-selected features were used to build all four models. The proposed Muex model outperformed the neural network, LP, and MLKNN models on most metrics. In terms of discrimination, Muex reached a higher micro-average AUROC (Muex: 0.739; neural network: 0.705; LP: 0.504; and MLKNN: 0.514), AUROC for PVL (Muex: 0.798; neural network: 0.789; LP: 0.482; and MLKNN: 0.539), and AUROC for CDs (Muex: 0.675; neural network: 0.603; LP: 0.520; and MLKNN: 0.499) than the other three models. The Muex model also exhibited the highest micro-average specificity, as well as the greatest specificity for both PVL and CDs, whereas the neural network model performed best in micro-average and CDs sensitivity. In terms of calibration, Muex was better calibrated than the neural network, LP, and MLKNN models, with smaller ICI and E50 (Muex: ICI = 0.012, E50 = 0.005; neural network: ICI = 0.116, E50 = 0.628; LP: ICI = 0.046, E50 = 0.731; and MLKNN: ICI = 0.051, E50 = 0.500). As detailed in Multimedia Appendix Table S2 and Figure S4, the independent single-label DNN models yielded AUROCs of 0.644 for CDs and 0.674 for PVL.
Performance of the four models on the testing set.
ACC: accuracy; AUC: area under the curve; LP: label powerset; MLKNN: multi-label k-nearest neighbor; CDs: conduction disturbances; PVL: paravalvular leakage.
Compared with the Muex model on the testing set.
Micro-average results of composite endpoint.
Discussion
This study developed a risk prediction assessment DNN model for PVL and new-onset CDs in patients immediately following TAVR from April 2012 to July 2023. The key findings included (1) For the first time, we have successfully utilized the Muex DNN model to integrate multi-source clinical data, enabling effective multi-label prediction of PVL and CDs post-TAVR, which exhibit a complex seesaw-like pattern. (2) We demonstrated that the Muex model incorporates novel algorithmic refinements to accommodate outcomes with potentially intricate correlations, yielding enhanced predictive accuracy and outperforming other multi-label modeling strategies, including neural network models and traditional MLC methods (LP and MLKNN). (3) In preliminary investigations, the Muex model demonstrated robustness for diverse variable types and methodologies for handling random missing values. To our knowledge, this is the first predictive model to simultaneously predict PVL and new-onset CDs immediately post-TAVR using DNN, and it is poised to significantly advance personalized decision-making support in the early post-TAVR phase, ultimately improving long-term patient outcomes.
PVL and CDs remain critical issues following TAVR, negatively impacting mid- and long-term prognosis in prior studies.6,20,21,33 However, previous studies indicate that these two complications do not share the same risk factors. For PVL, male sex,12 non-diabetic status,5 anatomical features such as a BAV, a larger virtual raphe ring perimeter, LVOT eccentricity, and calcification in the annulus, leaflets, and LVOT,17–20 and procedural features including intentional supra-annular positioning of the bioprosthetic valve, self-expanding valves, and valve undersizing17,19,20,23 are associated with an increased risk. For CDs, male sex, age, baseline conduction defects (e.g. LBBB and prolonged QRS), coronary artery bypass grafting history, chronic lung disease, and the need for home oxygen,3,13–16 anatomical features comprising membranous septal length, larger aortic annulus size and valve area, the ratio of the prosthesis to LVOT diameter, and left ventricular end-diastolic diameter,15,21,22 and procedural features such as implantation depth on the septal side, over-expansion of the native aortic annulus, self-expanding valves, and valve oversizing10,16,23 are important predictors. Additionally, the amount and distribution of device landing zone calcium are important factors for both the degree of AR and the risk of PPMI post-TAVR.12,13 A previous study developed a predictive model for PVL integrating anatomical and procedural variables with the CoreValve prosthesis, demonstrating improved PVL prediction (sensitivity = 68.7% and specificity = 88.1%).34 Kiani's team developed the Emory risk score, incorporating factors including a clinical history of syncope, right bundle branch block, prolonged QRS, and valve oversizing, which demonstrated a strong association with PPMI (AUROC = 0.778).35 Tsushima's team utilized a larger patient sample (n = 1390) and 14 ML-based classifiers to predict PPMI post-TAVR, achieving the highest AUROC of 0.82.36
Existing studies have mostly used independent modeling methods to predict the risks of PVL and CDs post-TAVR, and no risk model has predicted both simultaneously. This constrains the one-stop clinical evaluation of valvular interventional procedures and may introduce biases in the formulation of procedural strategies. Specifically, while a dedicated single-label classifier may yield marginally superior performance for individual outcomes, it neglects their intrinsic interrelationships. Our multi-label framework drives the model to learn robust shared representations, thereby enhancing predictive consistency and generalization, which is a valuable advantage for trade-off decisions in real-world TAVR clinical practice.
Our study has advanced the field by developing a DNN model that simultaneously predicts both PVL and CDs post-TAVR for the first time, achieving a notable average AUROC of 0.739. Moreover, compared to previous studies, we collected a comprehensive set of 144 features, including preprocedural characteristics, CT, ECG, echocardiography, and procedural data, with 79 features selected for model development. The feature importance analysis of the Muex model indicates that baseline characteristics (e.g. body mass index, cerebrovascular disease, COPD, creatinine, history of atrial fibrillation, and history of percutaneous coronary intervention), CT parameters (e.g. LVOT diameter, AAO radius, SOV diameter, and right and left coronary heights), preprocedural echocardiographic features (e.g. functional bicuspid valve and non-MR), coronary protection, and preprocedural atrial fibrillation detected by ECG all demonstrate comparable feature importance in predicting both PVL and CDs, which is consistent with previous studies. Besides, calcification volume, no preprocedural PVL, aortic diameter, effective orifice area, AS type, age, sex, weight, and height exhibit significantly greater feature importance in predicting PVL. In contrast, for CDs, left ventricular posterior wall thickness, IVS thickness, previous myocardial infarction, pulmonary artery pressure, annular area, annular eccentricity, annular diameter, and BAV with fusion of right and left coronary leaflets demonstrate significantly greater feature importance. Notably, our SHAP analysis identified the Right Atrial dimension as highly influential in the multi-label task, notwithstanding its exclusion from the top 25 predictors in the single-label counterpart. This suggests the Muex model captures latent correlations between outcomes that task-specific algorithms fail to detect, providing a more holistic representation of risk. 
The shared and divergent feature importance underscores the complexity of the interplay between PVL and CDs. Despite the constrained dataset of patients, resulting in acceptable performance for prediction, these insights can still function as a critical reference for clinicians, guiding decision-making and achieving personalized medicine in multiple clinical settings.
ML-based multi-label models are now applied in imaging, ECG-based disease diagnosis, 37 adverse event prediction,38,39 and postprocedural complication prediction, 40 gaining increasing attention in the cardiovascular field in recent years. Based on problem transformation methods and algorithm adaptation, Jamthikar's team employed MLC models to predict three cardiovascular events, achieving an optimal AUC of 0.89. 41 Meng's team developed a multi-label DNN model to predict cardiovascular events and organ-specific outcomes in patients with hypertensive disorders of pregnancy (AUROC = 0.878; and average precision = 0.239). 42 Some studies suggest that the interdependencies among complications are crucial in predictive modeling, with their contributions potentially outweighing those of preprocedural factors.26,43 Thus, multi-label ML, accounting for interplay between complications, may aid clinicians in balancing patient risks during clinical decisions, such as TAVR valve selection and procedures. Muex DNN model, optimized with BCE Loss through an enhanced penalty weight mechanism, outperformed basic neural network and two MLC models by better aligning with the non-completely exclusive occurrence of the two complications. Comparative analysis with independent single-label DNNs (Supplemental Table S2 and Supplemental Figure S4) revealed sub-optimal performance, with AUROCs of only 0.644 (CDs) and 0.674 (PVL), and a marked sensitivity-specificity disparity in PVL prediction. By heavily penalizing the double-positive (1, 1) state, our weighted loss function compels the Muex model to predict simultaneous complications cautiously. This multi-label approach circumvents single-task overfitting by imposing joint constraints on shared features. 
Such a mechanism enhances the identification of critical variables, including implantation depth, pre-dilation, and post-dilation strategies, which simultaneously influence the balance of risk, thereby providing a globally optimized assessment for intra-procedural decision-making. Compared to MLKNN, which transforms multi-label problems into multiple binary classification tasks, and LP, which redefines multi-label problems as single-label multi-class classification, deep learning techniques excel in capturing the hierarchical structure and interdependencies between labels.44 In addition, accurate prediction necessitates the integration of multi-source data, while biased feature selection can undermine model accuracy.45 Striking a balance between utilizing information about other complications and minimizing the risk of attribute noise is also crucial.46 AI, especially DNNs, offers a robust approach for integrating weak predictors, thereby enhancing prediction accuracy.47 Additionally, the design of simplified and discretized preprocedural variables may help enhance the model's generalization potential, enabling its transferability within pervasive healthcare systems. The clinical utility of this multi-label deep neural network model depends on its potential for integration into established healthcare management systems, which will be crucial for realizing its translational impact. Initially, to facilitate rapid validation, the Muex model is targeted for integration into our proprietary Valvular Heart Disease (VHD) Intelligent Management Platform, an independently developed system for specialized cohort management. Already adopted across multiple centers, this platform can serve as a testbed to demonstrate the model's near-term scalability and clinical feasibility.
Concurrently, electronic health record (EHR) embedded clinical decision support systems (CDSS) have shown high potential for multiple applications in healthcare and are becoming increasingly common. A primary advantage of CDSS is its capacity to promote clinician adherence to current practice guidelines. Studies demonstrate that CDSS is particularly effective in enhancing adherence to preoperative testing guidelines, resulting in significant cost savings. 48 For widespread and seamless clinical adoption, the model is designed to eventually function as an EHR-CDS tool. This integrated approach holds tremendous promise for fundamentally transforming the clinical workflow.
However, this study has limitations. First, as a single-center retrospective study, potential biases in patient selection and confounding by indication exist, warranting confirmation in multicenter, prospective studies. Second, the progression of PVL and CDs post-TAVR, potentially driven by tissue edema and inflammation, poses a challenge for future risk prediction.49 Third, biases regarding racial and socioeconomic disparities may persist on a global scale. Additionally, it remains uncertain whether the model's performance decreases significantly when applied to multi-source heterogeneous data of varying quality, such as echocardiograms and CT parameters, which requires further investigation. We also plan to conduct prospective studies to validate the clinical utility of this model.
Conclusion
In conclusion, we developed and validated an interpretable DNN model called Muex that accurately predicted the risk of PVL and CDs post-TAVR. The Muex model outperformed the neural network, LP, and MLKNN models in discrimination and calibration. Additionally, we provided insight into how the relationship between the two complications can be used to assist prediction. By identifying high-risk patients and inferring the source of risk for specific complications, the model has the potential to aid and improve complication management.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076261427502 for Development of a multi-label deep neural network model for predicting immediate paravalvular leakage and new-onset conduction disturbances after transcatheter aortic valve replacement: A retrospective cohort study by Rui-si Tang, Yi-ming Li, Yun Bao, Jian-da Zeng, Yi-jun Yao, Jian-yong Wang, Xin-yue Yang, Yan-jiani Xu, Xing-zhou Pu, Wei-li Jiang, Yu-Heng Jia, Yue Yin, Hongde Li, Weiya Li, Zhang Yi and Mao Chen in DIGITAL HEALTH
Supplemental material, sj-docx-2-dhj-10.1177_20552076261427502 for Development of a multi-label deep neural network model for predicting immediate paravalvular leakage and new-onset conduction disturbances after transcatheter aortic valve replacement: A retrospective cohort study by Rui-si Tang, Yi-ming Li, Yun Bao, Jian-da Zeng, Yi-jun Yao, Jian-yong Wang, Xin-yue Yang, Yan-jiani Xu, Xing-zhou Pu, Wei-li Jiang, Yu-Heng Jia, Yue Yin, Hongde Li, Weiya Li, Zhang Yi and Mao Chen in DIGITAL HEALTH
Footnotes
Acknowledgements
The authors thank all colleagues who supported the research and the TAVR patients and their families for their collaboration in this work.
Author contributions
Mao Chen and Zhang Yi designed this work. Rui-si Tang, Yi-ming Li, Yun Bao, Jian-da Zeng, and Yi-jun Yao developed the neural network architecture and performed most of the data analysis. Jian-da Zeng, Jian-yong Wang, and Wei-li Jiang helped complete the optimization and fine-tuning of the neural network. Rui-si Tang, Xin-yue Yang, Xing-zhou Pu, Yu-Heng Jia, Yan-jiani Xu, Yue Yin, Hongde Li, and Weiya Li performed patient organization and data collection. Rui-si Tang prepared all tables and figures. Rui-si Tang, Yi-ming Li, and Jian-da Zeng wrote the main manuscript. All authors reviewed the manuscript and approved the submitted version.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (U23A20395, 62306192, 82170375, and 62476185); 1.3.5 project for disciplines of excellence from West China Hospital of Sichuan University (ZYGD23021 and 23HXFH009); Natural Science Foundation of Sichuan Province, China (2023NSFSC1638 and 62476185); Chengdu Key Research and Development Support Program (2025-XT00-00014-GX).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The data and code underlying this article will be shared on reasonable request to the corresponding author.
Supplemental material
Supplemental material for this article is available online.
