Abstract
Background
Transcatheter aortic valve replacement (TAVR) has broadened treatment for high-risk, inoperable patients with aortic stenosis, though existing risk models poorly predict its futility.
Objective
This study aimed to identify significant predictors of 6-month TAVR outcomes using interpretable machine learning to enhance risk stratification for TAVR futility.
Methods
We used a multicenter dataset of 213 patients (80.8 ± 5.8 years, 43.9% female). Two random forest-based machine learning models were developed to predict midterm outcomes: a primary outcome of cardiovascular death or heart failure readmission, and a secondary outcome that included cardiac functional and quality of life metrics. Model performance was evaluated using repeated 5-fold cross-validation, and top features were interpreted with SHapley Additive exPlanations.
Results
The random forest model outperformed traditional risk scores and a baseline logistic regression model for both outcomes (P < .001), with AUCs of 0.791 (0.766-0.816) and 0.784 (0.764-0.804) for the primary and expanded outcomes, respectively. The analysis identified a panel of key predictors that included traditional risk scores alongside a set of novel markers. Lower low-density lipoprotein cholesterol, lower high-density lipoprotein cholesterol, and high carotid-femoral pulse wave velocity were predictive for both outcomes. Lower estimated glomerular filtration rate and low serum creatine phosphokinase were unique predictors for the primary and expanded outcomes, respectively.
Conclusion
Our findings highlight the need to shift TAVR risk prediction from a one-size-fits-all model to a personalized framework, offering a new lens for patient assessment and opportunities for targeted interventions to reduce futility and improve outcomes.
Key Points
What is Known?
Traditional risk models are poorly predictive of transcatheter aortic valve replacement (TAVR) futility, failing to identify high-risk patients who will not benefit from the procedure. Factors contributing to TAVR futility are not fully understood.
What Does this Study Add?
Our interpretable machine learning model identified a panel of novel predictors for poor midterm TAVR outcomes, significantly outperforming traditional risk scores. Markers of frailty (lower low-density lipoprotein cholesterol [LDL-c], low high-density lipoprotein cholesterol, low serum creatine phosphokinase) and aortic stiffness (high pulse wave velocity) were strong predictors of TAVR futility. A “cholesterol paradox” was observed in this high-risk population, where lower LDL-c was strongly predictive of poor outcomes. Even mild-to-moderate renal impairment (low estimated glomerular filtration rate) was identified as a critical risk factor in the TAVR population.
Introduction
Transcatheter aortic valve replacement (TAVR) has revolutionized the treatment of severe aortic stenosis (AS). 1 The procedure has expanded the therapeutic window to older and more comorbid individuals who were previously deemed inoperable. However, this expansion has also brought a critical and growing challenge: TAVR futility. 2 This concept refers to the subset of patients who, despite successfully undergoing TAVR, experience no meaningful improvement in quality of life or fail to change their poor prognostic trajectory. 2 Nearly 50% of patients in the PARTNER trial received no benefit, and 30% of prohibitive-risk patients died within a year. 2 Therefore, as TAVR volume increases with an aging population, accurately identifying patients who will not benefit has become a clinical imperative to avoid overtreatment. This further highlights the limitations of traditional risk scores, such as the Society of Thoracic Surgeons (STS) score and EuroScore II, in predicting mid- and long-term outcomes and identifying TAVR futility. 2
To capture a more holistic view of post-TAVR success and futility, we employed an interpretable machine learning approach using a random forest model. We aimed to enhance risk stratification and uncover predictors that contribute to poor midterm (6 months) outcomes by including a broad spectrum of patient data. Our study used 2 distinct endpoints. The primary endpoint assessed major clinical events: cardiovascular (CV) death or heart failure (HF) readmission. A second, comprehensive endpoint was then constructed by combining the primary outcomes with metrics on cardiac functional status and quality of life. We used an interpretability technique, SHapley Additive exPlanations (SHAP), to explain the clinical significance of the most influential features.
Methods
Study Population
The study included a total of 213 patients who underwent TAVR using an Edwards valve. Patient data were retrospectively collected from Hospital Universitario Marqués de Valdecilla (IDIVAL) in Santander, Spain (between 2019 and 2021). The data were de-identified and anonymized, and ethical approval was granted by the Clinical Research Ethics Boards of the institutions. Data collection and clinical measurements were performed by operators blinded to the objectives and contents of this study. All clinical measurements and imaging protocols were performed in accordance with the American College of Cardiology and American Heart Association guidelines.3,4 Features comprised a comprehensive set of patient characteristics and pre-TAVR clinical metrics, including demographics, comorbidities, laboratory values, echocardiographic, functional, and patient-reported assessments, as well as data measured by SphygmoCor (see Table 1 for detailed characteristics).
Patients Baseline Characteristics, Grouped by Primary Endpoint and Expanded Endpoint Post-TAVR.
Abbreviations: TAVR, transcatheter aortic valve replacement; NYHA, New York Heart Association.
Outcome Definitions
Two endpoints were carefully defined: 6-month primary composite endpoint and 6-month expanded composite endpoint (see below sections for detailed definitions). The final study cohort comprised 189 patients after applying specific inclusion and exclusion criteria based on our outcome definitions to ensure a focus on midterm outcomes (see Figure 1). The detailed patient characteristics are presented in Table 1.

Patient selection and outcome stratification flowchart. (a) Study design for 6-month primary endpoint. (b) Study design for 6-month expanded endpoint included metrics of cardiac function and quality of life.
Six-Month Primary Composite Endpoint: CV Death or HF Readmission
The primary endpoint was a composite of CV death and HF readmission within 6 months following TAVR (Figure 1). Patients with CV death events from 30 to 200 days and HF readmission events within 200 days were included. Deaths occurring within the initial 30 days were excluded (N = 5), as these could result from procedural complications. Outcomes extending beyond 200 days were censored, and patients lacking follow-up data between 160 and 200 days were excluded due to unknown midterm outcomes (N = 19).
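The windowing rules above can be sketched as a small labeling function. This is a minimal illustration, not the authors' code: the function name, argument names, the treatment of day boundaries, and the order in which the rules are checked are all assumptions.

```python
from typing import Optional

EARLY_WINDOW_DAYS = 30     # deaths before this day are treated as procedural
EVENT_WINDOW_DAYS = 200    # events after this day are censored
MIN_FOLLOW_UP_DAYS = 160   # minimum follow-up needed to call an outcome acceptable


def label_primary_endpoint(
    cv_death_day: Optional[int],
    hf_readmission_day: Optional[int],
    last_follow_up_day: int,
) -> str:
    """Return 'poor', 'acceptable', or 'excluded' (hypothetical helper)."""
    # Assumption: early deaths are checked first and always lead to exclusion.
    if cv_death_day is not None and cv_death_day < EARLY_WINDOW_DAYS:
        return "excluded"
    if hf_readmission_day is not None and hf_readmission_day <= EVENT_WINDOW_DAYS:
        return "poor"
    if cv_death_day is not None and cv_death_day <= EVENT_WINDOW_DAYS:
        return "poor"
    # No event inside the window: the midterm status must actually be known.
    if last_follow_up_day < MIN_FOLLOW_UP_DAYS:
        return "excluded"
    return "acceptable"


# Cohort arithmetic reported in the text: 213 enrolled, 5 early deaths,
# 19 without midterm follow-up.
final_cohort_size = 213 - 5 - 19
```

With these rules, an event after day 200 does not count as a poor outcome as long as follow-up reached the midterm window.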
Six-Month Expanded Composite Endpoint: Lack of TAVR Benefit
To better capture midterm post-TAVR cardiac recovery status and identify patients who may not benefit from TAVR, we defined a 6-month composite outcome representing lack of TAVR benefit (Figure 1). A poor outcome was defined as experiencing the primary endpoint (CV death or HF readmission within 6 months) or cardiac functional decline at the 6-month follow-up. Following the definition used in prior research, cardiac functional decline was defined as any of the following at 6-month follow-up: New York Heart Association (NYHA) Class III or IV, a Kansas City Cardiomyopathy Questionnaire (KCCQ) score <45 points, or a KCCQ score decline >10 points from baseline. 5
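The three functional-decline criteria combine with a logical OR, which the following sketch makes explicit. The function and argument names are hypothetical; only the cut-offs (NYHA Class III or IV, KCCQ <45, KCCQ decline >10 points) come from the definition above.

```python
from typing import Optional


def has_functional_decline(nyha_class_6m: int, kccq_baseline: float, kccq_6m: float) -> bool:
    """True if any of the three 6-month functional-decline criteria is met."""
    return (
        nyha_class_6m >= 3                    # NYHA Class III or IV
        or kccq_6m < 45                       # KCCQ score below 45 points
        or (kccq_baseline - kccq_6m) > 10     # decline of more than 10 points
    )


def is_poor_expanded_outcome(
    primary_event: bool,
    nyha_class_6m: Optional[int] = None,
    kccq_baseline: Optional[float] = None,
    kccq_6m: Optional[float] = None,
) -> bool:
    """Expanded endpoint: primary event OR cardiac functional decline."""
    if primary_event:
        return True  # 6-month functional data may not exist after a primary event
    return has_functional_decline(nyha_class_6m, kccq_baseline, kccq_6m)
```

A patient in NYHA Class II whose KCCQ falls from 80 to 65 is flagged by the decline criterion alone, even though the 6-month score is well above 45.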
Data Preprocessing
Prior to feature selection and model training, the dataset underwent several preprocessing steps to ensure data quality and stability. Missingness in our cohort was low overall (90% of features had <3% missingness), and only 1 feature was removed because of very high missingness (“aortic ring diameter,” missingness >78%). Missing values in categorical features were treated as a new category. For numerical features, median imputation was performed within each cross-validation training fold. Simple median imputation was chosen because, for predictive modeling purposes, simple imputation methods often yield prediction performance comparable to that of complex methods, 6 and the median is less affected by extreme clinical values. Given the low overall missingness and the high-dimensional, small-sample nature of our cohort, the marginal gain from more complex imputation methods is likely to be minimal.7,8
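Fold-wise imputation means the median is learned from the training fold only and then reused on the held-out fold, so no information leaks from validation data. A minimal sketch with scikit-learn's `SimpleImputer` follows; the toy values and the `"Missing"` category label are illustrative assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy numeric feature with missing values (NaN); values are illustrative only.
X_train = np.array([[1.0], [2.0], [np.nan], [9.0]])
X_valid = np.array([[np.nan], [100.0]])

# Learn the median from the training fold only, then apply it to the
# validation fold without refitting.
imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)
X_valid_imputed = imputer.transform(X_valid)

# Categorical features: a missing value becomes its own category.
nyha_raw = ["II", None, "III"]
nyha_filled = ["Missing" if value is None else value for value in nyha_raw]
```

The extreme value 100.0 in the validation fold has no influence on the imputed value, because the median was fixed before the validation rows were seen.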
To mitigate multicollinearity, highly correlated predictors were handled by calculating a pairwise Pearson correlation matrix. When the absolute correlation coefficient between 2 features exceeded a threshold of 0.8, only the feature with greater clinical significance was retained based on domain knowledge. Finally, features with minimal variance were removed to prevent potential model instability.
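The correlation filter can be sketched as follows. The numeric `clinical_priority` ranking is a hypothetical stand-in for the domain-knowledge decision described above, and the feature names and data are toy values.

```python
import numpy as np


def prune_correlated(X, names, clinical_priority, threshold=0.8):
    """Drop one feature from every pair whose |Pearson r| exceeds the threshold.

    Within a pair, the feature with the lower clinical_priority is dropped.
    """
    corr = np.corrcoef(X, rowvar=False)  # pairwise Pearson correlation matrix
    dropped = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if names[i] in dropped or names[j] in dropped:
                continue
            if abs(corr[i, j]) > threshold:
                pair = (names[i], names[j])
                dropped.add(min(pair, key=lambda name: clinical_priority[name]))
    return [name for name in names if name not in dropped]


# Toy data: feature_b is almost a multiple of feature_a; feature_c is unrelated.
X = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.0, -1.0],
    [5.0, 10.0, 1.0],
    [6.0, 12.5, -1.0],
])
names = ["feature_a", "feature_b", "feature_c"]
clinical_priority = {"feature_a": 2, "feature_b": 1, "feature_c": 1}
kept = prune_correlated(X, names, clinical_priority)
```

Here `feature_b` is removed because it is nearly collinear with `feature_a` and ranks lower in the assumed priority map.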
Model Development and Evaluation
Model Selection and Rationale
The predictive model was developed using a random forest algorithm, implemented in Python 3 with the scikit-learn and imblearn libraries. A BalancedRandomForestClassifier was used to address the imbalance in the outcome classes. Random forest was chosen for its suitability for clinical data, as it is relatively insensitive to extreme values and does not require data normalization, which is beneficial for preserving the integrity of clinically significant outliers. 9 The hyperparameters for the random forest model were tuned using a grid search approach.
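A minimal sketch of a class-balanced random forest tuned by grid search is shown below. It deliberately swaps in scikit-learn's `RandomForestClassifier` with `class_weight="balanced_subsample"` as a stand-in for imblearn's `BalancedRandomForestClassifier`, so that only scikit-learn is needed. The data are synthetic and the parameter grid is hypothetical, since the searched values are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data (about 12% positives); not the study cohort.
X, y = make_classification(
    n_samples=189, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.88, 0.12], random_state=0,
)

# Stand-in for imblearn's BalancedRandomForestClassifier: class weights are
# recomputed on each bootstrap sample instead of undersampling the majority class.
forest = RandomForestClassifier(
    n_estimators=50, class_weight="balanced_subsample", random_state=0
)

# Hypothetical grid; the values actually searched are an assumption.
param_grid = {"max_depth": [2, 4], "min_samples_leaf": [3, 5]}

search = GridSearchCV(
    forest, param_grid, scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_params = search.best_params_
best_auc = search.best_score_
```

Shallow trees and a minimum leaf size are included in the grid because they are common ways to limit overfitting when the minority class has only a few dozen cases.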
We also preliminarily explored another popular tree-based algorithm, eXtreme Gradient Boosting (XGBoost), but eventually discarded it because of its known tendency to overfit on small datasets. Its iterative process of fitting residuals can cause the model to memorize random noise instead of the underlying signal. 10 This behavior was also observed in our preliminary experiments. Given our limited sample size, random forest was prioritized to ensure a more robust and generalizable model.
SHAP analysis was employed to enhance model interpretability and support the discovery of novel predictors. SHAP has been one of the most widely used methods in explainable artificial intelligence since its introduction in 2017. 11 It explains a model's output by assigning a contribution value to each feature for every individual prediction. We used SHAP to assess overall feature importance as well as how each predictor influenced patients’ outcomes. Insights from SHAP, along with random forest's native feature ranking, were instrumental in the initial feature selection process.
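To illustrate what a per-prediction contribution value means, the sketch below computes exact Shapley values for a toy two-feature risk function by averaging marginal contributions over all feature orderings. The toy model, its cut-offs, and the patient and baseline values are hypothetical; in practice the SHAP library uses efficient tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def toy_risk(x):
    """Hypothetical risk function with an interaction term (not the study model)."""
    low_ldl = x["ldl_c"] < 70
    high_pwv = x["pwv"] > 12
    risk = 0.1
    if low_ldl:
        risk += 0.2
    if high_pwv:
        risk += 0.3
    if low_ldl and high_pwv:
        risk += 0.1
    return risk


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings.

    Features not yet 'revealed' in an ordering keep their baseline value.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: totals[name] / len(orderings) for name in features}


patient = {"ldl_c": 60.0, "pwv": 13.0}    # illustrative patient
reference = {"ldl_c": 90.0, "pwv": 10.0}  # illustrative baseline
phi = shapley_values(toy_risk, patient, reference)
```

The contributions sum to the difference between the patient's prediction and the baseline prediction, which is the additivity property that makes SHAP values readable as risk increments.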
For more details, refer to the Supplemental materials.
Results
Patients’ Baseline Characteristics
Six-Month Primary Composite Endpoint: CV Death or HF Readmission
Ultimately, 189 patients were included in the study cohort. Of these, 22 experienced the poor primary outcome, while the remaining 167 had an acceptable outcome.
Table 1 summarizes the baseline characteristics, grouped by primary endpoint outcome. Demographics such as age and sex were similar across groups. However, patients with a poor outcome showed significantly higher arterial stiffness (pulse wave velocity [PWV] P = .015; aortic distensibility P = .039), reduced functional capacity (6-min walk test P = .03, more NYHA Class III), and elevated surgical risk (STS morbidity P = .009; STS mortality P = .028). This group also had more atrial fibrillation (P = .01), hypertension (P = .03), and worse renal function (estimated glomerular filtration rate [eGFR] P = .013). Differences were also noted for low-density lipoprotein cholesterol (LDL-c) (P = .019) and creatine phosphokinase (CK) (P = .041). The table provides full details on all clinical variables.
Six-Month Expanded Composite Endpoint: Lack of TAVR Benefit
After applying the expanded endpoint definition, 28 of the 189 patients were classified as having a poor outcome. Table 1 outlines baseline characteristics by expanded composite endpoint. Similar to the primary endpoint, patients with poor expanded composite outcomes showed increased arterial stiffness, worse functional capacity (NYHA class), and higher predicted surgical risk. They also showed more atrial fibrillation, worse renal function, and lower CK and LDL-c. This poor outcome group also exhibited a higher prevalence of previous stroke (P = .04), while baseline hypertension did not significantly differ between groups.
Among the 28 patients who met the expanded endpoint, none experienced early stroke, transient ischemic attack, or myocardial infarction. A total of 10 patients (36%) experienced 1 or more short-term complications, most commonly the need for permanent pacemaker implantation (N = 8) and bleeding (N = 3), and less frequently renal (N = 2) or vascular (N = 1) complications. See Supplemental Table 2 for details of the complications for those 10 patients.
Feature Selection Outcomes
Initial 10 Features Included Risk Scores and Markers of Cholesterol Levels, Muscle Damage, Cardiac, and Renal Function
Random forest and SHAP analysis identified the top 10 most important features for each endpoint from over 80 variables (see Figures 2 and 3). These top predictors showed substantial overlap across both endpoints, comprising key blood biomarkers, CV parameters, and traditional risk scores (STS morbidity, STS mortality, and EuroScore II).

Top 10 feature importance ranking for primary endpoint, sorted by its overall importance from top to bottom. Left: ranking by random forest; right: ranking by SHAP feature importance. Abbreviation: SHAP, SHapley Additive exPlanations.

Top 10 feature importance ranking for expanded endpoint, sorted by its overall importance from top to bottom. Left: ranking by random forest; right: ranking by SHAP feature importance. Abbreviation: SHAP, SHapley Additive exPlanations.
Among these, the blood biomarkers LDL-c, high-density lipoprotein cholesterol (HDL-c), eGFR, and CK (which may reflect cholesterol levels, renal function, and muscle damage) were identified as highly important predictors for both outcomes. CK stood out as the most critical feature for the expanded endpoint. Among CV parameters, left ventricular diastolic diameter and peak transvalvular velocity were also identified among the top features, with PWV notably exhibiting high predictive importance across both endpoints. Detailed feature importance rankings for both endpoints and algorithms are presented in the figures.
Model Performance
The Final Models Used an Established Score Plus 4 Selected Features; LDL, HDL, and PWV Were Consistently Identified as Important Predictors for Both Endpoints
Following an exhaustive feature selection process, which involved the evaluation of all possible feature combinations up to 5 features as described in the Methodology section, distinct yet overlapping sets of optimal predictors were identified for each outcome model. The final features selected for the 6-month primary outcome and 6-month composite outcome models are summarized in Figure 4.
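An exhaustive search over all subsets of up to 5 features can be sketched with `itertools.combinations`. The candidate pool of 10 names below is illustrative (drawn from features mentioned in the text), and `toy_score` is a hypothetical placeholder for whatever cross-validated metric drives the selection.

```python
from itertools import combinations
from math import comb

# Illustrative candidate pool; the exact pool used for the search is an assumption.
candidates = [
    "sts_morbidity", "sts_mortality", "euroscore_ii", "ldl_c", "hdl_c",
    "egfr", "ck", "pwv", "lv_diastolic_diameter", "peak_velocity",
]


def enumerate_subsets(features, max_size):
    """Yield every feature combination of size 1 up to max_size."""
    for size in range(1, max_size + 1):
        yield from combinations(features, size)


def select_best(features, score_fn, max_size=5):
    """Return the subset with the highest score (score_fn is user supplied)."""
    return max(enumerate_subsets(features, max_size), key=score_fn)


def toy_score(subset):
    """Hypothetical score: rewards three named features, penalizes subset size."""
    return len(set(subset) & {"ldl_c", "hdl_c", "pwv"}) - 0.1 * len(subset)


all_subsets = list(enumerate_subsets(candidates, 5))
expected_count = sum(comb(len(candidates), k) for k in range(1, 6))
best_subset = select_best(candidates, toy_score)
```

With 10 candidates the search covers 637 subsets, which is small enough to evaluate each one with repeated cross-validation; the count grows quickly as the pool widens, which is why the pool is narrowed first.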

SHAP beeswarm plot showing the overall feature importance and impact on model outcomes. Each dot is an individual patient's data point. Each row represents a feature, sorted by its overall importance from top to bottom. A dot's horizontal position shows the feature's impact on the prediction (SHAP value), and its color represents the feature's actual value (blue=low, red=high). (a) Primary outcome. (b) Expanded outcome. Abbreviation: SHAP, SHapley Additive exPlanations.
Features for both models included an established STS score, as it encodes significant clinical information. HDL-c, LDL-c, and PWV were selected across both models, underscoring their robust predictive importance for post-TAVR outcomes. Importantly, eGFR was specifically selected for the primary outcome model, while CK was selected for the expanded outcome model, which incorporated metrics of cardiac functional recovery.
Random Forest Model Outperforms Linear Model and Traditional Scores
The developed random forest models demonstrated superior predictive performance compared to a baseline linear model (L1-regularized logistic regression) and existing traditional risk scores (STS scores and EuroScore II) for both primary and expanded outcomes. Tables 2 and 3 summarize the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and Area Under the Precision-Recall Curve (AUC-PR) for all the evaluated models across 10 repetitions of 5-fold cross-validation, noting that the confidence intervals (CIs) for STS scores and EuroScore II were evaluated using a bootstrap method.
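The evaluation scheme can be sketched as follows: 10 repetitions of stratified 5-fold cross-validation for a trainable model, and a bootstrap interval for a fixed, precomputed risk score. The data are synthetic, `average_precision` is used as scikit-learn's estimate of AUC-PR, and the percentile bootstrap with 1000 resamples is an assumption because the exact bootstrap variant is not specified.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic stand-in data; not the study cohort.
X, y = make_classification(
    n_samples=189, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.88, 0.12], random_state=0,
)

# 10 repetitions of stratified 5-fold cross-validation give 50 held-out folds.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
baseline = LogisticRegression(penalty="l1", solver="liblinear")
cv_results = cross_validate(
    baseline, X, y, cv=cv, scoring=["roc_auc", "average_precision"]
)
fold_auc_roc = cv_results["test_roc_auc"]
fold_auc_pr = cv_results["test_average_precision"]


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC-ROC of a fixed risk score."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined without both classes in the resample
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# A single column stands in for a precomputed score such as STS or EuroScore II.
ci_low, ci_high = bootstrap_auc_ci(y, X[:, 0])
```

A fixed score needs no training, so resampling patients is a natural way to express its uncertainty, whereas a trained model's variability is captured by the spread across the 50 held-out folds.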
Model Performance Comparison for Primary Endpoint.
*P < .001 for random forest compared to all other models.
Model Performance Comparison for Expanded Endpoint.
*P < .001 for random forest compared to all other models.
Specifically, the random forest model achieved a mean AUC-ROC of 0.791 (95% CI: 0.766, 0.816) for predicting the primary outcome and 0.784 (95% CI: 0.764, 0.804) for the expanded outcome.
SHAP Plot Analysis Results from Random Forest Models
Global Feature Contributions Across Outcomes
To interpret the contribution of individual features to the random forest classification model's predictions, SHAP values were computed for each feature across all out-of-fold data points to quantify feature contribution to the predicted probability, relative to the dataset's baseline output.
Figure 4a and b presents the SHAP beeswarm plots, summarizing global feature importance and the distribution of their impacts on the model's output for the primary and expanded outcomes, respectively. In both figures, the color scale indicates the original feature value for each data point (red for higher, blue for lower), while the horizontal axis represents the SHAP value (positive values increase predicted risk, negative values decrease it).
Across both outcomes, LDL-c, STS scores, HDL-c, and PWV showed consistent patterns of influence. Counterintuitively, for LDL-c, higher values exhibited a protective pattern, similar to HDL-c, associating with negative SHAP values and thereby reducing predicted risk. Conversely, higher STS scores and PWV increased predicted risk.
Apart from the shared predictors, CK and eGFR demonstrated distinct, outcome-specific contributions to their respective models. For the primary outcome model, eGFR played a particular role, where higher values tended to reduce predicted risk. CK was notably prominent in the expanded outcome model, with lower values substantially increasing predicted risk. This highlights CK as a critical indicator when the outcome definition includes metrics of cardiac functional recovery.
Low LDL-c and HDL-c Contributed to Poor Outcomes
Further SHAP dependence plots revealed nonlinear relationships for LDL-c and HDL-c lipids, with both demonstrating a protective pattern where higher values reduced predicted risk and lower levels contributed to increased risk (Figures 5 and 6).

SHAP dependency plots for the 5 selected predictors of primary outcomes: each plot shows how the SHAP value for a specific feature varies with its own value. The SHAP value quantifies the impact of a feature value on the model's prediction, with a SHAP value >0 indicating an increase in the predicted outcome (higher risk) and a SHAP value <0 indicating a decrease (lower risk). The shaded yellow band represents the range where the SHAP value is approximately zero, indicating the feature's value has a neutral effect on the prediction. (a) Low-density lipoprotein cholesterol, (b) STS predicted risk of morbidity, (c) high-density lipoprotein cholesterol, (d) renal glomerular filtration rate, and (e) pulse wave velocity. Abbreviation: SHAP, SHapley Additive exPlanations.

SHAP dependency plots for the 5 selected predictors of expanded outcome. The interpretation of the plots and SHAP values is described in detail in Figure 5's caption. (a) Low-density lipoprotein cholesterol, (b) creatine phosphokinase, (c) STS predicted risk of mortality, (d) high-density lipoprotein cholesterol, and (e) pulse wave velocity. Abbreviation: SHAP, SHapley Additive exPlanations.
For LDL-c (Figures 5 and 6), concentrations above approximately 75 to 77 mg/dL lowered predicted risk, while those below 65 to 69 mg/dL led to an increased risk, a pattern consistent across both primary and expanded outcome models. Similarly, HDL-c values exceeding 41 to 42 mg/dL were associated with a reduced risk, whereas levels dropping below 36 mg/dL elevated predicted risk.
PWV >12 m/s Contributed to Poor Outcomes
PWV also exhibited a consistent pattern across both primary and expanded outcomes. Higher PWV values contributed to an increased predicted risk, while lower values were associated with a decreased risk. Approximate thresholds for this influence were observed between 10.8 and 12.2 m/s.
Specifically, increased predicted risk was observed for PWV values above approximately 12.2 m/s for the primary outcome (Figure 5) and above 12.0 m/s for the expanded outcome (Figure 6). Conversely, reduced risk was associated with values below 11.2 m/s for the primary outcome and below 10.8 m/s for the expanded outcome. These findings demonstrate that PWV values exceeding approximately 12 m/s are associated with an increased predicted risk across both outcomes, contributing to poor outcomes.
eGFR <54 mL/min/1.73 m2 Contributed to Increased Risk
eGFR was uniquely identified as a predictor for the primary outcome model; it revealed a protective relationship with predicted risk. Higher eGFR values reduced predicted risk, while lower values contributed to an increased risk.
Detailed SHAP dependence plot analysis (Figure 5) showed that eGFR values above approximately 65 mL/min/1.73 m2 were associated with a reduced predicted risk. Values within the approximate range of 54 to 65 mL/min/1.73 m2 showed minimal contribution to the predicted risk (SHAP values approximately zero). However, for eGFR values falling below approximately 54 mL/min/1.73 m2, a sharp increase in positive SHAP values was observed, indicating a substantial contribution to an increased predicted risk. Overall, these findings highlight that eGFR values lower than approximately 54 mL/min/1.73 m2 are strongly associated with an increased predicted risk, contributing to primary poor outcome.
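The approximate SHAP bands reported for the primary outcome model can be written down as a small lookup. The cut-offs are taken from the text (eGFR 54 and 65 mL/min/1.73 m2; PWV 11.2 and 12.2 m/s), but the class and function names are hypothetical, and treating the PWV gap between its two cut-offs as near neutral is an assumption. These are model-explanation thresholds, not validated clinical decision limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapBand:
    lower_cut: float    # below this value the SHAP direction is clear
    upper_cut: float    # above this value the SHAP direction is clear
    high_is_risk: bool  # True if high values push predicted risk upward


# Approximate cut-offs reported for the primary outcome model.
EGFR_PRIMARY = ShapBand(lower_cut=54.0, upper_cut=65.0, high_is_risk=False)
PWV_PRIMARY = ShapBand(lower_cut=11.2, upper_cut=12.2, high_is_risk=True)


def shap_direction(value: float, band: ShapBand) -> str:
    """Map a feature value to the approximate direction of its SHAP contribution."""
    if band.lower_cut <= value <= band.upper_cut:
        return "near neutral"
    is_high = value > band.upper_cut
    return "increases risk" if is_high == band.high_is_risk else "reduces risk"
```

For example, `shap_direction(50.0, EGFR_PRIMARY)` falls in the risk-increasing band, while an eGFR of 60 sits in the range where SHAP values were approximately zero.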
Low Serum CK was a Significant Predictor When the Endpoint Metrics Included Cardiac Functional Decline
For the expanded outcome, where endpoint metrics included cardiac functional decline, CK was identified as a significant predictor. Figure 6 revealed a clear shift in its influence across concentration levels. Within a narrow range of CK values between 62 and 70 U/L, its contribution to the direction of the model output across folds was minimal. However, its impact on the model output changed sharply on either side of this range. A protective effect was observed at CK values above 70 U/L, leading to a reduced risk of poor outcome, whereas CK values <62 U/L contributed to adverse outcomes.
Discussion
Adding Frailty Markers Improves the Model Performance
Our analysis showed that operative risk scores STS and EuroScore II had limited utility in predicting adverse midterm outcomes in our TAVR cohort. Previous studies attribute this to their primary design for surgical aortic valve replacement (SAVR) operative mortality12,13 and their omission of frailty assessment. 14
Frailty is a prevalent and recognized strong predictor of poor prognosis in the TAVR population, 15 present in up to 80% of patients. 16 Although guidelines emphasize its importance for risk stratification,17,18 and combining conventional risk scores with a frailty index has been shown to improve prediction accuracy, 19 frailty remains a complex, multidimensional syndrome that lacks a consensus-based assessment method. This highlights the need for objective, quantifiable biomarkers to better capture its physical, cognitive, and nutritional components. 20
In line with this approach, our machine learning model, using SHAP analysis, identified low HDL-c, low LDL-c, and low serum CK as significant predictors of poor midterm TAVR outcomes. The predictive power of these serum markers aligns with the understanding that they may reflect underlying states of malnutrition, chronic inflammation (LDL-c and HDL-c), and sarcopenia or metabolic frailty (CK), all of which are key hallmarks of the frailty syndrome. The specific clinical interpretation of our findings of frailty-related markers will be discussed in the following sections.
Interestingly, although albumin is a frequently recognized biomarker for frailty, 15 our interpretable machine learning model did not identify albumin (3.8 ± 0.4 g/dL) as a significant predictor in our cohort. This suggests that a single biomarker is not sufficient to capture the multifaceted nature of frailty.
Low HDL Levels May Serve as an Important Indicator of Systemic Inflammation in TAVR Patients
Our study found that a low baseline HDL-c was a significant predictor of both defined poor post-TAVR outcomes, with patients having levels below 36 to 42 mg/dL exhibiting an increased risk. Prior research has also reported this association in TAVR patients, where lower HDL-c levels (with cut-off values around 46 mg/dL) are an independent predictor of worse short- and long-term TAVR outcomes.21,22
HDL-c is well-recognized for its CV protective effects, including antioxidant properties, 23 reverse cholesterol transport, 24 and anti-inflammatory actions against atherosclerosis. 23 Since AS pathogenesis involves similar processes of oxidative stress and lipid infiltration as atherosclerosis, 25 functional HDL-c likely offers comparable protective benefits against AS initiation and progression. 26
However, when HDL-c becomes dysfunctional, it loses its protective capacities and instead actively contributes to systemic inflammation.27,28 Low HDL-c is a marker of this systemic inflammatory burden, a characteristic seen in various conditions common in the elderly, such as diabetes and chronic kidney disease (CKD).27,29 This persistent chronic inflammation is a recognized driver of the frailty syndrome. Therefore, our finding that low baseline HDL-c predicts poor outcomes in TAVR patients suggests it may serve as an important, accessible indicator of underlying systemic inflammation and compromised physiological reserve.
The Cholesterol Paradox: Lower LDL-c Likely Reflects Frailty and Malnutrition, Contributing to a Higher Risk of Poor Post-TAVR Midterm Outcomes
Our study observed a cholesterol paradox: low LDL-c levels predicted poor post-TAVR outcomes. SHAP analysis identified LDL-c values below 69 to 75 mg/dL (primary outcomes) and 65 to 77 mg/dL (expanded outcomes) as contributing to adverse events after TAVR. This finding is particularly notable given that low LDL-c is generally considered beneficial for CV health, with guidelines often recommending levels below 70 mg/dL for individuals with atherosclerosis. 30 Paradoxically, in our cohort, higher LDL-c showed a protective effect, while lower levels contributed to poorer outcomes.
This inverse association between lower LDL-c and worse prognosis, often termed “reverse epidemiology,” has been observed in other populations with chronic conditions, including patients with chronic HF,31,32 advanced coronary artery disease (CAD), 33 and even in the general older population (age >60). 34 Proposed mechanisms for this paradox frequently include malnutrition and chronic inflammation, 32 suggesting low LDL-c serves as a potential marker of underlying disease severity or a compromised physiological state. For instance, a large study in CAD patients specifically linked this cholesterol paradox to malnutrition, emphasizing the need for targeted nutritional screening in those with low LDL-c. 33
Moreover, low LDL-c is increasingly recognized as a valuable component in frailty assessment. A study of 1170 elderly Chinese patients with coronary heart disease identified LDL-c as a top frailty risk predictor. 35 Similarly, a large UK Biobank study involving over 200 000 participants revealed that frail individuals exhibited significantly lower concentrations of total cholesterol, LDL-c, and HDL. 36 Even in younger adults, higher LDL-c correlates with lower frailty scores, positioning it as a key nutrition-related parameter reflecting overall physiological reserve. 37
Although large observational studies on this specific paradox in TAVR patients are limited, its prevalence in common TAVR comorbidities (eg, chronic HF, CAD) highlights its relevance. Previous smaller TAVR cohort studies support our findings, reporting similar patterns. One study identified an LDL-c threshold of 71 mg/dL for predicting poor 30-day TAVR outcomes, 22 and another study of 2090 TAVR patients found that baseline LDL-c below 55 mg/dL was associated with higher long-term mortality. 38 Our identified thresholds (65-77 mg/dL) add to the evidence for the clinical significance of low LDL-c in this high-risk population.
Collectively, these observations strongly suggest that low baseline LDL-c in TAVR patients serves as an important marker reflecting underlying malnutrition and frailty status. Incorporating LDL-c into comprehensive frailty assessments for TAVR candidates could enhance preprocedural risk stratification.
Baseline Serum CK May be a Particularly Strong Predictor for Post-TAVR Cardiac Function Recovery
Our machine learning model identified low CK levels (below 62-70 U/L) as a significant predictor of TAVR patient prognosis when outcomes included metrics reflecting cardiac functional status (NYHA class and KCCQ scores).
CK is an enzyme predominantly found in the heart and skeletal muscle that plays a crucial role in cellular energy metabolism. While high CK levels may signal myocardial damage, 39 reduced CK levels are increasingly recognized as an indicator of underlying energy impairment, muscle wasting, 40 and sarcopenia,41,42 reflecting broader metabolic dysfunction and frailty status. Notably, sarcopenia itself is a strong independent predictor of short- and long-term survival in TAVR patients. 43 Low CK levels are frequently observed in chronic conditions that are also common in TAVR patients (eg, frailty, 41 CKD 44 ), often linked to chronic inflammation.41,42
Although direct studies on baseline CK levels in TAVR patients are limited, reduced CK activity has been observed in patients with severe AS and reduced ejection fraction, suggesting its potential contribution to the progression toward systolic failure. 45 This aligns with our findings, proposing a link between low baseline serum CK and the impaired cardiac energy capacity for post-TAVR functional recovery.
While reported predictive CK thresholds in the literature vary widely (eg, 44-120 IU/L),40–42,44 our model's specific thresholds of below 62 to 70 U/L fall within this clinically relevant range, adding value in TAVR patients. Therefore, low baseline CK levels could serve as an accessible and surrogate marker for sarcopenia or underlying cardiac metabolic frailty in TAVR patients, providing valuable insights into a patient's capacity for postprocedural cardiac functional recovery.
Patient With Low Baseline LDL, Low HDL, and Low Serum CK Profile May Need Significant Personalized Care and TAVR Risk Management
The concurrent presence of low baseline HDL-c, low LDL-c, and low serum CK may suggests a compounded state of physiological vulnerability and reflect intertwined systemic inflammation, malnutrition, and muscle wasting, all critical components of the frailty syndrome.
Patients undergoing TAVR are significantly older (in our cohort, the mean age was 80.8 years) and with multiple comorbidities. Since chronological age doesn't always reflect a patient's true physiological reserve, frailty assessment has become a crucial surrogate for biological age, indicating a patient's capability to recover from stress. 46 Identifying these frail patients allows clinicians to provide extra care, including screening their nutritional and inflammatory status. This can then guide active, targeted treatments to improve their overall physiological state, potentially leading to better TAVR outcomes. Indeed, multiple studies and trials are underway to actively treat frailty syndrome, highlighting a significant shift of paradigm in the TAVR risk management. 20
Our findings suggest that a pre-TAVR profile of low HDL-c, low LDL-c, and low serum CK signals frailty and may call for enhanced, personalized risk stratification. This insight supports integrating these readily available biomarkers into comprehensive frailty assessments, which may improve TAVR outcomes in these high-risk individuals.
Baseline Aortic Stiffness Measured by PWV Has Incremental Prognostic Value for TAVR Patients’ Risk Stratification
Aortic stiffness can be measured noninvasively by carotid-femoral pulse wave velocity (cf-PWV), which offers significant prognostic value for TAVR patient risk stratification. Higher cf-PWV indicates stiffer central arteries and is a known independent predictor of CV events across various patient groups.47–52 Our model identified higher cf-PWV as a key predictor of poor TAVR outcomes for both the primary (11.2-12.2 m/s) and expanded (10.8-12 m/s) outcomes, highlighting its substantial prognostic utility within our cohort.
cf-PWV is widely recognized as the gold standard for assessing central aortic stiffness. 53 It measures how fast the blood pressure pulse travels through the major arteries. A stiff aorta increases the pulsatile load on the left ventricle, elevating afterload and contributing to adverse left ventricular (LV) remodeling. 54
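As a rough illustration of what the measurement represents, cf-PWV is a path length divided by the pulse transit time between the carotid and femoral sites. The sketch below is not the study's acquisition protocol, which is not described in this section; the function name `cf_pwv`, the example inputs, and the 0.8 scaling of the directly measured distance (a commonly used convention) are assumptions made for illustration only.

```python
def cf_pwv(direct_distance_m: float, transit_time_s: float,
           distance_scale: float = 0.8) -> float:
    """Return carotid-femoral pulse wave velocity in m/s.

    direct_distance_m: surface distance between carotid and femoral sites (m).
    transit_time_s: delay between the carotid and femoral pulse waves (s).
    distance_scale: assumed correction applied to the direct distance
                    (0.8 is a common convention, not taken from this study).
    """
    if direct_distance_m <= 0 or transit_time_s <= 0:
        raise ValueError("distance and transit time must be positive")
    return distance_scale * direct_distance_m / transit_time_s


# Hypothetical patient: 0.60 m direct distance, 40 ms transit time.
example_pwv = cf_pwv(0.60, 0.040)
print(f"cf-PWV: {example_pwv:.1f} m/s")
```

With these hypothetical inputs the velocity is about 12 m/s, which would sit at the upper end of the threshold ranges reported by our model. A shorter transit time over the same distance gives a higher velocity, which is why a stiffer aorta yields a higher cf-PWV.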
Higher cf-PWV values have been linked to poorer quality of life after aortic valve replacement. 55 The prognostic value of cf-PWV has also been demonstrated in TAVR patients. One study showed that cf-PWV ≥11.01 m/s predicted worse 1-year survival post-TAVR and was an independent predictor on multivariate analysis. 56 Our model's identified thresholds (10.8-12.2 m/s) align closely with this established predictive cut-off, further highlighting the critical impact of aortic stiffness on TAVR outcomes.
High baseline aortic stiffness, as measured by cf-PWV, likely indicates pre-existing left ventricular damage or maladaptive remodeling that may not fully reverse even after the valvular obstruction is relieved by TAVR. This persistent hemodynamic burden may lead to suboptimal postprocedural recovery. While specific cf-PWV remodeling studies in TAVR patients are limited, brachial-ankle PWV, another arterial stiffness measurement, has been linked to slower reverse LV remodeling, 57 supporting the view that elevated arterial stiffness impairs cardiac recovery.
Even Mild-to-Moderate Renal Impairment (eGFR <60) Could be a Critical Comorbidity Marker in TAVR Patients
Our machine learning model identified eGFR <54 to 65 mL/min/1.73 m2 as a predictor of the primary outcome following TAVR, a finding that aligns with the established role of renal dysfunction as a key comorbidity, present in up to 72% of this population, 58 and associated with worse TAVR outcomes. 59
While CKD is traditionally defined as an eGFR <60 mL/min/1.73 m2 in adults, 60 this cut-off is under debate for the elderly population. Critics argue that because eGFR naturally declines with age, this universal threshold may over-diagnose CKD in the elderly, and they propose an age-adapted threshold of 45 mL/min/1.73 m2 for those over the age of 65.60,61
Our analysis provides a data-driven contribution to this debate. SHAP analysis identified an eGFR threshold range of 54 to 65 mL/min/1.73 m2, with values below this range contributing to the prediction of adverse outcomes. This finding is notable because it suggests that TAVR patients with eGFR levels well above the age-adapted threshold of 45 are already in a high-risk zone.
This implies a particular vulnerability within the TAVR cohort, where even mild reductions in renal function that are considered part of the normal aging process are prognostically significant.
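The gap between these cut-offs can be made concrete with a small comparison. The sketch below is only an illustration of the three thresholds discussed above (the fixed 60, the age-adapted 45 for patients over 65, and the 54 to 65 range from our SHAP analysis); the function name `flag_egfr`, the dictionary keys, the example patient, and the fallback to 60 for patients aged 65 or younger are assumptions and are not part of the model.

```python
FIXED_CKD_THRESHOLD = 60.0           # mL/min/1.73 m2, traditional adult cut-off
AGE_ADAPTED_THRESHOLD_OVER_65 = 45.0 # proposed cut-off for patients over 65
MODEL_RANGE = (54.0, 65.0)           # threshold range reported by the SHAP analysis


def flag_egfr(egfr: float, age: int) -> dict:
    """Compare one eGFR value against the cut-offs discussed in the text."""
    # Assumption: patients aged 65 or younger fall back to the fixed cut-off.
    age_cutoff = AGE_ADAPTED_THRESHOLD_OVER_65 if age > 65 else FIXED_CKD_THRESHOLD
    lower, upper = MODEL_RANGE
    return {
        "below_fixed_threshold": egfr < FIXED_CKD_THRESHOLD,
        "below_age_adapted_threshold": egfr < age_cutoff,
        "below_model_range_upper": egfr < upper,
        "below_model_range_lower": egfr < lower,
    }


# Hypothetical 81-year-old patient with an eGFR of 52 mL/min/1.73 m2.
example_flags = flag_egfr(52.0, 81)
for name, value in example_flags.items():
    print(f"{name}: {value}")
```

For this hypothetical patient, the age-adapted definition would not flag renal impairment, yet the value lies below the entire range identified by the model, which is the situation described above.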
Limitations
The primary limitation of this study is its relatively small sample size and retrospective design; therefore, the results should be considered exploratory and hypothesis-generating, contributing to the understanding of TAVR patient management. Furthermore, our approach to handling missing data relied on simple median imputation. This was a pragmatic decision guided by the objective of predictive modeling and the low overall missingness of the cohort. 6 Although the literature suggests simple imputation methods are sufficient for predictive modeling,6,62 we acknowledge that this technique does not fully account for population variance. Future research, particularly studies focusing on statistical inference, could explore and compare alternative imputation methods within larger datasets to determine how these statistical choices might impact predictive performance and variance estimation in this clinical setting. 62
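The variance concern can be seen in a minimal sketch of median imputation. This is not the study's pipeline: the helper `median_impute`, the column names, and the toy values are hypothetical, and only the Python standard library is used.

```python
from statistics import median, variance


def median_impute(rows, columns):
    """Replace None in the given columns with that column's observed median.

    Returns the imputed rows and the medians used. A column with no observed
    values would raise statistics.StatisticsError.
    """
    medians = {
        col: median(row[col] for row in rows if row[col] is not None)
        for col in columns
    }
    imputed = [
        {**row, **{col: medians[col] if row[col] is None else row[col]
                   for col in columns}}
        for row in rows
    ]
    return imputed, medians


# Toy data (hypothetical values, not from the cohort).
toy_rows = [
    {"ck": 55.0, "egfr": 48.0},
    {"ck": None, "egfr": 62.0},
    {"ck": 90.0, "egfr": None},
    {"ck": 70.0, "egfr": 71.0},
]
imputed_rows, used_medians = median_impute(toy_rows, ["ck", "egfr"])

# Filling gaps with a single central value shrinks the apparent spread.
observed_ck = [r["ck"] for r in toy_rows if r["ck"] is not None]
imputed_ck = [r["ck"] for r in imputed_rows]
print(used_medians)
print(variance(observed_ck), variance(imputed_ck))
```

In this toy example the sample variance of the imputed column is smaller than that of the observed values alone, which illustrates why median imputation may understate population variance. In a cross-validated setting, the medians would usually be computed from the training folds only and then applied to the held-out fold.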
A second limitation concerns potential patient selection bias and model transportability related to the study's cohort. We acknowledge that our model's performance may be subject to inherent biases from regional healthcare system structures and national practice guidelines. Specifically, referral center bias and national guideline bias may have influenced the patient cohort risk profile. For example, different healthcare system structures, such as public (Canada) versus mixed private–public models (Spain), affect patient waiting times and their clinical risk profile at intervention. Similarly, region-specific guidelines (eg, European vs American Heart Association and the American College of Cardiology) may directly influence patient inclusion and when the procedure occurs. Moreover, differences in local population demographics, including race and socioeconomic status, introduce additional bias. These factors make the current model's performance in different regions or healthcare environments uncertain. Future work should focus on rigorous external validation, preferably with larger and more diverse populations from different regions, and we call for decision-makers and stakeholders to collaborate and enhance data availability in this field for future research. Future studies should also incorporate fluid-dynamics and solid-mechanics metrics that correlate with valvular disease and TAVR outcomes into the analysis.63–76
Supplemental Material
sj-docx-1-hvs-10.1177_30494826251413753 - Supplemental material for Interpretable Machine Learning Uncovers Novel Predictors of Transcatheter Aortic Valve Replacement Futility and Midterm Outcomes
Supplemental material, sj-docx-1-hvs-10.1177_30494826251413753 for Interpretable Machine Learning Uncovers Novel Predictors of Transcatheter Aortic Valve Replacement Futility and Midterm Outcomes by Yueqing Sun, Jose M de la Torre Hernandez, Nima Maftoon, Gabriela Veiga Fernandez, Fermin Sainz Laso, Dae-Hyun Lee, Tamara Garcia Camarero, Cristina Ruisanchez, Victor Fradejas, Mercedes Benito, Aritz Gil Ongay, Celia Garilleti, Sergio Barrera and Zahra Keshavarz-Motamed in Journal of the Heart Valve Society
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (grant number RGPIN-2025-06401).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
