Abstract
Objective
Premature Ventricular Contractions (PVC) are a common arrhythmia. Accurate localization is crucial for effective treatment and prognosis. Current Electrocardiography (ECG) methods face inherent limitations in localizing PVC precisely. This study aimed to develop a robust PVC localization model using radiomics and machine learning.
Methods
Data were collected from 304 PVC patients who underwent catheter radiofrequency ablation at the First Affiliated Hospital of Dalian Medical University between November 2015 and May 2023. A total of 980 radiomic features were extracted from Coronary Computed Tomography Angiography (CCTA) images and combined with clinical baseline data. Least Absolute Shrinkage and Selection Operator regression identified the most valuable features. The dataset was divided into training and testing sets in a 7:3 ratio. Fifteen machine learning algorithms were used for model construction and evaluation, with SHapley Additive exPlanations (SHAP) analysis to assess feature importance. The results were compared with traditional ECG localization diagnostics and previously published studies.
Results
Gradient Boosting (GB), LightGBM, and Random Forest models performed well, with the area under the receiver operating characteristic curve (AUC) exceeding 0.8515, showing competitive performance compared to reported metrics of ECG-based methods. The GB model achieved an AUC of 0.9897 in distinguishing the left ventricular outflow tract from the right ventricular outflow tract. SHAP analysis revealed that radiomics features such as original_glszm_HighGrayLevelZoneEmphasis and clinical features such as B-type natriuretic peptide and left ventricular ejection fraction all emerged as important contributors to the predictive capacity.
Conclusion
Combining radiomics and machine learning techniques offers a robust, data-driven framework that complements traditional diagnostic approaches for PVC localization. This method enhances diagnostic precision and aids in developing personalized treatment plans for PVC patients.
Keywords
1. Background
Premature Ventricular Contractions (PVC) are a common arrhythmia in clinical practice, originating from abnormal excitability of ventricular myocardial cells, leading to premature depolarization and abnormal heart rhythms.1,2 PVC can cause symptoms such as palpitations and chest tightness, 3 and when they account for over 20% of the heart rate, they may lead to left ventricular enlargement and impaired cardiac function.4–6 Persistent PVC also increase the risk of lethal arrhythmias, such as atrial and ventricular fibrillation.7,8 While large trials have shown limited benefit from anti-arrhythmic medications, catheter ablation has proven highly effective in managing PVC. 9 Accurate localization of PVC origin before ablation is essential for assessing the condition, guiding treatment, and predicting outcomes.
Electrocardiogram (ECG) has been widely used for the localization diagnosis of PVC; however, there are specific sites of origin that have been found to be difficult to distinguish through this approach. Particularly, the ECG approach struggles when differentiating outflow tract PVC, namely between the left ventricular outflow tract (LVOT) and the right ventricular outflow tract (RVOT). The LVOT is further divided into the left coronary sinus (LCAS), right coronary sinus (RCAS), and the junction between the LCAS and RCAS (LCAS/RCAS junction). The RVOT includes the regions superior to the pulmonary valve leaflets (Superior to PV leaflets) and the regions inferior to the pulmonary valve leaflets (Inferior to PV leaflets), with the inferior to PV leaflets further divided into the free wall and septal. This difficulty arises because these two areas are closely adjacent in spatial location, and the ECG often presents with left bundle branch block patterns. Additionally, the transition zone in the precordial leads may overlap, making it difficult to accurately determine the origin of outflow tract PVC based on ECG findings alone. 10 Furthermore, the diagnostic process for PVC near LVOT and RVOT lacks standardization, with interpretation accuracy depending heavily on the operator’s experience and the quality of the ECG images. 11 While the ECG reflects electrical manifestations secondary to underlying structural abnormalities, subtle localized micro-architectural alterations (e.g., focal fibrosis or myocardial fiber disarray) associated with PVC foci may not produce distinctive signatures on a standard 12-lead ECG. Recent studies indicate that CCTA-based radiomics can non-invasively identify these microstructural changes by extracting high-dimensional features that correspond to myocardial heterogeneity and fibrosis, providing a more direct link to the underlying pathophysiology than surface electrical patterns.12,13
Recent advancements in artificial intelligence and machine learning have enhanced localization models for PVC.14,15 For example, He et al. trained four neural network models on ECG data from 249 PVC patients, achieving localization accuracy between 0.707 and 0.741. 10 Zheng et al. developed a model to differentiate PVC from LVOT and RVOT using 12-lead ECGs, with validation through ablation outcomes. 16 The mechanisms underlying PVC not only involve abnormal electrical activity in the heart but are also closely associated with structural changes in the heart.16–19 Coronary Computed Tomography Angiography (CCTA), as an important imaging modality, offers precise three-dimensional reconstruction of cardiac structures, providing more direct insights into the pathophysiological mechanisms of PVC.20,21 However, research combining CCTA with radiomics and machine learning techniques for PVC origin localization is currently limited, and further exploration of the relevant technical approaches and clinical applications is needed. 22
We developed a hierarchical machine learning framework integrating CCTA radiomics and clinical characteristics to progressively refine PVC localization from macroscopic to microscopic levels.
2. Materials and methods
The methodological pipeline and research design of this study are illustrated in Figure 1.
Figure 1. Methodological pipeline for the CCTA-based radiomics model. The study workflow is categorized into four primary modules. (a) Data preparation: Comprises patient enrollment, multimodal data collection (clinical parameters and CCTA imaging), and ROI annotation of the cardiac structures. (b) Model construction: High-dimensional features extracted via Pyradiomics underwent rigorous selection to train 15 machine learning algorithms with automated hyperparameter tuning. (c) Model evaluation: The predictive performance of the optimized models was validated using ROC-AUC and comprehensive performance metrics. (d) Interpretability: SHAP analysis was implemented to provide global and local explanations, quantifying the contribution of specific radiomics and clinical predictors.
2.1. Patient selection
This study is a retrospective, single-center, diagnostic accuracy study designed to develop and validate a machine learning-based localization model for PVCs. Data from patients with PVC undergoing first-time radiofrequency catheter ablation (RFCA) were retrospectively collected at the First Affiliated Hospital of Dalian Medical University between November 2015 and May 2023. This study was conducted in accordance with the principles of the Declaration of Helsinki and received approval from the Ethics Committee of the First Affiliated Hospital of Dalian Medical University (Approval Number: PJ-KS-KY-2024-278). Due to the retrospective, non-interventional case-control nature of the investigation, the requirement for written informed consent was waived by the Institutional Review Board. All patient data were de-identified and used exclusively for research purposes.
Inclusion criteria: (1) PVC diagnosed by ECG and Holter monitoring; (2) CCTA within one month before RFCA; (3) successful ablation with documented PVC origin. Exclusion criteria: (1) prior RFCA; (2) PVC recurrence post-ablation; (3) missing CCTA, ECG, or clinical data. A total of 304 patients were enrolled in this study.
2.2. Data collection and preprocessing
2.2.1. Clinical data collection
Clinical data were manually extracted from the hospital’s Jiahe electronic medical records system. The collected data included patient information such as age, gender, height, weight, body mass index (BMI), and medical history, including hypertension and its grade, highest recorded blood pressure, diabetes history and blood glucose levels, hyperlipidemia and lipid profiles, and comorbidities like coronary artery disease (CAD), chronic heart failure (CHF), transient ischemic attacks (TIA), as well as B-type natriuretic peptide (BNP) levels. Data on thyroid function, including hyperthyroidism and Thyroid-Stimulating Hormone (TSH) levels, were also recorded, along with echocardiographic parameters. Additionally, data related to RFCA procedures were collected, including the origin sites of PVC, which were confirmed by intracardiac electrophysiological mapping during the ablation. All clinical data were exported in Excel format.
2.2.2. Imaging acquisition
The imaging equipment used in this study included second/third generation dual-source CT scanners (SOMATOM Definition Flash/Force, Siemens Healthcare, Germany). Five minutes before coronary CT angiography, patients received 0.25 mg of sublingual nitroglycerin. If the heart rate exceeded 70 beats per minute, metoprolol was administered orally. The scan range covered from the carina to the cardiac diaphragm. Detailed scanning parameters are provided in the Supplemental Material 1.
2.2.3. Data preprocessing
After acquiring high-quality cardiac imaging data, preprocessing was conducted to ensure model training accuracy and stability. The imaging data underwent reconstruction and pseudo-color processing to enhance detail and contrast of cardiac structures. To further improve clarity and uniformity, histogram equalization techniques were applied, enhancing both contrast and consistency in the data.
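As an illustration of the histogram equalization step, the following minimal sketch remaps the intensities of an 8-bit image through its normalized cumulative histogram. It is not the preprocessing code used in this study: the function name `equalize_histogram` and the synthetic low-contrast image are assumptions made for illustration, and real CCTA volumes would first need to be windowed to a fixed integer intensity range.

```python
import numpy as np


def equalize_histogram(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization for an integer image with values in [0, levels)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[np.nonzero(hist)[0][0]]  # CDF value at the lowest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image: nothing to equalize
        return image.copy()
    # Build a lookup table that maps each gray level through the normalized CDF.
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(image.dtype)
    return lut[image]


# Hypothetical low-contrast image: values squeezed into the range 100..139.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(64, 64)).astype(np.uint8)
equalized = equalize_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max(), "->", equalized.min(), equalized.max())
```

After equalization the occupied gray levels are spread over the full 0 to 255 range, which is the contrast-enhancing effect described above.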
2.2.4. ROI annotation
After preprocessing, precise annotation of the entire heart was performed by two experienced radiologists using 3D Slicer software, a widely used open-source tool for accurate tissue segmentation in 3D space. 23 Each radiologist independently outlined the heart’s contours based on visible anatomical features. A third radiologist reviewed the annotations to ensure accuracy and consistency with the heart’s anatomy. Any discrepancies were discussed and adjusted until a consensus was reached. This thorough annotation process provides crucial anatomical data for machine learning training and aids in identifying potential PVC origin sites and pathological radiomic changes.
2.3. Feature extraction and selection
2.3.1. Radiomics feature extraction
Table 1. Radiomic feature classification framework.
Note. GLRLM, Gray Level Run Length Matrix; GLSZM, Gray Level Size Zone Matrix; GLDM, Gray Level Dependence Matrix; NGTDM, Neighboring Gray Tone Difference Matrix.
From each of the square, square root, logarithm, exponential, and gradient transformations, we extracted 69 features, composed of 18 first-order statistical features, 5 NGTDM features, 16 GLRLM features, 16 GLSZM features, and 14 GLDM features.
To ensure reproducibility, a standardized nomenclature (imageType_featureClass_featureName) was adopted, such as original_shape_Maximum2DDiameterRow. 24
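The naming convention and the per-transformation feature count can be checked with a short sketch. The helper `parse_feature_name` and the `RadiomicFeatureName` type are hypothetical names introduced here for illustration; the feature-class counts are those stated in the text.

```python
from typing import NamedTuple


class RadiomicFeatureName(NamedTuple):
    image_type: str      # e.g. "original", "square", "wavelet-HHH"
    feature_class: str   # e.g. "shape", "firstorder", "glszm"
    feature_name: str    # e.g. "Sphericity"


def parse_feature_name(name: str) -> RadiomicFeatureName:
    """Split 'imageType_featureClass_featureName' into its three parts."""
    parts = name.split("_", 2)  # at most two splits, so the last part is kept whole
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not of the form imageType_featureClass_featureName: {name!r}")
    return RadiomicFeatureName(*parts)


# Per-transformation counts stated in the text: 18 + 5 + 16 + 16 + 14.
per_filter_counts = {"firstorder": 18, "ngtdm": 5, "glrlm": 16, "glszm": 16, "gldm": 14}
features_per_filter = sum(per_filter_counts.values())

example = parse_feature_name("wavelet-HHH_glrlm_RunVariance")
print(example)
print(features_per_filter)
```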
2.3.2. Radiomics feature selection
To ensure model stability and mitigate the risk of overfitting, a four-step selection pipeline was implemented:
Step 1: Reproducibility Assessment. We utilized the Intraclass Correlation Coefficient (ICC) to evaluate feature reliability, retaining only those with high repeatability (ICC > 0.8).
Step 2: Normalization. Features were standardized using Z-score normalization to eliminate scale-dependent bias, transforming each feature to a mean of 0 and standard deviation of 1.
Step 3: Univariate Analysis. Statistical differences between groups were assessed using the t-test for each feature.
Step 4: Dimensionality Reduction. Finally, the Least Absolute Shrinkage and Selection Operator (LASSO) regression with L1 regularization was applied. This step performed automatic feature shrinkage and selection to retain the most informative predictors for final model construction.
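Steps 2 to 4 of this pipeline can be sketched as follows. This is a minimal illustration on synthetic data, not the study code: the data, the p < 0.05 cut-off for the univariate filter, and the use of scikit-learn's `LassoCV` with 5-fold cross-validation are assumptions, and the ICC step is omitted because it requires repeated segmentations of each case.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomic feature matrix (not study data):
# 120 cases, 50 features, only the first 3 features carry signal.
rng = np.random.default_rng(42)
n_cases, n_features = 120, 50
y = rng.integers(0, 2, size=n_cases)
X = rng.normal(size=(n_cases, n_features))
X[:, :3] += 1.5 * y[:, None]

# Step 1 (ICC > 0.8) needs repeated segmentations per case and is omitted here.

# Step 2: Z-score normalization, each feature to mean 0 and SD 1.
X_std = StandardScaler().fit_transform(X)

# Step 3: univariate two-sample t-test per feature; keep features with p < 0.05
# (the threshold is an assumption for this sketch).
_, p_values = ttest_ind(X_std[y == 0], X_std[y == 1], axis=0)
kept_after_ttest = np.flatnonzero(p_values < 0.05)

# Step 4: LASSO with the penalty strength chosen by 5-fold cross-validation;
# features whose coefficient is shrunk exactly to zero are discarded.
lasso = LassoCV(cv=5, random_state=0).fit(X_std[:, kept_after_ttest], y)
selected = kept_after_ttest[lasso.coef_ != 0]

print("after t-test:", kept_after_ttest.size, "after LASSO:", selected.size)
```

In practice the scaler and the selection steps would be fitted on the training split only, so that no information from the test set influences which features are retained.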
2.4. Construction of machine learning localization model
In constructing the machine learning model for PVC localization, we employed various algorithms, including Random Forest (RF), Multilayer Perceptron (MLP), Gradient Boosting (GB), k-Nearest Neighbors (KNN), Decision Tree, AdaBoost, Extra Trees (ET), Linear Discriminant Analysis (LDA), LightGBM, Logistic Regression (LR), Gaussian Naive Bayes (GNB), Bernoulli Naive Bayes (BNB), Quadratic Discriminant Analysis (QDA), Bagging, and CalibratedClassifierCV. These algorithms were implemented using the scikit-learn library. By training multiple algorithms on the same features, we could compare their performance and select the optimal model for PVC localization.
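The comparison of several algorithms on a common split can be sketched as below. This is an illustrative example only: the data are synthetic, only a subset of the fifteen algorithms is shown, the settings are largely scikit-learn defaults rather than the tuned values used in the study, and LightGBM is left out because it comes from a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for the selected features (not study data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A subset of the algorithms named in the text, with largely default settings.
candidates = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000),
    "GNB": GaussianNB(),
}

# Fit every candidate on the same training split and record its test AUC.
auc_by_model = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, scores)

for name, auc in sorted(auc_by_model.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
```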
2.5. Statistical analysis
After constructing the machine learning models, we evaluated their performance using several metrics: classification accuracy (Acc), recall, precision (Prec), and AUC. These metrics provided a comprehensive assessment of model performance in localization tasks. Acc measured overall classification accuracy, recall assessed the model’s ability to identify positive instances, precision reflected the accuracy of positive predictions, and AUC summarized performance across various thresholds, highlighting discriminative ability. Additionally, we plotted Receiver Operating Characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves. The ROC curve evaluated sensitivity and specificity at different thresholds, with AUC indicating discriminative power. The calibration curve assessed the alignment between predicted and actual probabilities, and the DCA curve analyzed clinical utility across decision thresholds.
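For reference, the threshold-dependent metrics and the net benefit plotted in a decision curve can be computed from confusion-matrix counts as in the sketch below. The function names and the counts are hypothetical and serve only to show the arithmetic; they are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Decision-curve net benefit at a probability threshold in (0, 1)."""
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=20, fp=5, fn=5, tn=70)
nb = net_benefit(tp=20, fp=5, n=100, threshold=0.2)
print(metrics, round(nb, 4))
```

A decision curve is obtained by repeating the net benefit calculation over a range of thresholds, recomputing the true-positive and false-positive counts at each one.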
For statistical analysis, categorical variables were compared using the chi-square test or Fisher’s exact test, and continuous variables were evaluated with the t-test or Mann-Whitney U test. All tests were two-sided, with significance set at P<0.05. Given the multiple comparisons performed on baseline characteristics, all P-values were adjusted for the False Discovery Rate (FDR) using the Benjamini-Hochberg procedure. An FDR-adjusted P-value (Q-value) < 0.05 was considered statistically significant.
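The Benjamini-Hochberg adjustment can be written in a few lines. The sketch below is a generic implementation, not the code used for the baseline table, and the four raw P-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values) -> np.ndarray:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


# Hypothetical raw p-values for four baseline variables.
raw_p = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw_p)
print(q_values)
```

In this toy example three raw P-values fall below 0.05, but only the smallest remains below 0.05 after adjustment, which shows why the adjusted values rather than the raw ones are used to declare significance.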
3. Results
3.1. Experimental setup
The PVC localization task in this study involved three progressively detailed subtasks. Task 1 was a binary classification to distinguish between LVOT-origin PVC (n = 84) and RVOT-origin PVC (n = 220). Task 2 further classified LVOT-origin PVC into three categories: LCAS (n = 33), RCAS (n = 23), and the LCAS/RCAS junction (n = 28). Task 3 focused on classifying RVOT-origin PVC into three anatomical subregions: Superior to PV leaflets (n = 28), free wall (n = 19), and septal (n = 173). Each subtask aimed to refine the localization of PVC origins for more accurate and detailed diagnosis.
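The class counts above can be checked for internal consistency, and they also give the accuracy that a trivial classifier would reach by always predicting the most frequent class. The sketch below uses the counts reported in the text; the baseline is computed on the full cohort rather than on the test split, and the helper `majority_baseline_accuracy` is a name introduced here for illustration.

```python
# Class counts as reported in the text for the three localization tasks.
task_counts = {
    "Task 1": {"LVOT": 84, "RVOT": 220},
    "Task 2": {"LCAS": 33, "RCAS": 23, "LCAS/RCAS junction": 28},
    "Task 3": {"Superior to PV leaflets": 28, "Free wall": 19, "Septal": 173},
}

# Consistency checks: the sub-region counts must add up to their parent class.
assert sum(task_counts["Task 1"].values()) == 304
assert sum(task_counts["Task 2"].values()) == task_counts["Task 1"]["LVOT"]
assert sum(task_counts["Task 3"].values()) == task_counts["Task 1"]["RVOT"]


def majority_baseline_accuracy(counts: dict) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


baselines = {task: majority_baseline_accuracy(c) for task, c in task_counts.items()}
for task, acc in baselines.items():
    print(f"{task}: majority-class accuracy = {acc:.3f}")
```

Because the septal class dominates Task 3, accuracy alone is a weak indicator there, and precision and recall for the smaller classes are more informative.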
To ensure model robustness and generalizability, the dataset was randomly partitioned at the patient level into a stratified 7:3 split for model development and independent testing. Model hyperparameters were selected via grid search within the training split, where an internal validation subset was used to choose the best-performing parameter combination. After optimization, the model was retrained on the full training split and evaluated once on the strictly independent test set, which was not involved in any stage of model training or parameter tuning. This design ensured that the reported performance metrics and the single confusion matrix correspond to the fixed test cohort, providing an unbiased and clinically traceable estimate of generalizability.
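This split-then-tune protocol can be sketched with scikit-learn as follows. The example is illustrative rather than a reproduction of the study: the data are synthetic, the parameter grid is hypothetical, and 3-fold cross-validation inside the training split stands in for the internal validation subset described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced binary data standing in for the cohort (not study data).
X, y = make_classification(
    n_samples=304, n_features=20, n_informative=5, weights=[0.72, 0.28], random_state=0
)

# Stratified 7:3 split: one row per patient, so this is a patient-level split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Grid search restricted to the training split; the grid itself is hypothetical.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
    refit=True,  # retrain the best configuration on the full training split
)
search.fit(X_train, y_train)

# Single evaluation on the held-out test split.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

The test split is touched only in the last step, which is what keeps the reported estimate independent of the tuning procedure.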
Experiments were conducted on a high-performance computing system with an NVIDIA GeForce RTX 3070 GPU, Intel Core i9-10900K CPU, and 64GB of memory. The operating system was Windows 10, and the primary libraries used were Python 3.7, scikit-learn, Pandas, NumPy, SciPy, Pyradiomics, and Matplotlib. All experiments were executed under consistent hardware and software conditions to ensure reproducibility.
3.2. Statistical analysis of clinical baseline characteristics of patients
Baseline clinical characteristics of the patients with PVC.
Notes. Values are presented as number and percentage (%) or as mean ± SD. PVC, premature ventricular contraction; BMI, body mass index; PSBP, peak systolic blood pressure; PDBP, peak diastolic blood pressure; BG, blood glucose; TC, total cholesterol; TG, triglycerides; HDL, high-density lipoprotein cholesterol; LDL, low-density lipoprotein cholesterol; BNP, B-type natriuretic peptide; CAD, coronary artery disease; CAA, coronary artery atherosclerosis; GD, Graves’ disease; HT, hypothyroidism; SCHT, subclinical hypothyroidism; HTT, Hashimoto’s thyroiditis; TSH, thyroid stimulating hormone; RVID, right ventricular internal diameter; IVS, interventricular septal thickness; LVID, left ventricular internal diameter; LVEF, left ventricular ejection fraction; RVOTW, right ventricular outflow tract width; ARD, aortic root diameter; LAD, left atrial diameter; PAD, pulmonary artery diameter; CHF, chronic heart failure; TIA, transient ischemic attack. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure. FDR-adjusted P-value < 0.05 was considered significant.
3.3. Radiomic feature extraction and selection results
In the feature extraction phase, we used the Pyradiomics library to extract 980 radiomic features from the ROI regions of heart images. Through various image processing techniques, we obtained a rich set of feature data, with specific categories shown in Table 1. These data encompass multiple dimensions, including shape, texture, and intensity distribution. The extracted features enable us to capture key information regarding the geometry of the heart, tissue patterns, pixel intensity, and multi-scale texture and structure.
To select the most influential features for PVC localization, we employed a series of selection methods, including LASSO regression. LASSO regression introduced an L1 regularization term, which helped prevent overfitting while performing feature selection. The results of the LASSO selection are shown in Figure 2, with visualizations for Tasks 1, 2, and 3 presented in Figure 2(a), 2(b), and 2(c), respectively. The LASSO regression path plots show how the feature coefficients change with Log(λ). The optimal λ value was determined by cross-validation, corresponding to the point with the minimum binomial deviance. As the Log(λ) value increases, most feature coefficients shrink toward zero, leaving only a few key features with non-zero coefficients. For localization Tasks 1, 2, and 3, we selected 21, 6, and 68 key radiomic features, respectively.
Figure 2. Flowchart of screening process for radiomic features. Panels (a), (b), and (c) show the feature filtering process for localization Tasks 1, 2, and 3, respectively, including changes in feature coefficients (left) and changes in binomial deviance (right) under different λ values.
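The shrinkage behaviour described for the LASSO path can be reproduced on toy data. The sketch below is an assumption-laden illustration, not the study analysis: it uses synthetic data, scikit-learn's linear `Lasso` (where λ is called `alpha`) instead of a binomial model, and a hand-picked list of penalty values.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (not study data): 100 cases, 30 features, 4 informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
y = (X[:, :4].sum(axis=1) + 0.5 * rng.normal(size=100) > 0).astype(float)
X = StandardScaler().fit_transform(X)

# Fit LASSO at increasing penalty strengths (scikit-learn calls lambda "alpha")
# and count the surviving, non-zero coefficients at each value.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.6]
nonzero_counts = []
for lam in lambdas:
    model = Lasso(alpha=lam, max_iter=10000).fit(X, y)
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))

for lam, count in zip(lambdas, nonzero_counts):
    print(f"log(lambda) = {np.log(lam):6.2f}  non-zero coefficients = {count}")
```

With a weak penalty most coefficients remain non-zero, while at the largest penalty in this list every coefficient is shrunk to exactly zero; cross-validation picks a value between these extremes.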
3.4. Localization results combining machine learning with radiomics
The performance of all machine learning models across the three localization tasks was comprehensively evaluated using key metrics, including AUC, Acc, Prec, and Recall. Detailed results for all models are consolidated and presented in tabular format in Supplemental Material 2. The following sections summarize the top-performing models and their key determinants based on SHAP analysis.
3.4.1. Results of localization task 1
In localization Task 1 (differentiating between LVOT and RVOT), model performance varied across algorithms (Figure 3(a)). ET and GB achieved the highest AUCs (0.9903 and 0.9897), followed by LightGBM (0.9886) and RF (0.9746). AdaBoost, Bagging, and LDA yielded AUCs of 0.9670, 0.9709, and 0.9396, respectively, while KNN and MLP underperformed (AUCs 0.6675 and 0.5598). ROC curves for GB, LightGBM, and ET approached the ideal top-left corner, indicating high sensitivity and specificity.
Figure 3. Modeling results for the three progressively detailed PVC localization subtasks. (a–c) ROC curves and AUC values of 15 machine learning algorithms for: (a) Task 1: LVOT vs. RVOT classification; (b) Task 2: LVOT sub-region classification (LCAS, RCAS, and LCAS/RCAS junction); (c) Task 3: RVOT anatomical subregion classification (Superior to PV leaflets, free wall, septal). (d–f) SHAP summary plots: Radiomic features (Top) and clinical features (Bottom) for (d) Task 1, (e) Task 2, (f) Task 3.
SHAP analysis identified key contributing features. Among radiomic features, original_shape_Maximum2DDiameterRow, original_shape_Sphericity, and exponential_gldm_GrayLevelNonUniformity were most impactful, with the first having the highest SHAP value. Clinically, BNP, LDL, and weight were the most influential predictors.
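To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values for a small hypothetical scoring function by enumerating all feature subsets. It does not use the SHAP library and is not the study model: the function `toy_model`, its coefficients, and the all-zero baseline are assumptions chosen so that the result can be verified by hand. SHAP implementations approximate or accelerate this same quantity, typically using a background dataset instead of a single baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, replacing absent features by baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk score with one interaction term (not the study model)."""
    bnp, lvef, texture = features
    return 2.0 * bnp - 1.0 * lvef + 0.5 * bnp * texture


x = [1.0, 2.0, 4.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # reference input
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The three attributions add up to the difference between the prediction for the instance and the prediction for the baseline, and the contribution of the interaction term is shared equally between the two features involved.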
3.4.2. Results of localization task 2
In Task 2 (classifying LVOT sub-regions: LCAS, RCAS, and LCAS/RCAS junction), model performance is presented in Figure 3(b), while SHAP feature analysis is displayed in Figure 3(e). LightGBM and RF achieved the highest AUCs (0.9322 and 0.9174), followed by GB (0.9165) and Bagging (0.9111). BNB and GNB performed poorly (AUCs 0.4123 and 0.5751), while LR and MLP showed moderate performance (AUCs 0.7143 and 0.7521). ROC curves for LightGBM, RF, and Bagging closely approached the ideal top-left corner.
SHAP analysis revealed that square_glszm_SmallAreaEmphasis, exponential_firstorder_Entropy, wavelet-HHH_glrlm_RunVariance, and square_glszm_ZonePercentage were the most impactful imaging features. Among clinical features, blood pressure, PAD, blood glucose, weight, and BNP significantly influenced model decisions.
3.4.3. Results of localization task 3
In localization Task 3 (distinguishing RVOT as Superior to PV leaflets, free wall, or septal), machine learning models demonstrated varying performances. As shown in Figure 3(c), RF and LightGBM achieved the highest AUCs (0.8515 and 0.8450), followed by GB (0.8403) and Bagging (0.8080). MLP and QDA performed poorly (AUC 0.5000 each).
SHAP analysis identified original_glszm_HighGrayLevelZoneEmphasis, wavelet-LHL_ngtdm_Contrast, original_glszm_SmallAreaHighGrayLevelEmphasis, and wavelet-LLL_gldm_SmallDependenceHighGrayLevelEmphasis as the most impactful radiomic features. Clinically, LVEF, BNP, RVOTW, and blood pressure were key predictors.
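As background on the top-ranked texture feature, High Gray Level Zone Emphasis is the average of the squared gray level over all zones of the gray level size zone matrix, where a zone is a connected group of voxels sharing one gray level. The sketch below is a simplified 2D illustration with 8-connectivity on an already discretized toy image; it is not the Pyradiomics implementation, which operates on binned intensities within the 3D ROI, and the function names are introduced here for illustration.

```python
from collections import deque

import numpy as np


def gray_level_zones(image: np.ndarray):
    """Return (gray_level, zone_size) for every 8-connected zone of equal gray level."""
    rows, cols = image.shape
    seen = np.zeros(image.shape, dtype=bool)
    zones = []
    for r in range(rows):
        for c in range(cols):
            if seen[r, c]:
                continue
            level = int(image[r, c])
            seen[r, c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:  # breadth-first flood fill over the 8 neighbours
                cr, cc = queue.popleft()
                size += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr, nc] and image[nr, nc] == level):
                            seen[nr, nc] = True
                            queue.append((nr, nc))
            zones.append((level, size))
    return zones


def high_gray_level_zone_emphasis(image: np.ndarray) -> float:
    """Mean of squared gray level over all zones (zone size does not enter)."""
    zones = gray_level_zones(image)
    return sum(level ** 2 for level, _ in zones) / len(zones)


# Hypothetical 4x4 image already discretized to gray levels 1..3.
toy = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 1, 1],
    [3, 3, 1, 1],
])
zones = gray_level_zones(toy)
hglze = high_gray_level_zone_emphasis(toy)
print(zones, hglze)
```

In the toy image the two blocks of gray level 1 touch diagonally and therefore form a single zone, so three zones are found in total. Higher values of the feature indicate that a larger share of zones lies at high gray levels, that is, at higher attenuation in a CT image.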
Overall, these results suggest that models integrating radiomic and clinical features are highly effective in identifying specific origin sites within the RVOT, providing robust diagnostic support.
3.5. Comparison of radiomic methods and traditional ECG localization diagnostic methods
Comparisons of various radiomic-machine learning methods with traditional ECG detection by experienced doctors.
Note. Acc, Accuracy; Prec, Precision.
In Task 1 (differentiating between LVOT and RVOT), the radiomics approach achieved an Acc of 0.913, higher than Physician 1’s 0.761 and Physician 2’s 0.804, indicating its potential as a complementary tool in distinguishing origin sites. In Task 2 (distinguishing specific origin sites within LVOT), the radiomics approach had an Acc of 0.769, outperforming Physicians 1 and 2, whose accuracies were 0.615 and 0.654, respectively. In Task 3 (distinguishing specific origin sites within RVOT), the radiomics approach achieved an Acc of 0.879, while both physicians reached only 0.439, further emphasizing the potential of the radiomic method as a complementary tool in complex localization tasks.
Comparison of the results with previously published studies.
Note. Acc, Accuracy; SE, Sensitivity; AUC, Area Under the Curve; LVOT, Left Ventricular Outflow Tract; RVOT, Right Ventricular Outflow Tract; PVC, Premature Ventricular Contraction; CNN, Convolutional Neural Network; SVM, Support Vector Machine; RF, Random Forest; GBDT, Gradient Boosting Decision Tree; GNB, Gaussian Naive Bayes.
Overall, the radiomic approach, which integrates high-dimensional imaging features with machine learning, demonstrated improved accuracy of PVC localization, particularly as task complexity increased.
4. Discussion
4.1. Evaluation of technical performance and algorithm selection
This study developed a robust model for PVC localization using machine learning and radiomics. We extracted high-dimensional features via Pyradiomics and applied LASSO regression for feature selection. Fifteen machine learning algorithms were trained for three tasks: distinguishing LVOT from RVOT, classifying LVOT sub-regions, and classifying RVOT sub-regions. SHAP analysis revealed key radiomic and clinical features influencing predictions.
The three subtasks were designed based on contemporary electrophysiology frameworks. Task 1 addresses the common LVOT/RVOT distinction. Task 2 adopts the LCAS, RCAS, and LCAS/RCAS junction classification proposed by Nakasone et al. 25 and Sabzwari et al. 28 For RVOT, we divided it into three regions: superior to PV leaflets, free wall, and septal, maintaining anatomical continuity. However, PVC origins may also arise from other sites such as the aortomitral continuity and parahisian region, 11 indicating the need for future framework expansion.
For the binary classification (LVOT vs. RVOT), the best-performing models achieved an AUC of 0.99, accuracy of 0.913, precision of 0.778, and recall of 0.913. In LVOT sub-region classification, the AUC exceeded 0.91, while accuracy (0.769) and precision (0.663) indicated greater challenges in fine-grained differentiation. For RVOT sub-region classification, AUC exceeded 0.85 and accuracy was 0.879, but precision dropped to 0.505 despite a recall of 0.855, reflecting the impact of class imbalance. Clinically, false-positive localization—e.g., misclassifying a septal origin as free wall—could lead to unnecessary catheter manipulation, prolonged mapping, and ineffective ablation. Thus, improving precision in rare sub-regions remains a key priority, warranting future efforts such as data augmentation or cost-sensitive learning.
Performance varied across algorithms. Tree-based ensemble models (GB, LightGBM, RF) demonstrated the highest predictive power due to their ability to capture complex, non-linear interactions within high-dimensional radiomic data. While their “black-box” nature limits interpretability, we mitigated this via SHAP analysis. In contrast, MLP and KNN performed poorly—MLP is highly sensitive to hyperparameters, and KNN suffers from the “curse of dimensionality” in high-dimensional feature spaces.
4.2. Myocardial heterogeneity and pathophysiological relevance
Key radiomic features, such as original_glszm_HighGrayLevelZoneEmphasis, wavelet-LHL_ngtdm_Contrast, and original_glszm_SmallAreaHighGrayLevelEmphasis, captured subtle myocardial changes associated with PVC origins. PVC is often linked to myocardial heterogeneity and fibrosis, 29 and the model’s integration of high-resolution imaging features enabled more precise localization.
BNP and LVEF emerged as top clinical predictors, consistent with their established roles as markers of cardiac stress and PVC burden.30–35 Their high SHAP values underscore the physiological relevance of the model, linking anatomical radiomics with functional impairment.
4.3. Clinical utility and EP workflow integration
Our PVC localization model demonstrates robust predictive capacity, showing high consistency with manual expert interpretation and yielding competitive results relative to literature-reported ECG models. While existing ECG models typically report AUC values between 0.88 and 0.96, 25 our framework achieved a higher AUC of 0.99 and maintained robust precision in sub-regional classification. Because ECG interpretation depends heavily on operator experience, the objective, quantitative assessment provided by our model may help mitigate this subjectivity. This suggests potential for enhanced detection of subtle pathological changes, providing complementary diagnostic information to ECG. Our method can guide and facilitate more precise ablation procedures, potentially contributing to improved treatment outcomes.
Integrating the CCTA-based radiomics model into the clinical electrophysiology (EP) workflow offers a structured approach to optimizing PVC ablation (Figure 4). In a typical real-world scenario, patients undergo a standard pre-procedural CCTA to assess cardiac anatomy. Our model serves as an offline analysis engine during the planning phase, requiring approximately 10 minutes of processing time without additional radiation or patient cost. This ‘pre-map’ informs critical decision points before the patient enters the EP lab: (1) Strategic Planning, by determining the optimal vascular access (e.g., retrograde aortic for LVOT vs. venous for RVOT); (2) Tool Optimization, by selecting specialized catheters for deep or intramural origins; and (3) Targeted Mapping, by directing the initial catheter placement toward the predicted origin site. This workflow reduces intra-cardiac ‘hunting’ time, thereby minimizing fluoroscopy exposure and total procedural duration. Furthermore, identifying origins near high-risk structures like coronary ostia enables more precise patient counseling and enhances procedural safety. However, it is important to emphasize that these workflow advantages remain theoretical at this stage, as their clinical benefit (including reductions in procedure time, radiation exposure, or complication rates) has not yet been demonstrated in prospective studies. The proposed pipeline should therefore be viewed as a hypothesis-generating framework requiring future clinical validation.
Figure 4. Graphical abstract of the study. (a) Standard workflow: Empirical mapping with stepwise exploration and prolonged “hunting” time. (b) AI-guided radiomics workflow: Integration of CCTA-based radiomics and machine learning for pre-procedural PVC origin prediction (offline, ∼10 min), enabling targeted mapping, optimized vascular access, and reduced procedural uncertainty.
CCTA, coronary computed tomography angiography; LVOT, left ventricular outflow tract; PVC, premature ventricular contraction; RVOT, right ventricular outflow tract.
In clinical practice, PVCs encompass a wide spectrum from incidentally detected benign ectopy to symptomatic or cardiomyopathy-inducing arrhythmias requiring intervention. The proposed CCTA-based radiomics model is specifically intended for the latter subgroup—patients with symptomatic or high-burden PVCs (e.g., burden >10% or those with PVC-induced cardiomyopathy) who are being considered for catheter ablation and who already undergo clinically indicated CCTA as part of pre-procedural anatomical assessment. Conversely, the model is not designed for patients with incidentally detected, low-burden, asymptomatic PVCs who are managed conservatively. From a workflow perspective, the model is applied offline during the pre-procedural planning phase: after CCTA acquisition and before the patient enters the electrophysiology laboratory, with a processing time of approximately 10 minutes. Thus, the ideal candidate is one with an intermediate-to-high likelihood of undergoing ablation, in whom CCTA is already obtained for coronary or anatomical evaluation.
5. Limitations
Several limitations should be noted. First, the retrospective, single-center nature of this study may introduce selection bias and limit generalizability. The lack of an external validation cohort remains a significant constraint, primarily due to the substantial heterogeneity in CCTA protocols across centers; thus, this work serves as a proof-of-concept study. Second, while a strictly independent 7:3 test set was used to simulate clinical deployment, performance stability across alternative resampling schemes has not been demonstrated due to the absence of nested cross-validation. Given the very high AUC values reported (approaching 0.99), the risk of over-optimistic performance estimation cannot be fully excluded, even though we used a fixed held-out test set and reported comprehensive metrics such as precision and recall to ensure transparency. Future multi-center prospective research incorporating MRI-derived radiomics, data-balancing techniques, and nested cross-validation is warranted to further enhance the biological relevance and universal robustness of this framework.
6. Conclusion
In this study, we developed a hierarchical machine learning framework integrating CCTA-based radiomics and clinical features for PVC localization. The Gradient Boosting, LightGBM, and Random Forest models achieved AUCs of up to 0.9897 for distinguishing LVOT from RVOT, and 0.8515–0.9322 for sub-regional classification. SHAP analysis identified key radiomic features (e.g., original_glszm_HighGrayLevelZoneEmphasis) and clinical predictors (BNP, LVEF) as major contributors. Compared with experienced ECG readers, our approach achieved higher accuracy in this cohort (e.g., 0.879 vs. 0.439 for RVOT sub-regions). However, the study is limited by its retrospective single-center design, the lack of external validation, and class imbalance. Despite these constraints, the proposed method offers a complementary framework to ECG, with the potential to improve pre-procedural planning. Prospective validation is needed before clinical adoption.
Supplemental material
Supplemental material - Combining radiomics and machine learning for enhanced localization of premature ventricular contractions
Supplemental material for Combining radiomics and machine learning for enhanced localization of premature ventricular contractions by Jingjie Liu, Shiyu Dai, Lingxuan Hou, Boyang Zang, Yang Liu, Chongfu Jia, Xiaomeng Yin in Digital Health.
Footnotes
Ethical considerations
The institutional ethics committee of Dalian Medical University (DMU) approved this study (Approval Number: PJ-KS-KY-2024-278).
Consent to participate
The requirement for informed consent was waived because of the retrospective study design and the use of images and clinical information about patients derived from medical records.
Author contributions
YXM designed the study; LJJ and DSY collected the data; HLX and ZBY analyzed the data; LJJ and LY made major contributions to the literature search and interpretation and drafted the manuscript; YXM and JCF were involved in the interpretation of the data and revised the initial draft. All authors read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Liaoning Revitalization Talents Program: Medical Experts Project (Number YXMJ-LJ-12).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data sets generated and/or analysed during the current study are available upon request from the corresponding authors.
Use of AI tools in the writing process
The authors did not use generative AI or AI-assisted technologies in the conception, analysis, or writing of this manuscript.
Supplemental material
Supplemental material for this article is available online.
