Abstract
Objective
To evaluate the diagnostic performance of a combined model incorporating ultrasound video-based radiomics features and clinical variables for distinguishing between benign and malignant breast lesions.
Methods
A total of 346 patients (173 benign and 173 malignant) were retrospectively enrolled. Breast ultrasound videos were acquired and processed using semi-automatic segmentation in 3D Slicer. Radiomics features were extracted from volumetric tumor regions and refined using feature selection methods. Models were constructed using clinical variables, radiomics features, and their combination. Model performance was evaluated using receiver operating characteristic (ROC) analysis and area under the curve (AUC) values.
Results
The clinical model incorporating age, tumor size, and Breast Imaging Reporting and Data System (BI-RADS) classification achieved an AUC of 0.873. The radiomics model, utilizing 14 selected features, attained an AUC of 0.836. The combined model, integrating radiomics and clinical data, demonstrated significantly improved predictive performance with an AUC of 0.926, surpassing the BI-RADS-based model (AUC = 0.737). Internal validation using bootstrap resampling confirmed the robustness of the combined model (AUC = 0.901-0.954).
Conclusion
The integration of ultrasound video-based radiomics with clinical characteristics significantly improves the differentiation of benign and malignant breast tumors compared to conventional BI-RADS evaluation. This approach may enhance diagnostic accuracy and facilitate more precise clinical decision-making.
Introduction
Breast cancer is one of the most prevalent malignancies among women and has increasingly been diagnosed at younger ages. 1 It has become the leading cause of cancer-related deaths in women worldwide. Early detection and timely intervention are critical for improving survival rates and enhancing the quality of life for breast cancer patients.2,3
Traditional diagnostic methods, such as mammography, ultrasound, and magnetic resonance imaging, play a crucial role in detecting breast cancer and assessing its characteristics.4-6 Among these, ultrasound is frequently employed as a preoperative imaging modality due to its non-invasive nature, real-time imaging capability, ease of use, and cost-effectiveness. 7 However, it has several limitations, including operator dependence and reduced sensitivity for atypical breast cancer and small breast tumors.8,9 In recent years, radiomics has emerged as a promising tool in medical imaging, enabling the extraction of quantitative features not visible to the naked eye.10,11 By analyzing large datasets, it can reveal subtle patterns indicative of tumor characteristics, such as heterogeneity and biological activity.
Specifically, in the context of breast lesions, radiomics has shown promise in differentiating benign from malignant lesions,12,13 assessing tumor aggressiveness,14 and predicting patient outcomes.15 However, previous ultrasound-based radiomics studies have focused primarily on static two-dimensional (2D) images of breast lesions and may therefore fail to fully capture the complexity and heterogeneity of tumors. This is because a single 2D ultrasound plane may not adequately represent the entire tumor, leading to incomplete or inaccurate assessments.16,17
Unlike static 2D ultrasound images, dynamic ultrasound videos provide continuous imaging data, allowing for a more comprehensive evaluation of the tumor.18-20 These videos capture multiple planes of the tumor over time, providing more detailed insight into tumor characteristics and potentially revealing additional information about internal structure.20,21 As a result, dynamic ultrasound videos may provide more accurate and reliable insights than static images alone.
Although radiomics applied to static ultrasound images has shown promise in breast cancer diagnosis, dynamic ultrasound videos may provide a more comprehensive and precise assessment. We hypothesize that this dynamic ultrasound radiomics approach could ultimately enhance the differentiation between benign and malignant breast lesions, thereby facilitating earlier and more accurate diagnoses. Therefore, we aim to evaluate the potential of radiomics based on dynamic ultrasound videos for distinguishing between benign and malignant breast lesions.
Methods
Statement of Ethics
This study was approved by the Ethics Review Committee of Dongyang People's Hospital (Approval No. 2025-YX-170) and conducted in accordance with the Declaration of Helsinki. As a retrospective study, the requirement for informed consent was waived by the Ethics Review Committee of Dongyang People's Hospital.
Patients
In this retrospective study, 173 breast cancer patients who underwent surgery between September 2021 and April 2024 were consecutively enrolled, and an equal number of patients with benign breast lesions from the same period were randomly selected as controls. Patients were included if they met the following criteria: 1) they underwent routine ultrasound examinations followed by breast lesion surgery at our institution; 2) their breast lesions were confirmed by surgical pathology; and 3) they possessed complete clinicopathological information. Patients were excluded if they met any of the following criteria: 1) they had two or more surgically treated breast lesions; 2) they had evidence of distant metastasis; 3) the ultrasound video quality was insufficient for radiomics analysis; or 4) the pathological findings were incomplete. Figure 1 shows the patient selection process. The reporting of this study conforms to STROBE guidelines (https://www.equator-network.org/reporting-guidelines/strobe/). All patient details were de-identified.

Flowchart of Patient Enrollment and Exclusion in the Present Study.
Collection of Ultrasound Videos
The breast ultrasound examinations were performed by experienced sonographers using a LOGIQ E9 (GE Healthcare, Chicago, IL) and a Siemens Acuson S2000 (Siemens Healthineers, Erlangen, Germany) with a high-frequency linear probe to ensure thorough scanning of the breast. Before performing the ultrasound scan, the instrument settings such as gain, depth, and focus were adjusted to ensure that the breast lesion was centered on the screen and clearly visible. The characteristics of the lesions were meticulously documented according to established clinical criteria, focusing on three key features: location (breast side and clock-face position), maximum diameter in the longitudinal plane, and classification based on the Breast Imaging Reporting and Data System (BI-RADS) categories. 22 To capture the entire nodule in video format, the scan was conducted along the long axis of the lesion with uniform scanning for 5–10 s, depending on the lesion size. The video was then stored in Digital Imaging and Communications in Medicine (DICOM) format, an internationally recognized standard for the storage, management, and exchange of medical images and related data.
Segmentation of Ultrasound Videos
In this study, a semi-automatic segmentation approach was employed to delineate tumor regions in breast ultrasound videos using 3D Slicer (version 5.6.1). The ultrasound videos, stored in DICOM format, were first imported into 3D Slicer for processing. Tumor regions were manually delineated by a sonographer with over five years of experience in breast ultrasound. To optimize efficiency, fewer frames were annotated in regions where the tumor exhibited a regular shape, whereas more frames were marked in areas with irregular morphology. The ‘Fill Between Slices’ tool was then used to interpolate the segmentation to the remaining frames, inferring the lesion shape from neighboring labeled slices to ensure a smooth transition. This semi-automatic method effectively reduced manual annotation effort while preserving segmentation accuracy. The final result was a three-dimensional volumetric structure of the tumor, represented as a volume of interest (VOI), which allowed for comprehensive analysis and visualization.
Radiomics Feature Extraction
Radiomics feature extraction was performed using the PyRadiomics module (version 3.1.0), which was installed as an external package in 3D Slicer. The following parameters were applied for feature extraction: the resampled pixel spacing was set to (1.0, 1.0, 1.0) and the bin width was set to 25. The extracted features included first-order statistics, shape, and texture features such as Gray Level Co-occurrence Matrix (GLCM), Gray Level Dependence Matrix (GLDM), Neighborhood Gray-Tone Difference Matrix (NGTDM), Gray Level Run Length Matrix (GLRLM), and Gray Level Size Zone Matrix (GLSZM). All texture feature types were extracted with Laplacian of Gaussian (LoG) transformations at sigma values of 2.0, 3.0, 4.0, and 5.0. Additionally, wavelet transformation was applied during feature extraction to capture multi-scale texture information. In total, 1223 ultrasound radiomics features were extracted per lesion across all categories and transformations.
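For readers who wish to set up a comparable extraction outside the 3D Slicer interface, the reported settings can be written in the layout of a PyRadiomics parameter file. The following is a minimal sketch and not the configuration used in this study: the function name `build_extraction_params` is hypothetical, the inclusion of the `Original` image type is an assumption, and every setting not reported above is left at its library default.

```python
import json


def build_extraction_params():
    """Assemble the reported settings in the PyRadiomics parameter-file layout."""
    return {
        "setting": {
            "binWidth": 25,                            # bin width reported in the text
            "resampledPixelSpacing": [1.0, 1.0, 1.0],  # resampling reported in the text
        },
        "imageType": {
            "Original": {},                            # assumption: unfiltered image included
            "LoG": {"sigma": [2.0, 3.0, 4.0, 5.0]},    # Laplacian of Gaussian sigmas
            "Wavelet": {},                             # wavelet decompositions, default settings
        },
        # An empty list enables every feature of that class.
        "featureClass": {
            name: []
            for name in ("firstorder", "shape", "glcm", "gldm", "ngtdm", "glrlm", "glszm")
        },
    }


params = build_extraction_params()
print(json.dumps(params, indent=2))
```

Saving the printed structure as a JSON or YAML file should give a parameter file that PyRadiomics can load, although this has not been checked against the authors' setup.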
Radiomics Feature Selection
Before selecting radiomics features, all features underwent Z-score normalization. To reduce the risk of overfitting and improve feature relevance, dimensionality reduction techniques were applied. The intraclass correlation coefficient (ICC) was first used to assess the intraobserver and interobserver reproducibility of the radiomics features. Features with ICC values greater than 0.8 were deemed reliable and selected for the next step in feature selection. The Mann-Whitney U test was then applied, retaining features with a p-value less than 0.05. Finally, the least absolute shrinkage and selection operator (LASSO) regression, using 10-fold cross-validation, was employed for feature selection. Each non-zero coefficient was multiplied by its corresponding feature value, and the sum of these products constituted the radiomics score (Radscore).
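The normalization and Radscore steps described above can be sketched in a few lines of Python. This is an illustrative sketch only: the feature values and coefficients are made-up placeholders rather than values from this study, the use of the population standard deviation is an assumption, and no intercept is added because the text defines the Radscore as the sum of products alone.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each feature column of a row-major table (one row per patient)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population SD
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def radscore(features, coefficients):
    """Sum of coefficient * feature value over features with non-zero coefficients."""
    return sum(c * x for c, x in zip(coefficients, features) if c != 0.0)


# Placeholder data: 3 patients x 3 features (not study data).
raw_features = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 7.0],
    [3.0, 30.0, 6.0],
]
# Placeholder LASSO coefficients; the zero entry mimics a feature LASSO dropped.
lasso_coefficients = [0.5, 0.0, -0.25]

normalized = zscore_columns(raw_features)
scores = [radscore(row, lasso_coefficients) for row in normalized]
print(scores)
```

In practice the means and standard deviations would be estimated once on the development data and reused for any new patient, so that a new Radscore is on the same scale as the fitted coefficients.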
Model Construction and Test
We first established the Radscore model based on the calculated Radscore. Concurrently, clinical factors including tumor location, patient age, and maximum lesion diameter were initially subjected to univariate analysis. Variables demonstrating statistical significance were then incorporated into a multivariate analysis to develop a clinical model. Additionally, based on the BI-RADS classification, tumors categorized as 3–4A were designated as benign, while those classified as 4B–5 were considered malignant, thereby formulating the BI-RADS model. Subsequently, we integrated clinical features, BI-RADS classification, and Radscore to construct a combined model. This combined model underwent internal validation using bootstrap resampling to assess its stability, and a nomogram was created for visualization purposes. All models were evaluated based on sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). Receiver operating characteristic (ROC) curves were generated, and the area under the curve (AUC) values were calculated to determine the diagnostic performance of each model. Figure 2 shows the workflow of the study.
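The BI-RADS dichotomization and the threshold-based performance metrics listed above can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, category 4C is taken to fall within the 4B–5 range, and the six example cases are invented for illustration rather than drawn from the study cohort.

```python
BENIGN_CATEGORIES = {"3", "4A"}
MALIGNANT_CATEGORIES = {"4B", "4C", "5"}  # assumption: 4C lies within "4B-5"


def birads_prediction(category):
    """Return 1 (malignant) for BI-RADS 4B-5 and 0 (benign) for BI-RADS 3-4A."""
    if category in MALIGNANT_CATEGORIES:
        return 1
    if category in BENIGN_CATEGORIES:
        return 0
    raise ValueError(f"Unexpected BI-RADS category: {category!r}")


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, PPV, and NPV from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example cases (pathology label 1 = malignant, 0 = benign).
categories = ["3", "4A", "4B", "5", "4A", "4C"]
pathology = [0, 0, 1, 1, 1, 0]

predictions = [birads_prediction(c) for c in categories]
metrics = diagnostic_metrics(pathology, predictions)
print(predictions, metrics)
```

The same `diagnostic_metrics` helper would apply to any of the probabilistic models once a probability cutoff has been chosen; the cutoff used in this study is not stated in the text and is therefore not assumed here.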

The Workflow of the Study, Including Image Delineation, 3D Reconstruction, Feature Extraction and Selection, Model Construction, and Internal Validation.
Statistical Analysis
Statistical analysis was conducted using R software (version 4.1.2). The normality of the data was assessed using the Kolmogorov-Smirnov test. For normally distributed continuous variables, the independent two-sample t-test was applied, while the Mann-Whitney U test was used for non-normally distributed continuous variables. Categorical variables were compared using the χ² test or Fisher's exact test, as appropriate. A p-value of less than 0.05 was considered statistically significant.
Results
Baseline Characteristics
A total of 346 female patients with breast tumors were included in this study, with 173 cases classified as benign and 173 as malignant. Table 1 provides an overview of the baseline characteristics of all patients, including age, tumor size, location, and BI-RADS classification. The pathological subtypes of the breast tumors are detailed in Table 2. On ultrasound, malignant tumors tended to be larger than benign ones, and patients diagnosed with malignant lesions were significantly older than those with benign tumors. Moreover, the BI-RADS classification exhibited a statistically significant difference between benign and malignant cases.
Baseline Characteristics of the Included Patients.
BI-RADS, breast imaging reporting and data system.
The Pathological Types of Breast Tumors.
Clinical Model Construction
We performed a univariate analysis of tumor location, age, tumor size, and BI-RADS classification. The results showed that age, tumor size, and BI-RADS classification were statistically significant. Further multivariate analysis confirmed that age, tumor size, and BI-RADS classification remained statistically significant, as shown in Table 3. Based on these variables, a clinical model was constructed using logistic regression, achieving an AUC of 0.873.
Univariate and Multivariate Analyses of Clinical Information.
OR, odds ratio; CI, confidence interval; BI-RADS, breast imaging-reporting and data system.
Radiomics Feature Selection and Model Construction
A total of 1223 radiomics features were extracted from each breast tumor video. First, 104 features with an ICC less than 0.8 were excluded. Next, 566 features with a p-value greater than 0.05 in the Mann-Whitney U test were eliminated. Further feature selection using LASSO regression retained 14 features with nonzero coefficients. Figure 3 illustrates the use of LASSO regression for feature selection and regularization. The feature coefficients are shown in Figure 4A and detailed in Table 4. Further analysis revealed that each feature showed a statistically significant difference between benign and malignant lesions, with a p-value < 0.001 (Figure 4B). Additionally, the correlation coefficients between features were all below 0.9 (Figure 4C). Cluster analysis demonstrated that these features can effectively distinguish between benign and malignant breast lesions (Figure 4D).
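The two penalty values commonly reported for cross-validated LASSO, lambda.min and lambda.1se, can be derived from the cross-validation curve with the one-standard-error rule. The sketch below illustrates that rule in general terms and is not the authors' R code: the function name `select_lambdas` is hypothetical, the numbers are invented, and the criterion is expressed as an error to be minimized (for an AUC-based curve, 1 − AUC could play that role).

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) under the one-standard-error rule.

    lambdas : candidate penalty values
    cv_mean : mean cross-validation error for each penalty (lower is better)
    cv_se   : standard error of that mean for each penalty
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Largest penalty whose mean error stays within one SE of the minimum,
    # i.e. the sparsest model that is statistically comparable to the best one.
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Invented cross-validation curve (not study data).
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_error = [0.30, 0.25, 0.26, 0.265, 0.40]
cv_se = [0.02, 0.02, 0.02, 0.02, 0.02]

lam_min, lam_1se = select_lambdas(lambdas, cv_error, cv_se)
print(lam_min, lam_1se)
```

Because a larger penalty shrinks more coefficients to zero, the model at lambda.1se is sparser than the model at lambda.min, which is the trade-off typically accepted to reduce overfitting.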

LASSO Regression for Feature Selection and Regularization. (A) Ten-fold Cross-Validation Curve Showing the Relationship Between the Mean Cross-Validation Area Under the Curve and Log-transformed lambda (Log(λ)). The Dashed Green Line Indicates lambda.min, Which Yields the Minimum Cross-Validation Error, Selecting 53 Non-zero Coefficients. The Dashed Orange Line Marks lambda.1se, Representing the Most Regularized Model Within One Standard Error of the Minimum, Selecting 14 Non-zero Coefficients. (B) LASSO Coefficient Profiles of the Selected Features as a Function of Log-transformed Lambda (Log(λ)). Each Line Represents the Path of an Individual Feature's Coefficient as the Penalty Increases. The Solid Gray Horizontal Line Represents a Coefficient Value of Zero. The Dashed Vertical Lines Correspond to lambda.min and lambda.1se, Consistent With Panel A. LASSO, Least Absolute Shrinkage and Selection Operator.

Visualization and Analysis of the Selected 14 Radiomics Features. (A) Bar Plot of Feature Coefficients. (B) Violin Plots of Feature Distributions Between Benign and Malignant Lesions. Each Feature's Distribution is Compared Between Benign (Blue) and Malignant (Orange) Groups. Asterisks (***) Indicate p < 0.001. (C) Correlation Heatmap of the 14 Radiomics Features. Pearson Correlation Coefficients Between Pairs of Features are Visualized, with Red and Blue Representing Positive and Negative Correlations, Respectively. (D) K-means Clustering Visualization Using Principal Component Analysis (PCA). PCA was Performed on the Standardized Feature Set, and the First Two Principal Components (PC1 and PC2) are Shown. The Data Points are Colored by True Label (Benign vs Malignant) and Shaped by their K-Means Cluster Assignment.
Radiomics Features Selected and Corresponding Coefficients.
A Radscore was calculated for each patient based on the selected feature coefficients and corresponding feature values. A logistic regression model was then constructed using the Radscore, achieving an AUC of 0.836 for predicting breast cancer.
Combined Model Construction
A combined model was developed based on Radscore, age, lesion size, and BI-RADS to predict breast cancer, achieving an AUC of 0.926. Figure 5 illustrates the ROC curves of all the constructed models. Internal validation using the Bootstrap method yielded an AUC ranging from 0.901 to 0.954 (Figure 6). To enhance interpretability, a nomogram was constructed to visualize the combined model (Figure 7). Additionally, a predictive model based solely on BI-RADS classification was built, where BI-RADS 3–4A was classified as benign and BI-RADS 4B–5 as malignant, achieving an AUC of 0.737. The performance metrics of the four constructed models are displayed in Table 5, while the pairwise DeLong test results for AUC comparisons are presented in Table 6. The combined model demonstrated significantly superior predictive performance compared to the BI-RADS-based model alone, indicating that integrating BI-RADS classification with clinical and radiomics features substantially improves breast cancer prediction.
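Bootstrap assessment of an AUC can be sketched as follows. This is a simplified illustration and not the procedure used in this study: it resamples fixed predicted scores with replacement and reads off a percentile interval, whereas the authors may have refitted the model within each resample. The function names and the toy labels and scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a malignant case outscores a benign one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval of the AUC over bootstrap resamples of the cases."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip resamples that contain only one class, where the AUC is undefined.
        if 0 < sum(resampled_labels) < n:
            aucs.append(auc(resampled_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (not study data): 1 = malignant, 0 = benign.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9]

point_estimate = auc(toy_labels, toy_scores)
interval = bootstrap_auc_interval(toy_labels, toy_scores, n_boot=200)
print(point_estimate, interval)
```

With only eight toy cases the resulting interval is wide; with a cohort of several hundred patients the same procedure yields a much narrower range.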

Receiver Operating Characteristic Curves of the Four Constructed Models.

Bootstrap Distribution of the AUC Values from 1000 Resampling Iterations. AUC, Area Under the Curve.

Nomogram for Predicting Malignant Risk Based on the Features: Radscore, Age, Tumor Size, and BI-RADS. BI-RADS, Breast Imaging Reporting and Data System; Radscore, radiomics score.
Performance of Four Models in Predicting Breast Cancer.
SEN, sensitivity; SPE, specificity; ACC, accuracy; TN, true negative; TP, true positive; FN, false negative; FP, false positive; NPV, negative predictive value; PPV, positive predictive value; BI-RADS, breast imaging-reporting and data system; AUC, area under the curve; Radscore, radiomics score.
Comparison of AUCs Between the Four Models Using DeLong's Test.
BI-RADS, breast imaging-reporting and data system; AUC, area under the curve; Radscore, radiomics score.
Discussion
In this study, we aimed to enhance the accuracy of breast cancer diagnosis by integrating clinical factors and radiomics features. We analyzed 346 female patients with breast tumors, equally divided between benign and malignant cases. Our results indicated that malignant tumors were significantly larger and occurred in older patients compared to benign lesions, with the BI-RADS classification distinguishing between these groups. A clinical model based on age, tumor size, and BI-RADS classification yielded an AUC of 0.873, while a radiomics model derived from 14 selected features achieved an AUC of 0.836. Importantly, the combined model that integrated the Radscore and clinical factors demonstrated markedly superior predictive performance, with an AUC of 0.926 and bootstrap validation confirming an AUC range of 0.901 to 0.954. Moreover, the model based solely on BI-RADS classification produced an AUC of only 0.737, underscoring the added value of incorporating clinical factors (age and tumor size) and radiomics features for improved breast cancer prediction. To facilitate clinical integration, we constructed a nomogram based on the combined model, incorporating key predictors such as patient age, lesion size, BI-RADS category, and the Radscore. This tool enables clinicians to input these variables and obtain an individualized probability of malignancy, thereby potentially supporting more informed decision-making in routine clinical practice.
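A nomogram of this kind is a graphical form of a logistic regression model, so the individualized probability it returns can be written as a short calculation. The sketch below shows only the general form: the coefficients and intercept are hypothetical placeholders and not the fitted values of the combined model, and coding BI-RADS as a single binary indicator is likewise an assumption.

```python
import math


def malignancy_probability(predictors, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_k * x_k))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL coefficients for illustration only (not the study's model).
example_coefficients = {
    "radscore": 1.0,
    "age_years": 0.05,
    "size_mm": 0.04,
    "birads_4b_or_higher": 1.5,  # assumption: BI-RADS entered as a 0/1 indicator
}
example_intercept = -4.0

# Invented patient.
patient = {
    "radscore": 0.5,
    "age_years": 50,
    "size_mm": 20,
    "birads_4b_or_higher": 1,
}

probability = malignancy_probability(patient, example_coefficients, example_intercept)
print(round(probability, 3))
```

On the nomogram itself, each predictor's contribution to the linear predictor is rescaled to a points axis and the total points are mapped to the same probability, so both routes give the same result for a fitted model.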
Clinical feature analysis, employing both univariate and multivariate approaches, consistently identified BI-RADS classification, patient age, and tumor size as significant predictors for differentiating benign from malignant breast lesions. The BI-RADS system, 22 a cornerstone in clinical ultrasound evaluation, classifies lesions based on attributes such as shape, margin definition, growth orientation, solidity, calcification, vascularity on color Doppler imaging, and peripheral echo characteristics. Despite its widespread clinical application, BI-RADS possesses inherent limitations—namely, its subjective interpretation and omission of quantitative parameters like age and lesion size—which may impair diagnostic precision. Increasing age is a well-recognized risk factor for malignancy, likely attributable to cumulative genetic alterations, hormonal changes, and a diminished regenerative capacity of breast tissue.23,24 Likewise, larger tumor size often correlates with more aggressive biological behavior and advanced disease stage, thereby elevating the risk of malignancy. 25 By integrating these clinical factors with BI-RADS, our diagnostic model offers a more robust and comprehensive assessment, mitigating inter-observer variability and enhancing predictive accuracy. These results are in concordance with previous studies,26,27 underscoring the value of a multiparametric approach in improving the reliability and accuracy of breast cancer diagnosis.
In this study, the 14 selected radiomics features had pairwise correlation coefficients below 0.9, indicating limited redundancy and multicollinearity. All chosen features demonstrated statistically significant differences between benign and malignant lesions, suggesting their potential diagnostic value. Furthermore, K-means clustering analysis indicated that these features can differentiate between benign and malignant groups, highlighting their discriminative power. The clustering patterns observed support the clinical relevance of radiomics features in breast lesion characterization. These findings suggest that radiomics feature selection based on reproducibility, statistical significance, and LASSO regularization can enhance the robustness and interpretability of radiomics models, demonstrating the feasibility of radiomics in improving ultrasound-based breast cancer diagnosis.
The combined model incorporating BI-RADS with tumor size, patient age, and radiomics features markedly enhanced the diagnostic accuracy for differentiating benign from malignant breast lesions compared to the use of BI-RADS alone. This improvement can be attributed to the complementary nature of these parameters: while BI-RADS provides a standardized qualitative assessment based on imaging characteristics, its inherent subjectivity and exclusion of quantitative factors may limit diagnostic precision. In addition, tumor size and patient age have been identified as independent risk factors for breast malignancy. Furthermore, radiomics features quantitatively capture intratumoral heterogeneity and subtle textural patterns that are often imperceptible to the human eye.10,11 By integrating these diverse data sources, the combined model mitigates inter-observer variability and offers a more comprehensive risk stratification. This multiparametric approach not only reinforces the clinical utility of BI-RADS but also paves the way for more individualized patient management, potentially reducing unnecessary biopsies.
We identified a set of radiomics features with significant discriminatory power between benign and malignant breast tumors. These features included shape descriptors (Elongation, Maximum2DDiameterRow), first-order statistical features (Interquartile Range, Kurtosis), and texture features derived from the gray-level co-occurrence matrix (GLCM) and gray-level size zone matrix (GLSZM). Shape features reflect tumor morphology, with malignant lesions typically exhibiting greater elongation and irregularity. 28 First-order features describe the overall intensity distribution, where higher kurtosis in malignant tumors suggests a more peaked intensity distribution, possibly indicating greater heterogeneity. 29 Texture features, particularly GLSZM-based metrics, capture spatial gray-level variations, with malignant lesions often displaying higher heterogeneity and lower uniformity due to their invasive growth patterns.30-32 These findings align with previous studies,12,14-17 which have also demonstrated the diagnostic value of shape, first-order, and texture features in breast cancer differentiation. The consistency of our results with prior research further supports the robustness and clinical relevance of radiomics in breast tumor characterization.
Compared with recent studies on radiomics and deep learning for breast cancer prediction, our study demonstrates several distinctive features. Whereas the majority of these investigations utilized static images from various modalities including conventional B-mode ultrasound, strain elastography, 27 and Automated Breast Volume Scanner (ABVS) 26 to extract radiomics features, our study is novel in extracting features from ultrasound videos. This dynamic approach has been rarely reported and may capture more characteristics of tumor behavior that static images cannot.
A recent study 33 developed a nomogram that combined B-mode ultrasound, strain ratio, and radiomics signature to differentiate benign from malignant BI-RADS 4 breast lesions, achieving an AUC of 0.93, demonstrating improved diagnostic accuracy. Another study 34 focused on predicting breast cancer using static ultrasound images, achieving AUCs of 0.956 in the training cohort and 0.937 in the validation cohort. Wang et al 35 developed an ABVS-based radiomics nomogram to differentiate benign from malignant BI-RADS 4 lesions. The model demonstrated high diagnostic performance, with an AUC of 0.925 in the test cohort. The diagnostic performance of these models was comparable to ours. However, these studies were limited to lesions classified as BI-RADS category 4 and 5, or solely category 4, which are generally associated with a moderate to high risk of malignancy. In contrast, our study encompassed a broader spectrum of breast lesions, including BI-RADS categories 3, 4, and 5. The inclusion of BI-RADS 3 lesions broadens the clinical applicability and enhances the generalizability of our model. This more comprehensive approach allows for a thorough evaluation of diagnostic performance across varying risk levels, including early-stage and low-risk lesions, which are commonly encountered in routine clinical practice.
Zhong and colleagues 36 developed a model that integrated intra- and peritumoral ultrasound radiomics features with clinical factors to differentiate benign from malignant breast lesions. Their combined model demonstrated slightly higher diagnostic performance, with an AUC of 0.960 in the test cohort. However, their regions of interest were delineated on 2D ultrasound images only, potentially limiting the comprehensive capture of tumor heterogeneity. Additionally, their dataset had an imbalance in the number of benign (n = 255) and malignant (n = 124) cases, which may have influenced the model's performance. In contrast, our study ensured a balanced cohort by randomly selecting an equal number of benign and malignant cases (1:1 ratio), enhancing both the generalizability and clinical applicability of our model. Furthermore, a recent study 37 assessed machine learning-based radiomics models using multiparametric breast magnetic resonance imaging (MRI) for lesion classification, achieving an AUC of 0.96, slightly higher than ours. However, their sample size was smaller (104 lesions vs 346 cases in our study). In contrast to MRI, our approach offers advantages such as the accessibility and cost-effectiveness of conventional ultrasound, along with the innovative use of ultrasound video data, which may better capture tumor characteristics. 38 Our findings provide a novel perspective on breast cancer prediction, highlighting the clinical potential of ultrasound video-based radiomics.
Despite the promising performance of our combined model, several limitations warrant consideration. Firstly, the retrospective design of this single-center study may introduce selection bias, potentially affecting the generalizability of our findings. Secondly, while the radiomics features demonstrated strong discriminatory power, their extraction relied on ultrasound videos acquired under standardized conditions. Variations in imaging equipment or operator-dependent acquisition protocols at other institutions could affect feature reproducibility and model performance. Thirdly, we did not assess inter-observer variability in BI-RADS classification, which could influence the consistency and reliability of this parameter in clinical practice. Lastly, the relatively small sample size of 346 female patients may limit the robustness of our conclusions, underscoring the need for larger cohorts to enhance the external validity of the study. Future studies should address these limitations by incorporating multimodal data, validating the model in external settings, and exploring its impact on clinical decision-making through prospective trials.
Conclusion
We developed a combined model integrating clinical factors and radiomics features from ultrasound videos, which outperformed BI-RADS alone in differentiating benign from malignant breast lesions. Further validation in large, multicenter studies is needed to confirm its generalizability and clinical value.
Footnotes
Ethics Approval
The research involving human participants underwent comprehensive examination and obtained the official approval of the Institutional Review Board at Dongyang People's Hospital (Approval No. 2025-YX-170). The informed consent was waived because of the retrospective nature of this study.
Author Contributions
LG and YJ examined the experiment and wrote this article. JW provided help with the data analysis. JW and XW revised this article. XW provided the research platform.
Funding
This study was funded by the Jinhua Science and Technology Bureau Scientific Research Project (2022-3-019).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data analyzed in the current study are available from the corresponding author upon reasonable request.
