Abstract
Introduction
Breast cancer (BC) is a common disease that threatens women's health and has gradually become the leading cause of cancer death in women. 1 Axillary lymph node (ALN) metastatic load is a valuable diagnostic and prognostic factor for the overall survival of patients with BC. 2 Sentinel lymph node biopsy is an effective tool for identifying ALN status, with few of the complications, such as edema and nerve damage, that follow axillary clearance. 3 According to the results of the latest ACOSOG Z0011 trial and the AMAROS phase III clinical study, the American Society of Clinical Oncology suggests that patients with T1-2 BC who have fewer than 3 sentinel lymph node metastases and receive whole-breast radiotherapy after breast-conserving surgery no longer need ALN dissection and can proceed directly to radiotherapy. 4 In addition, Ahmed et al 5 confirmed that nearly half of the patients with BC met the standard of low ALN load and could avoid ALN dissection. Therefore, metastatic load, rather than the mere presence of lymph node metastasis, should become the indicator for evaluating axillary status in the future. In this study, we defined ALN load as low (< 3 lymph node metastases) or high (≥ 3 lymph node metastases) based on histopathological diagnosis, which is considered the gold standard in clinical practice. 6
Ultrasonography is the main noninvasive imaging method for the preoperative evaluation of ALN status. However, its ability to detect axillary load is limited, and its sensitivity and specificity fluctuate greatly, so it cannot provide a stable evaluation.7,8 A more efficient, noninvasive approach is urgently needed. In recent years, radiomics has developed rapidly in the field of ultrasonic diagnosis and prognosis prediction. The accuracy of clinical evaluation and prediction can be remarkably improved by extracting a large number of imaging features from ultrasound images with high throughput, abstracting them into high-dimensional feature data, and combining them with other clinicopathological features. 9 Previous studies have shown that ultrasound radiomics is effective in predicting ALN metastasis.10,11 Lee et al 12 developed a radiomics model composed of 23 radiomics features and a preoperative clinicopathological model composed of 4 clinical factors (tumor size, location, subtype, and multiplicity) to predict ALN metastasis in patients with BC. Their results showed that adding the radiomics model remarkably improved the predictive performance of the clinicopathological model, which indicates that the radiomics model has additional value in predicting ALN metastasis. Jiang et al 13 designed a radiomics nomogram based on ultrasound elastography to predict ALN status in early BC. The nomogram was established by combining the shear-wave elastography signature, ultrasound-reported lymph node status, molecular subtype, and radiomics scores. It showed good discrimination in the training set (overall C-index: 0.842; 95% confidence interval [CI]: 0.773-0.879) and the validation set (overall C-index: 0.822; 95% CI: 0.765-0.838) and helped radiologists accurately assess ALN status in BC. However, few studies have examined how to effectively distinguish high- from low-load ALN metastasis in BC using ultrasound deep learning radiomics (DLR).
Our aim was to validate whether the DLR nomogram (DLRN) based on a preoperative ultrasound could effectively predict ALN metastatic load in patients with BC.
Materials and Methods
Patients
The overall workflow of this study is shown in Figure 1. A retrospective analysis was conducted on pathologically confirmed BC cases in our hospital from February 2018 to April 2020. A total of 176 patients were enrolled in this study. Of these, 123 and 53 patients were randomly assigned to the training and test sets, respectively, at a ratio of 7:3. The inclusion criteria were ① patients with primary BC, ② a single lesion < 5 cm in diameter, and ③ definite pathological results for ALN metastasis. The exclusion criteria were ① ultrasound images with serious artifacts or that failed to show the lesion boundary completely, ② patients who received neoadjuvant chemoradiotherapy or other biopsies, ③ the presence of multifocal or bilateral lesions, and ④ incomplete clinical and pathological data. All patient details in this study have been anonymized. The reporting of this study conforms to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline. 14
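The 7:3 random assignment described above can be sketched as follows. This is a minimal illustration only: the patient identifiers, the seed, and the helper name `split_patients` are hypothetical, and the study does not state which tool or seed was actually used.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=42):
    """Randomly split patient IDs into training and test sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)   # fixed seed (assumed) keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical anonymized IDs standing in for the 176 enrolled patients
patient_ids = [f"P{i:03d}" for i in range(1, 177)]
train_ids, test_ids = split_patients(patient_ids)
print(len(train_ids), len(test_ids))  # 123 53
```

With 176 patients, rounding 176 × 0.7 gives the 123/53 split reported in the text.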

The overall workflow of the study.
Patient clinical and pathological data were obtained from the hospital's system. The pathological data included tumor type and ER, PR, HER2, and Ki67 status; the results of sentinel lymph node biopsy and ALN dissection were also recorded as the total numbers of excised and positive lymph nodes. Clinical data included patient age, maximum tumor diameter, ultrasound BI-RADS grade, and capsule location. In this study, ER/PR ≥ 1% and HER2 ≥ +++ were defined as positive (+); otherwise, they were negative (−).
Ultrasound Image Acquisition
Breast ultrasound was performed by senior chief physicians with many years of experience in superficial organ ultrasound diagnosis. Philips EPIQ5 and IU22 (Philips, Netherlands) and Esaote (Esaote, Italy) ultrasonic diagnostic instruments were used. High-frequency linear array probes with a frequency of 7 to 12 MHz were used, and the breast examination mode was selected. A multisection scan was performed on each quadrant of both breasts, followed by a focused scan of the lesion area. The maximum long-axis and short-axis sections of the primary breast lesion were acquired, with the depth adjusted according to lesion size, and the images were stored in DICOM format.
Regions of Interest Segmentation
3D Slicer software (V4.11) was used to manually delineate the regions of interest (ROIs) of each patient's breast lesion, as shown in Figure 2. Delineation was performed without knowledge of the clinical results by ultrasound physician 1, who had more than 5 years of segmentation experience. The ROIs on the ultrasound images of 20 randomly selected patients were delineated again 1 week later to assess intraobserver reproducibility. In addition, the ROIs of the same 20 patients were outlined by another experienced physician (ultrasound physician 2, with more than 10 years of segmentation experience) to assess interobserver reproducibility. The intraclass/interclass correlation coefficient (ICC) was used to quantify the intraobserver/interobserver consistency, and features with ICC > 0.80 were carried forward to the next screening step.
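The reproducibility filter can be illustrated with a short sketch. The text does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here; the feature names and values are invented for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1) for an n_subjects x k_raters table of one feature's values."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def reproducible_features(feature_tables, cutoff=0.80):
    """Keep the names of features whose ICC exceeds the cutoff."""
    return [name for name, table in feature_tables.items()
            if icc_2_1(table) > cutoff]

# Hypothetical values: each row is one patient, each column one delineation
feature_tables = {
    "stable": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 5.0]],
    "noisy":  [[1.0, 4.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0], [5.0, 3.0]],
}
kept = reproducible_features(feature_tables)
print(kept)  # ['stable']
```

The same function serves both comparisons: the two columns are either the two sessions of physician 1 (intraobserver) or the delineations of physicians 1 and 2 (interobserver).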

ROI sketch effect drawing: (a, b) female, 58 years old, ALN metastatic load 9/17; (c, d) female, 55 years old, ALN metastatic load 0/20. (a and c) are original images, whereas (b and d) are segmented images.
Feature Extraction
Radiomics features were extracted from the ultrasound images using Pyradiomics (V3.0.1). 17 The extracted feature classes were first-order statistics, shape features, gray level co-occurrence matrix features, gray level run length matrix features, gray level size zone matrix features, gray level dependence matrix features, neighboring gray tone difference matrix features, and wavelet features.
The ResNet50 18 model was adopted as the backbone for extracting deep learning features and was pretrained on the ImageNet dataset. The ImageNet-pretrained weights were retained, the last fully connected layer of the network was removed, and a global max pooling layer was used to take the maximum value of each feature map, converting the feature maps into a vector of feature values. Guided Grad-CAM was used to visualize the output of the last convolutional layer in ResNet50 and to highlight which image regions played a more important role. The ResNet50 network structure is shown in Figure 3.
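The pooling step can be shown on a toy example. The sketch below uses plain Python lists in place of a deep learning framework, and the three small feature maps are invented stand-ins for the 2048 maps that the last convolutional stage of ResNet50 produces.

```python
def global_max_pool(feature_maps):
    """Collapse each 2D feature map (H x W) to its single largest activation."""
    return [max(max(row) for row in fmap) for fmap in feature_maps]

# Toy stand-in: 3 channels of 2 x 2 activations
feature_maps = [
    [[0.1, 0.5], [0.3, 0.2]],
    [[1.2, 0.0], [0.4, 0.9]],
    [[0.0, 0.0], [0.7, 0.6]],
]
deep_features = global_max_pool(feature_maps)
print(deep_features)  # [0.5, 1.2, 0.7]
```

One value is kept per channel, so the length of the output vector equals the number of feature maps regardless of the input image size.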

Improved ResNet50 structure diagram.
Establishment of DLR Signature (DLRS)
All feature data were preprocessed with Z-score normalization so that the features were on the same order of magnitude.
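A minimal sketch of Z-score normalization for a single feature is given below. Two details are assumptions rather than statements from the study: the mean and standard deviation are estimated on the training set and reused for the test set, and the population standard deviation is used.

```python
from statistics import mean, pstdev

def fit_zscore(train_values):
    """Estimate the mean and (population) standard deviation of one feature."""
    return mean(train_values), pstdev(train_values)

def apply_zscore(values, mu, sigma):
    """Rescale values to zero mean and unit variance using fitted parameters."""
    if sigma == 0:                      # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical raw values of one feature
train_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_values = [5.0, 9.0]

mu, sigma = fit_zscore(train_values)
print(mu, sigma)                              # 5.0 2.0
print(apply_zscore(test_values, mu, sigma))   # [0.0, 2.0]
```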
An independent-sample t-test was first used to screen for statistically significant features, which efficiently removed a large batch of irrelevant features while retaining most of the relevant ones. The top 20% most valuable features were then selected with a random-forest-based recursive feature elimination algorithm. Finally, the least absolute shrinkage and selection operator (Lasso) was applied for the final dimension reduction, and the lambda value with the minimum cross-validation error was selected. The DLRS, which contains the selected radiomics features, deep learning features, and their corresponding weight coefficients, was then constructed, and the corresponding DLR-score was calculated according to formula (2):
$$\text{DLR-score} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \tag{2}$$
where $\beta_0$ is the intercept, $x_i$ is the normalized value of the $i$-th selected feature, and $\beta_i$ is its Lasso coefficient.
DLR Nomogram (DLRN) Model Building and Validation
Univariate logistic regression analysis was performed for clinicopathological features. Predictors with P < .05 were included in the multivariate logistic regression analysis together with DLRS, and DLRN models were constructed and visualized based on the results. Additionally, a clinical model containing only clinicopathological features was constructed for comparative analysis according to the multivariate logistic regression of clinicopathological features.
Six evaluation indexes, namely, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV), were used to verify the model performance. The 95% CI of the AUC was obtained from 1000 resamplings, and differences in AUC between models were assessed with the DeLong test. Calibration curves and the Hosmer–Lemeshow test were used to verify the calibration of the DLRN model. Finally, decision curve analysis was used to evaluate the clinical practicability of each model.
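The evaluation indexes above can be sketched in plain Python. Several points are assumptions: the resampling is taken to be a bootstrap with a percentile interval, the 0.5 probability cutoff is illustrative because the study does not report one, and the labels and scores are invented. The DeLong test is not reproduced here.

```python
import random

def auc_score(labels, scores):
    """AUC as the chance that a positive case outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV, and NPV at one cutoff."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:             # skip resamples with a single class
            aucs.append(auc_score(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_resamples)]
    upper = aucs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical labels (1 = high load) and predicted probabilities
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

print(auc_score(labels, scores))  # 0.9375
metrics = confusion_metrics(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
```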
Statistical Method
Statistical analysis was conducted with the Pandas (V1.3.5), NumPy (V1.21.5), and SciPy (V1.7.1) packages in a Python 3.6 environment, and the results were verified with SPSS 22.0 software. Quantitative data were analyzed with the independent-sample t-test and expressed as mean ± standard deviation (x ± s). Qualitative data were analyzed with the chi-square test and expressed as percentages. P < .05 was considered statistically significant.
Results
Baseline Characteristics
The proportion of patients with a high ALN metastatic load was 34.09% (60/176) among all included patients, 34.15% (42/123) in the training set, and 33.96% (18/53) in the test set. No significant differences in clinical and pathological features were found between the training and test sets (all P > .05), whereas patient age and maximum tumor diameter were significantly associated with ALN high/low load (P = .030 and P < .001), as shown in Table 1. Table 2 shows that age and maximum tumor diameter differed significantly between the ALN high- and low-load groups in the training set (P = .012 and P < .001), whereas only maximum tumor diameter differed significantly between the two groups in the test set (P = .014).
Patient and Tumor Characteristics.
Abbreviations: BI-RADS, Breast Imaging Reporting and Data System; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; Ki67, tumor proliferating cell nuclear antigen 67.
Comparison of Clinicopathological Characteristics.
Abbreviations: BI-RADS, Breast Imaging Reporting and Data System; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; Ki67, tumor proliferating cell nuclear antigen 67.
Feature Filtering and DLRS Construction
A total of 2517 features, including 469 radiomics features and 2048 deep learning features, were extracted from the ultrasound images of each patient, and 2038 features with good reproducibility remained after the preliminary screening with ICC > 0.80. The independent-sample t-test then identified 116 features that differed significantly between the high- and low-load ALN groups, and random forest recursive feature elimination was applied to these, retaining the top 20% of features through successive iterations. Finally, Lasso dimensionality reduction left 17 features with nonzero coefficients, including 3 radiomics features and 14 deep learning features, for the construction of the DLRS. In equation (2), DLR-Score = 0.341 + 0.033 × DL_134 − 0.081 × DL_226 − 0.063 × DL_286 + 0.040 × DL_546 − 0.052 × DL_731 + 0.043 × DL_799 + 0.050 × DL_890 − 0.067 × DL_894 − 0.006 × DL_1200 − 0.066 × DL_1346 + 0.055 × DL_1479 − 0.024 × DL_1671 + 0.061 × DL_1864 − 0.028 × DL_2001 + 0.017 × original_shape_MeshVolume + 0.036 × original_firstorder_Energy + 0.001 × wavelet_LH_glszm_SizeZoneNonUniformity. The DLRS showed good predictive performance for ALN load.
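The DLR-score is a plain weighted sum, which the following sketch makes explicit. The intercept and coefficients are copied from the equation above; the function name `dlr_score` and the example inputs are hypothetical, and the inputs are assumed to be the Z-score-normalized feature values described in the Methods.

```python
INTERCEPT = 0.341
COEFFICIENTS = {
    "DL_134": 0.033, "DL_226": -0.081, "DL_286": -0.063, "DL_546": 0.040,
    "DL_731": -0.052, "DL_799": 0.043, "DL_890": 0.050, "DL_894": -0.067,
    "DL_1200": -0.006, "DL_1346": -0.066, "DL_1479": 0.055, "DL_1671": -0.024,
    "DL_1864": 0.061, "DL_2001": -0.028,
    "original_shape_MeshVolume": 0.017,
    "original_firstorder_Energy": 0.036,
    "wavelet_LH_glszm_SizeZoneNonUniformity": 0.001,
}

def dlr_score(features):
    """Intercept plus the coefficient-weighted sum of the 17 selected features."""
    missing = set(COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return INTERCEPT + sum(coef * features[name]
                           for name, coef in COEFFICIENTS.items())

# Hypothetical patient whose normalized features all sit at the mean (0.0)
average_patient = {name: 0.0 for name in COEFFICIENTS}
print(round(dlr_score(average_patient), 3))  # 0.341
```

A patient whose normalized features are all zero receives the intercept as the score; each feature then shifts the score by its coefficient per standard deviation.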
Development and Validation of the DLRN Model
The univariate analysis in the clinical model section of Table 3 shows that age and maximum tumor diameter were significant predictors of ALN metastatic load status. The clinical model was composed of patient age and maximum tumor diameter, as shown in the multivariate section of the clinical model in Table 3. The DLRN model consisted of maximum tumor diameter and DLRS (Table 3, DLRN model). The DLRN model had high predictive performance, with AUCs of 0.900 (95% CI: 0.853-0.931) in the training set and 0.821 (95% CI: 0.769-0.868) in the test set. Its predictive performance was also significantly better than that of the clinical model, with P = .001 in the training set and P = .041 in the test set (DeLong test). The predictive performance of the clinical, DLRS, and DLRN models is shown in Table 4, and the ROC curves of the training and test sets are shown in Figure 4. The visualization of the DLRN model is shown in Figure 5. The calibration curve and Hosmer–Lemeshow test (both P > .05) showed that the predictions of the DLRN model were well calibrated (Figure 6). Decision curve analysis showed that the DLRN model had the highest clinical practicability (Figure 7). The last convolutional layer of the network was visualized using guided Grad-CAM to investigate the interpretability of the deep learning features (Figure 8). We found that the internal features of the tumor were more valuable in the ultrasound images of patients with low ALN metastatic load, whereas the tumor boundary regions were more valuable in patients with high ALN metastatic load, which supports the effectiveness of the deep learning model to a certain extent.
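Decision curve analysis compares models by their net benefit across threshold probabilities. The sketch below uses the standard net benefit definition (true positives minus false positives weighted by the threshold odds, divided by the sample size); the labels and predicted probabilities are invented, not study data.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as high load."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels (1 = high load) and model probabilities
labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.6, 0.2, 0.8, 0.1, 0.3, 0.4, 0.35, 0.05]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))  # 0.2 -0.2
```

Plotting these values over a range of thresholds for each model, together with the treat-all and treat-none (net benefit 0) references, yields a decision curve of the kind shown in Figure 7.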

Receiver operating characteristic (ROC) curves. Left: training set; right: test set.

The nomogram was constructed from the maximum tumor diameter and DLRS of patients in the training set to predict ALN metastatic load more intuitively.

Calibration curve of the deep learning radiomics nomogram (DLRN) model. Left: training set; right: test set.

Decision curve analysis.

Characteristic heat map of ResNet50-based guided Grad-CAM. (a and c) are ultrasonic images; (b and d) are the corresponding heat maps. The red areas indicate higher weights, and the blue areas indicate lower weights. Image (b) shows that the tumor boundary is valuable for predicting axillary lymph node (ALN) metastatic load status, and image (d) shows that the tumor interior is valuable.
Single/Multivariate Logistic Regression Analysis of Clinicopathological Features.
Abbreviations: BI-RADS, Breast Imaging Reporting and Data System; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; Ki67, tumor proliferating cell nuclear antigen 67; DLR, deep learning radiomics; DLRN, deep learning radiomics nomogram.
Performance Comparison of Different Models.
Abbreviations: DLRS, deep learning radiomics signature; DLRN, deep learning radiomics nomogram; AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value.
Discussion
In this study, we used DLR technology to extract and screen 17 features from ultrasonic images, including 3 radiomics and 14 deep learning features, to construct the DLRS. The included clinicopathological features were age, maximum tumor diameter, pathological type, ultrasound BI-RADS, capsule location, ER, PR, HER2, and Ki67. Based on univariate and multivariate logistic regression analysis, age and maximum tumor diameter were included in the clinical model, and maximum tumor diameter and DLRS were combined to establish the DLRN model. The AUCs of the clinical, DLRS, and DLRN models were 0.795, 0.877, and 0.900 for the training set and 0.722, 0.805, and 0.821 for the test set, respectively. The DLRN model had the best predictive performance, and its nomogram allows a quicker estimate of the probability that a patient has a high or low ALN metastatic load, which is convenient for assisting clinical diagnosis.
Table 3 shows that patient age and maximum tumor diameter were independent predictors of ALN metastatic load. In the fusion analysis with DLRS (Table 3, DLRN model), only maximum tumor diameter improved the predictive performance of DLRS, and the DLRN model constructed by adding maximum tumor diameter outperformed DLRS alone. Some scholars 19 reported that age and maximum tumor diameter are risk factors related to ALN metastatic load but did not conduct multivariate analysis with radiomics signatures. We found that the performance of the DLRN model was significantly better than that of the clinical model (DeLong test: all P < .05) but not significantly better than that of the DLRS (DeLong test: P = .219 and .671). This outcome may arise because the influence of DLRS on the DLRN model is much greater than that of maximum tumor diameter, and the odds ratios of the 2 factors also differ substantially.
Recent studies have used radiomics techniques to predict ALN metastasis, with test-set AUCs between 0.715 and 0.860.13,20–22 However, most of these studies were based only on traditional radiomics features and did not examine the deep learning features of the images, which limits their performance in predicting ALN status. Some studies23,24 also included clinical stage, receptor status, lymphatic vascular infiltration, and other clinicopathological features in the prediction model. However, some clinicopathological features, such as lymphatic vascular infiltration, can only be obtained after surgery and therefore cannot effectively guide clinical decision-making before surgery. In addition, almost all previous studies defined ALN status as the presence or absence of metastasis. However, with the updated results of the ACOSOG Z0011 trial and the AMAROS phase III clinical study, ALN metastatic load will gradually become a new indicator for the preoperative evaluation of axillary status. Unlike previous studies, this study used deep transfer learning together with traditional radiomics techniques and combined the radiomics and deep learning features of preoperative ultrasound to describe breast tumors more comprehensively. Almost all clinical and pathological features available before surgery were included. Therefore, the DLRN model with clinical and DLR features is proposed as a feasible scheme for the noninvasive preoperative prediction of ALN load. Some scholars have applied radiomics to computed tomography and magnetic resonance imaging of rectal and bladder cancers to predict the status of local lymph node metastasis, which also showed that radiomics is an effective prediction method.25,26 For patients with BC, ultrasound is a conventional imaging method for evaluating lesions and ALN status, as it is low in cost and free of radiation. 27 In comparison, our study also focused on combining preoperative clinicopathological features with DLR methods to supplement the ultrasound images with additional clinical information. This approach can improve performance and constrain the DLR features, making the model more robust.
This study also had some limitations. First, it was a single-center retrospective study, and prospective multicenter studies are still needed for further verification. Second, only patients with a single BC lesion were included. Therefore, the proposed DLRN model constructed with maximum tumor diameter and DLRS can only predict the ALN metastatic load of patients with a single BC lesion, and other prediction models should be established for patients with bilateral or multifocal BC. Third, the ROIs in this study were manually delineated, which may introduce interobserver variability in some features; we plan to address this with an automatic segmentation network in future studies. Finally, the sample size of this study was small. We are still actively collecting more samples to further verify our model.
Conclusion
In summary, we built a DLRN model based on preoperative ultrasound DLRS and the maximum tumor diameter of patients with BC. The DLRN model can be used as an effective method to evaluate the ALN metastatic load status of patients with BC before surgery and to assist in clinical decision-making.
Footnotes
Acknowledgments
The authors wish to thank Dr Qianyi Xi for their technical support in editing the manuscript.
Abbreviations
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
This retrospective study was approved by the ethics committee of the Affiliated Nanjing Medical University Changzhou Second People's Hospital (approval number: [2020] KY154-01). Due to the retrospective nature of this study, the ethics committee of the hospital waived the informed consent of the patients and confirmed compliance with the Declaration of Helsinki and the confidentiality of the patient data.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported in part by the Social Development Project of Jiangsu Provincial Key Research & Development Plan (Project No. BE2022720) and General Project of Jiangsu Provincial Health Commission (Project No. M2020006).
Correction (April 2023):
A sentence has been corrected in section Feature Filtering and DLRS Construction.
