Abstract
Introduction
Patients with breast cancer whose axillary lymph nodes (ALNs) appear negative on magnetic resonance imaging (MRI) may still have occult sentinel lymph node (SLN) metastases, which can influence clinical treatment strategies. This study aimed to develop MRI-based radiomics models for predicting occult SLN metastases and D2-40 expression, and to investigate the intrinsic associations between D2-40 expression, SLN status, and related radiomics features.
Methods
This retrospective study included 141 patients with MRI-diagnosed ALN-negative breast cancer from Center 1, randomly divided into training (n = 98) and validation (n = 43) sets (7:3 ratio). An independent external validation cohort (n = 40) from Centers 1 and 2 was established for model validation. Three logistic regression models were developed: (1) a clinical model, (2) a radiomics model, and (3) a clinical-radiomics nomogram, each built to predict SLN metastasis (Model 1) and D2-40 expression (Model 2). The correlation between D2-40 expression and SLN status was analyzed using the chi-square test. Feature correlations between the SLN and D2-40 radiomics models were assessed by Spearman correlation analysis, and the strength of association between D2-40 and Model 1 features was assessed by Pearson correlation analysis.
Results
The nomogram outperformed the other models in both Model 1 (AUC: 0.821 validation/0.703 external) and Model 2 (AUC: 0.810 validation/0.645 external). D2-40 expression correlated with SLN status (P < .001). Correlations were observed between Model 1 and Model 2 features (Spearman) and between D2-40 and Model 1 features (Pearson).
Conclusions
MRI-based radiomics features were effective in predicting occult SLN metastasis and D2-40 expression status in MRI ALN-negative breast cancers. An association was identified between D2-40 expression and SLN status, along with the corresponding radiomics features.
Introduction
Breast cancer (BC) has become the second most prevalent cancer worldwide and the most common malignant tumor among women. 1 Among the factors that guide its management, the metastatic status of lymph nodes and the expression characteristics of key molecules serve as essential indicators for evaluating patient prognosis and provide a critical foundation for clinical treatment decisions. 2
As the most common metastatic pathway for breast cancer, axillary lymph node (ALN) metastasis plays a critical role in disease staging, treatment selection, and prognosis assessment. 3 Currently, imaging techniques are frequently employed to evaluate ALN status, particularly magnetic resonance imaging (MRI), which offers a comprehensive view of the axilla and demonstrates greater accuracy in assessing ALN involvement. 4 However, in patients with early-stage breast cancer, changes in lymph nodes may not be clearly visible on MRI images. The assessment of ALN status using MRI carries a certain false negative rate (FNR), which complicates the diagnosis. The sentinel lymph node (SLN) is the first drainage station for lymph node metastasis in breast cancer.5,6 In breast cancer patients with ALN-negative MRI results, there may be occult SLN metastases, which can be confirmed through sentinel lymph node biopsy (SLNB). However, SLNB is an invasive procedure that can lead to complications such as chronic pain and lymphedema, negatively impacting patients’ quality of life. 7 Therefore, it is essential to identify a noninvasive method that can accurately detect lymph node metastasis.
In the field of breast cancer molecular biology, D2-40, a lymphatic endothelial-specific monoclonal antibody, accurately labels lymphatic endothelial cells in both normal and tumor tissues. It is currently recognized as a highly specific marker for lymphatics.8,9 D2-40-labeled lymphovascular invasion (LVI) is closely associated with ALN involvement in breast cancer. LVI can manifest earlier than lymph node metastasis and serves as a crucial indicator for assessing breast cancer progression, determining prognosis, and objectively evaluating treatment efficacy. 9 At present, molecular biomarkers for breast cancer are assessed primarily through immunohistochemistry. Although this method is highly accurate, it involves multiple steps, is time-consuming, and requires high-quality tissue samples. Furthermore, the interpretation of results can be subjective. More importantly, it is invasive, and its results become available only after tissue has been obtained. Therefore, there remains a need for a rapid, objective, and noninvasive method for accurately assessing D2-40 expression.
Radiomics can transform medical images that contain tumor pathophysiology and other pertinent information into mineable, high-dimensional feature data (radiomics features). These data can subsequently be analyzed quantitatively to yield objective and precise clinical predictions. 10 Multiple imaging modalities are available for breast evaluation. CT and mammography, while operationally efficient and highly sensitive to microcalcifications, are limited by ionizing radiation exposure, suboptimal spatial resolution, and inadequate axillary nodal coverage. 11 Ultrasound is radiation-free and enables real-time dynamic assessment, but its clinical utility is limited by lower spatial resolution than other imaging techniques, incomplete visualization of the entire breast parenchyma, and significant operator dependence, which compromises scanning standardization and reproducibility. In contrast, magnetic resonance imaging (MRI) has become one of the most important diagnostic tools for breast cancer because it is radiation-free, offers excellent soft tissue resolution, and provides multi-parameter imaging. These features enable a more comprehensive assessment of the tumor's biological characteristics. Numerous studies have used MRI radiomics to predict lymph node metastasis and the molecular expression status of breast cancer, and have reported robust predictive performance.6,8,12,13 However, to our knowledge, no studies have addressed the diagnosis of occult SLN metastases that are not clearly visible on breast MRI, and none have used MRI radiomics to evaluate positive expression of D2-40 in breast cancer.
The aim of this study was to develop a preoperative method utilizing MRI radiomics to predict occult SLN metastasis and positive expression of D2-40 in breast cancer. This approach seeks to assist in clinical treatment decision-making, reduce the need for invasive tests, and enhance the quality of life for patients. We also investigated the correlation between D2-40 expression and SLN status, as well as the associated radiomics features, in order to support the proposed SLN metastasis pathway and to help explain the models' decision-making.
Materials and Methods
General Information
This retrospective study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of [masked for peer review] (Approval No. 20250120011), with a waiver of informed consent due to the retrospective nature of the research. This study adhered to the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement. 14
This study retrospectively collected data from 223 patients with invasive breast cancer diagnosed by postoperative pathology at two medical centers between 2019 and 2024.
Inclusion criteria were: (1) invasive breast cancer confirmed by postoperative pathology, including invasive ductal carcinoma and invasive lobular carcinoma; (2) standardized breast DCE-MRI performed within 2 weeks before surgery; (3) mass-type breast cancer on MRI; (4) ALN negativity on MRI; (5) SLNB with definitive pathology; and (6) complete clinical data.
The exclusion criteria were: (1) non-mass breast cancer, (2) noninvasive breast cancer, (3) preoperative neoadjuvant therapy such as radiotherapy or chemotherapy, (4) incomplete clinicopathological data, and (5) incomplete or poor-quality MRI images.
This study ultimately enrolled 181 eligible breast cancer patients who were rigorously allocated for model development and validation. The cohort included 141 patients from Center 1 (2019-2023), randomly divided in a 7:3 ratio into a training set (n = 98) and an internal validation set (n = 43). An external validation cohort (n = 40) was established from two independent centers, comprising 13 consecutively enrolled patients from Center 1 in 2024 and 27 consecutive cases from Center 2 (2020-2023). All external validation cases strictly adhered to the same inclusion and exclusion criteria as the training cohort, with standardized imaging acquisition protocols and pathological evaluation procedures uniformly implemented across all study sites. The patient selection flowchart is illustrated in Figure 1.

Patient Selection Flowchart. This Flowchart Shows the Inclusion and Exclusion Situations for the Study Participants. MRI, Magnetic Resonance Imaging; ALN, Axillary Lymph Node; SLNB, Sentinel Lymph Node Biopsy.
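As an illustration of the 7:3 random allocation of the 141 Center 1 patients, the following minimal Python sketch reproduces the reported set sizes (98 and 43). The seed, the function name `split_7_3`, and the convention of rounding the validation size up are assumptions made for this example; the source does not state how the split was implemented.

```python
import math
import random


def split_7_3(patient_ids, validation_fraction=0.3, seed=0):
    """Randomly split patient IDs into training and validation lists.

    The validation size is rounded up (an assumed convention), so that
    141 patients yield 43 validation and 98 training cases.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    rng.shuffle(ids)
    n_val = math.ceil(validation_fraction * len(ids))
    return ids[n_val:], ids[:n_val]


# Hypothetical patient identifiers 0..140 stand in for the Center 1 cohort.
train_ids, val_ids = split_7_3(range(141))
print(len(train_ids), len(val_ids))
```

Fixing the seed makes the allocation repeatable, which matters when the same partition must be reused for both Model 1 and Model 2.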
This study adopted complete-case analysis, with independent samples t-tests and chi-square tests confirming no significant differences in baseline characteristics between excluded and included cases (all P > .05).
Clinical data, including age, gender, CA 15-3 levels, platelet count, lymphocyte count, and the neutrophil-to-lymphocyte ratio, were collected from the electronic medical record system for all patients.
MRI Image Acquisition
All MRI images in this study were acquired using two 3.0 T scanners: the SIGNA™ Pioneer (GE Healthcare) and the Ingenia Elition (Philips Healthcare). Patients were scanned in the standard prone position using a dedicated 8-channel phased-array breast coil. DCE-MRI data were acquired using the DISCO (Differential Subsampling with Cartesian Ordering) sequence combined with fat suppression. The scanning protocol was as follows: baseline fat-suppressed axial T1-weighted imaging (T1WI) was acquired first, followed by intravenous administration of 0.1 mmol/kg Gadolinium-Diethylenetriamine Pentaacetic Acid (Gd-DTPA) at a rate of 2 mL/s and an immediate 20-mL saline flush at the same rate. Seven consecutive dynamic enhancement phases (temporal resolution of 19.4 s per phase) were then acquired without intervals following the contrast injection. The images from the early enhancement phase (the first four phases) were selected for further analysis.
Two breast imaging radiologists, each with over five years of experience, evaluated ALN status on breast MRI and identified ALN-negative cases. They also documented imaging features, including breast category, background parenchymal enhancement (BPE), spiculation (burr) sign, lobulation sign, tumor edge (smooth/non-smooth), longest diameter of the tumor, and apparent diffusion coefficient (ADC) values. Disagreements between the two radiologists were resolved by a senior radiologist.
ALNs were considered positive on MRI if they showed any of the following: enlargement or fusion, a short diameter greater than 1.0 cm, a long-to-short diameter ratio less than 1.5, uneven cortical thickening, peripheral edema, absence of the lymph node hilum, irregular morphology, inhomogeneous enhancement, or circumferential (rim) enhancement.9,15 ALNs without these features were considered negative.
Pathological Analysis
Postoperative tissue specimens were fixed in formalin, embedded in paraffin, and stained with hematoxylin and eosin (H&E). D2-40 immunohistochemical (IHC) staining was then performed. A pathologist with over 10 years of experience first examined the slides under a light microscope at 40× magnification to identify areas of dense staining, and then observed and analyzed these areas at 200× magnification, recording pertinent results.
For SLN status, the presence of one or more SLN metastases was classified as SLN positive; otherwise, the status was considered negative. For D2-40 expression, positivity was defined as positive staining in the cell membrane and/or cytoplasm of tissue cells; otherwise, expression was considered negative.
In addition, we collected supplementary pathological data from the patients, which included estrogen receptor (ER), progesterone receptor (PR), Ki-67, E-cadherin (E-cad), human epidermal growth factor receptor 2 (HER-2), and histological grade. For Ki-67 expression, high expression was defined as ≥15% of tumor cell nuclei staining positive, whereas low expression was defined as less than 15% positive staining. For HER-2 expression, a negative result was defined as an IHC score of 0, low expression as a score of 1+, and high expression as a score of 3+. In cases where the IHC score was 2+, fluorescence in situ hybridization (FISH) was employed to assess gene amplification. If FISH results were positive, the expression was classified as high; if FISH results were negative, it was classified as low.
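The Ki-67 and HER-2 categorization rules described above can be written as a small decision function. The following Python sketch is illustrative only; the function names and the string encoding of IHC scores are assumptions for this example and are not taken from the study's data pipeline.

```python
def classify_ki67(percent_positive):
    """Ki-67: high if >= 15% of tumor cell nuclei stain positive, else low."""
    return "high" if percent_positive >= 15 else "low"


def classify_her2(ihc_score, fish_positive=None):
    """HER-2 category from the IHC score, using FISH only for IHC 2+.

    ihc_score: one of "0", "1+", "2+", "3+" (hypothetical encoding).
    fish_positive: True/False FISH amplification result, needed for "2+".
    """
    if ihc_score == "0":
        return "negative"
    if ihc_score == "1+":
        return "low"
    if ihc_score == "3+":
        return "high"
    if ihc_score == "2+":
        if fish_positive is None:
            raise ValueError("IHC 2+ requires a FISH result")
        return "high" if fish_positive else "low"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")


print(classify_ki67(20), classify_her2("2+", fish_positive=False))
```

Raising an error for an IHC 2+ case without a FISH result, rather than guessing a category, keeps incomplete records from silently entering the clinical model.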
Tumor Segmentation and Feature Extraction
All MRI images and clinicopathological data were uploaded to the Radcloud platform (Huiying Medical Technology Co, Ltd, https://www.huiyihuiying.com) for data processing, image segmentation, and feature extraction. Two junior radiologists semi-automatically delineated the entire lesion as a region of interest (ROI), slice by slice, on all contiguous slices containing the tumor. For patients with multiple masses, only the largest mass was included. A senior radiologist then reviewed all contours; if the delineations differed by 5% or more, the senior radiologist determined the tumor boundaries. All radiologists were aware only of the diagnosis of invasive breast cancer and were blinded to other pathological results, including lymph node status and immunohistochemical results.
To enhance the reproducibility and repeatability of radiomics features, the software automatically conducts image resampling and gray-level discretization to standardize MRI images prior to feature extraction. Feature types are categorized into three groups: (1) features based on size and shape, which reflect the dimensions and shape of the region; (2) first-order statistical features that characterize the distribution of voxel intensities; and (3) texture features that measure the heterogeneity within the region. Texture features include the Gray Level Run Length Matrix (GLRLM), the Gray Level Co-occurrence Matrix (GLCM), and the Gray Level Dependence Matrix (GLDM).
Feature Selection
Feature selection was implemented using R software (version 4.4.3, https://www.r-project.org/). A three-step feature selection method was employed to optimize the feature subset for model construction: (1) The variance threshold method removed low-variance features, using a threshold of 0.8 and excluding features with a variance below this value. (2) The SelectKBest method evaluated the association between each feature and the classification outcome, retaining features with a P-value of less than .05. (3) The Least Absolute Shrinkage and Selection Operator (LASSO) was used to reduce the dimensionality of the feature space and mitigate redundancy, with L1 regularization as the penalty term, the cross-validation parameter set to 5, and the maximum number of iterations capped at 1000. Subsequently, the radiomics score (Rad-score) for each patient was calculated as a linear combination of the selected features weighted by their coefficients.
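The three-step selection and the Rad-score calculation can be sketched as follows. This is not the study's code: it uses scikit-learn in Python as a stand-in for the software named above, runs on synthetic data, and makes several assumptions for illustration, namely that the univariate filter uses the ANOVA F-test (`f_classif`), that the cross-validation setting of 5 means 5 folds, and that features are standardized before LASSO.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, f_classif
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 98 patients x 200 features.
rng = np.random.default_rng(0)
n_patients, n_features = 98, 200
scales = rng.uniform(0.5, 3.0, size=n_features)
scales[:3] = 2.0  # three informative features with variance well above 0.8
X = rng.normal(size=(n_patients, n_features)) * scales
signal = X[:, 0] + X[:, 1] - X[:, 2]
y = (signal + rng.normal(size=n_patients) > 0).astype(int)
feature_idx = np.arange(n_features)

# Step 1: variance threshold, dropping features with variance not above 0.8.
keep_var = VarianceThreshold(threshold=0.8).fit(X).get_support()
X1, idx1 = X[:, keep_var], feature_idx[keep_var]

# Step 2: univariate filter, keeping features with P < .05
# (ANOVA F-test assumed as the scoring function).
_, p_values = f_classif(X1, y)
keep_p = p_values < 0.05
X2, idx2 = X1[:, keep_p], idx1[keep_p]

# Step 3: LASSO with 5-fold cross-validation and at most 1000 iterations.
X2_std = StandardScaler().fit_transform(X2)
lasso = LassoCV(cv=5, max_iter=1000, random_state=0).fit(X2_std, y)
selected = idx2[lasso.coef_ != 0]

# Rad-score: intercept plus the coefficient-weighted sum of features.
rad_score = lasso.intercept_ + X2_std @ lasso.coef_
print(f"{X.shape[1]} -> {X1.shape[1]} -> {X2.shape[1]} -> {selected.size} features")
```

In a real analysis each step would be fitted on the training set only and then applied unchanged to the validation cohorts, so that no information from the validation data influences feature selection.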
Model Construction and Validation
We developed two sets of predictive models: Model 1 to predict occult SLN metastasis in breast cancer patients, and Model 2 to predict the expression status of D2-40. Each set comprised three models: (1) Clinical model: built from patients’ clinical, pathological, and imaging features, with significant parameters identified through univariate and multivariate logistic regression analyses. (2) Radiomics model: built from the selected key MRI radiomics features using a logistic regression (LR) classifier. (3) Combined model: a clinical-radiomics nomogram integrating the Rad-score with the significant clinical parameters. To assess the generalizability of the predictive models, their performance was also evaluated in the external validation cohort.
In order to enhance the understanding of model decision-making, this study investigates the relationship between the D2-40 molecule and the status of SLN, along with associated radiomics features. The correlation between D2-40 expression levels and SLN status (positive/negative) was assessed using the chi-square test. Spearman rank correlation analysis was conducted to evaluate the relationships among key radiomics features in models 1 and 2. Additionally, Pearson correlation analysis was employed to assess the linear correlation between D2-40 expression and key radiomics features in model 1. Finally, the heatmap was utilized to visualize the correlation patterns among the variables.
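To make the distinction between the two correlation measures concrete, the following Python sketch implements both from their definitions: Pearson's coefficient measures linear association, whereas Spearman's coefficient is Pearson's coefficient applied to ranks and therefore captures any monotonic relationship. The feature values are invented for illustration. When one variable is binary, as with D2-40 status, Pearson's coefficient equals the point-biserial correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical feature values: monotonic but nonlinear relationship.
feature_a = [1, 2, 3, 4, 5]
feature_b = [1, 4, 9, 16, 25]
print(round(spearman(feature_a, feature_b), 3), round(pearson(feature_a, feature_b), 3))
```

Here the rank correlation is exactly 1 because the relationship is perfectly monotonic, while the Pearson coefficient is slightly lower because the relationship is not linear.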
The research design diagram is presented in Figure 2.

Research Design Flowchart. This Flowchart Illustrates a Comprehensive Workflow for Medical Image Analysis, Encompassing Four Main Stages: Image Segmentation & Feature Extraction, Feature Selection, Model Building & Evaluation, and Correlation Analysis. SLN, Sentinel Lymph Node; PR, Progesterone Receptor.
Statistical Analysis
Data processing and analysis were conducted using R version 4.4.3, along with Zstats 1.0 (www.zstats.net). The R packages utilized included “pROC,” “rmda,” and others. For continuous variables, the Kolmogorov-Smirnov test (K-S test) was initially applied to assess whether they followed a normal distribution. Subsequently, the Mann-Whitney U test was employed to compare differences in continuous variables across different groups. The chi-square test was used to evaluate differences between categorical variables. In evaluating model performance, the predictive efficacy of each model was analyzed using the Receiver Operating Characteristic (ROC) curve, and the area under the curve (AUC) was calculated as a quantitative indicator. To further assess the clinical applicability of the models, Decision Curve Analysis (DCA) was employed for validation. All statistical tests were conducted using two-sided tests, with a P-value of less than .05 considered statistically significant. No model updating was performed as the validation demonstrated satisfactory performance across all prespecified metrics.
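The two performance measures used here can be illustrated with a short Python sketch. The AUC is computed as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties counted as one half), and the DCA net benefit at threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t). The labels and predicted probabilities below are invented; the study itself used the R packages named above.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, probs, threshold):
    """Decision-curve net benefit of treating cases with prob >= threshold."""
    n = len(labels)
    tp = sum(1 for l, p in zip(labels, probs) if p >= threshold and l == 1)
    fp = sum(1 for l, p in zip(labels, probs) if p >= threshold and l == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical labels (1 = SLN positive) and predicted probabilities.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]

auc = roc_auc(labels, probs)
nb_model = net_benefit(labels, probs, threshold=0.5)
nb_treat_all = net_benefit(labels, [1.0] * len(labels), threshold=0.5)
print(round(auc, 3), nb_model, nb_treat_all)
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model against the treat-all and treat-none (net benefit of zero) strategies.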
Results
General Information
A total of 141 eligible patients with invasive breast cancer (140 females and 1 male; average age 56.79 years, age range 29 to 85 years) were included in the training and validation groups. Among these patients, 33 were classified as SLN-positive (23.4%), while 108 were classified as SLN-negative (76.6%). According to the immunohistochemical results, there were 29 D2-40-positive patients (20.57%) and 112 D2-40-negative patients (79.43%).
The external validation group comprised 40 eligible patients diagnosed with invasive breast cancer (all female; average age 49.6 years, age range 32 to 79 years). Among these patients, 8 were classified as SLN-positive (20%), while 32 were classified as SLN-negative (80%). Additionally, there were 8 patients who tested positive for D2-40 (20%) and 32 patients who tested negative for D2-40 (80%).
No statistically significant differences in clinicopathological characteristics were observed between the training cohort and either the internal or external validation cohorts. Detailed data are presented in Tables 1 and 2.
Comparison of Baseline Characteristics Between Training and Validation Cohorts.
Z, Mann-Whitney test; χ², Chi-square test; -, Fisher exact; M, Median; Q1, first Quartile; Q3, 3rd Quartile; PLT, Platelet; LYM, Lymphocyte; NLR, Neutrophil-to-Lymphocyte Ratio; BPE, Background Parenchymal Enhancement; ER, Estrogen Receptor; PR, Progesterone Receptor; HER-2, Human Epidermal Growth Factor Receptor 2; E-cad, E-Cadherin; SLN, Sentinel Lymph Node.
Comparison of Baseline Characteristics Between Training and External Validation Cohorts.
Z, Mann-Whitney test; χ², Chi-square test; -, Fisher exact; M, Median; Q1, first Quartile; Q3, 3rd Quartile; PLT, Platelet; LYM, Lymphocyte; NLR, Neutrophil-to-Lymphocyte Ratio; BPE, Background Parenchymal Enhancement; ER, Estrogen Receptor; PR, Progesterone Receptor; HER-2, Human Epidermal Growth Factor Receptor 2; E-cad, E-Cadherin; SLN, Sentinel Lymph Node.
Radiomics Analysis
After conducting feature selection using LASSO analysis, Model 1 and Model 2 identified 8 and 37 key radiomics features, respectively (Figure 3).

Feature Selection Diagram Based on Model 1 (A-C) and Model 2 (D-F). (A), (D) LASSO Path without Cross-Validation. The X-Axis Is Log Lambda and the Y-Axis Is Coefficients. Each Line Represents a Feature's Coefficient Path as Lambda Varies. (B), (E) Feature Importance. The Y-Axis Lists Features and the X-Axis Is Coefficients. Bar Lengths Indicate Feature Importance in Logistic Regression. (C), (F) Correlation Coefficients among Selected Features. The Color Gradient Runs from Dark Blue (Strong Positive) to Dark Red (Strong Negative), with White for Near-Zero Correlation.
To control for potential confounding factors and enhance the clinical applicability of the model, the multivariable analysis included clinically significant prognostic factors (age, maximum diameter, and histological grade) by forced entry, along with variables showing P < .10 in the univariate analysis. The results of both univariate and multivariable logistic regression analyses for clinical variables in breast cancer patients are presented in Tables 3 and 4. Based on these selected radiomics features and clinical characteristics, we developed the clinical model, the radiomic model, and the nomogram.
Univariate and Multivariate Logistic Regression Analyses Based on Model 1.
Coef, Coefficient; Wald Z, Wald Z-statistic; *, P < .1; **, P < .05; PLT, Platelet; LYM, Lymphocyte; NLR, Neutrophil-to-Lymphocyte Ratio; BPE, Background Parenchymal Enhancement; ER, Estrogen Receptor; PR, Progesterone Receptor; HER-2, Human Epidermal Growth Factor Receptor 2; E-cad, E-Cadherin. Model 1: MRI radiomics model for predicting occult sentinel lymph node (SLN) metastasis.
Univariate and Multivariate Logistic Regression Analyses Based on Model 2.
Coef, Coefficient; Wald Z, Wald Z-statistic; *, P < .1; **, P < .05; SLN, Sentinel Lymph Node; PLT, Platelet; LYM, Lymphocyte; NLR, Neutrophil-to-Lymphocyte Ratio; BPE, Background Parenchymal Enhancement; ER, Estrogen Receptor; PR, Progesterone Receptor; HER-2, Human Epidermal Growth Factor Receptor 2; E-cad, E-Cadherin. Model 2: MRI radiomics model for predicting D2-40 positive expression.
For Model 1, the AUC values for the clinical model, radiomics model, and nomogram in the training group were 0.865, 0.660, and 0.888, respectively. In the validation group, the AUC values were 0.733, 0.618, and 0.821, respectively. The external validation results were 0.637, 0.635, and 0.703, respectively. Within a reasonable threshold probability range, DCA has demonstrated that the model has good clinical applicability. The ROC curve and DCA results are shown in Figure 4.

(A) Nomogram for Predicting Occult SLN Metastasis Based on Model 1. Total Points Correspond to Estimated Probability of SLN Metastasis. (B-D) Receiver Operating Characteristic (ROC) Curves of the Training Set, Validation Set, and External Validation Set; (E-G) Decision Curve Analysis (DCA) of the Training Set, Validation Set, and External Validation Set. SLN, Sentinel Lymph Node; PR, Progesterone Receptor. Model 1: MRI Radiomics Model for Predicting Occult SLN Metastasis.
For Model 2, the AUC values for the clinical model, radiomics model, and nomogram in the training group were 0.818, 0.944, and 0.921, respectively. In the validation group, the AUC values were 0.631, 0.601, and 0.810, respectively. The external validation results yielded AUC values of 0.543, 0.615, and 0.645, respectively. DCA indicated that the model possesses significant clinical application value. The results of the ROC curve and DCA are presented in Figure 5.

(A) Nomogram for Predicting D2-40 Positive Expression Based on Model 2. Total Points Correspond to Estimated Probability of D2-40 Positive Expression. (B-D) Receiver Operating Characteristic (ROC) Curves of the Training Set, Validation Set, and External Validation Set; (E-G) Decision Curve Analysis (DCA) of the Training Set, Validation Set, and External Validation Set. SLN, Sentinel Lymph Node. Model 2: MRI Radiomics Model for Predicting D2-40 Positive Expression.
Correlation Analysis
We analyzed 141 patients from Center 1. Among the SLN-positive patients, 16 (48.5%) were D2-40 positive, while 17 (51.5%) were D2-40 negative. Among the SLN-negative patients, 13 (12%) were D2-40 positive, and 95 (88%) were D2-40 negative. The chi-square test indicated a significant correlation between SLN status and D2-40 expression (P < .001) (Table 5).
Association Between SLN Status and D2-40 Expression with Effect Size Measures.
OR (95% CI), Odds Ratio (95% Confidence Interval); Pearson χ², Pearson's Chi-Squared Test; SLN, Sentinel Lymph Node.
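The association test can be reproduced from the 2 × 2 counts reported in the text (SLN positive: 16 D2-40 positive, 17 negative; SLN negative: 13 D2-40 positive, 95 negative). The Python sketch below assumes an uncorrected Pearson chi-square test and a Wald-type 95% confidence interval for the odds ratio; these choices are assumptions for illustration, so the values may differ slightly from those in Table 5 if a continuity correction or another interval method was used.

```python
import math

# Rows: SLN positive, SLN negative; columns: D2-40 positive, D2-40 negative.
table = [[16, 17], [13, 95]]
(a, b), (c, d) = table
n = a + b + c + d

# Pearson chi-square statistic for a 2 x 2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Odds ratio with a Wald 95% confidence interval on the log scale.
odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"chi2 = {chi2:.2f}, P = {p_value:.2g}")
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Under these assumptions the P-value falls below .001, in agreement with the result reported above.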
Spearman rank correlation analysis (Figure 6A) revealed a significant correlation between the feature sets in Model 1 and Model 2. Notably, the Maximum.12 feature and the Large Dependence Low Gray Level Emphasis (LDLGLE) texture features in Model 1 exhibited a strong correlation with the Maximum series features and the LDLGLE texture features in Model 2. This indicates that these features are consistent and relevant in the computation and selection processes of both models.

(A) Visualization of Spearman Correlation Coefficients. The X-Axis Represents Radiomics Features from Model 2, While the Y-Axis Displays Radiomics Features from Model 1. Color Intensity and Dot Size Indicate Correlation Strength (Red = Positive, Blue = Negative, White ≈ 0). (B) Correlation of D2-40 with Model 1 Features. Abscissa: Correlation Coefficient (0.00-0.15); Ordinate: Feature Names. Color Gradient Reflects Correlation Strength (Red = Strong, Purple = Moderate, Blue = Weak). Model 1: MRI Radiomics Model for Predicting Occult SLN Metastasis. Model 2: MRI Radiomics Model for Predicting D2-40 Positive Expression.
Pearson correlation coefficient analysis (Figure 6B) demonstrated a weak positive correlation between D2-40 expression and the LDLGLE.11 and Maximum.12 features in Model 1. This suggests that these features may reflect biological traits associated with D2-40.
Discussion
Lymph node metastasis and the molecular expression levels of breast cancer play a crucial role in the diagnosis, treatment, and prognosis evaluation of the disease. The SLN is the primary drainage site and the first stop for potential metastasis from a primary tumor. 16 Preoperative assessment of SLN status in breast cancer plays a crucial role in guiding surgical techniques and clinical treatment decisions. However, two major clinical challenges persist. First, in cases of early lymph node metastases or micrometastases in breast cancer, conventional MRI often fails to reveal visually discernible typical imaging features, as these lesions may still exhibit imaging characteristics consistent with ALN-negative status. For patients with negative ALN on MRI, there may be occult SLN metastasis, which presents a significant challenge and has become a focal point of clinical research. Secondly, the current gold standard—SLNB—has significant limitations. As an invasive procedure, it not only prolongs operative time and increases the risk of complications but also often yields negative pathological results in the majority of patients, indicating that many individuals undergo unnecessary surgical risks. 17 In this context, radiomics demonstrates significant potential by enabling comprehensive extraction of high-throughput imaging features and quantitative assessment of intralesional heterogeneity. Through advanced analysis of primary tumor imaging phenotypes, this approach overcomes the inherent limitations of conventional morphological diagnosis, facilitates noninvasive clinical evaluation, mitigates risks of overtreatment, reduces healthcare expenditures, and offers significant advantages in the evaluation of lymph node metastasis. 18 Especially for patients with MRI-defined ALN-negative status but clinically high-risk profiles, radiomics models can establish refined metastatic risk stratification by integrating multiparametric imaging biomarkers. 
This methodology provides clinically actionable decision support and represents a critical step toward achieving precision medicine.
This study focused on the challenges of clinically diagnosing MRI-negative ALN breast cancer patients and developed an MRI radiomics model for the preoperative prediction of occult SLN metastasis. The model demonstrated promising results and highlighted the significant potential of radiomics in imaging diagnosis. Radiomics can capture subtle pathological changes that are often difficult for the human eye to detect, thereby aiding clinical diagnosis and treatment. The clinical-radiomics nomogram, which integrates clinicopathological features, enhanced diagnostic efficiency. The AUC in the training set and validation set were 0.888 and 0.821, respectively, outperforming both the clinical and radiomic models. These findings are consistent with other studies related to lymph nodes.19,20 Our external validation results indicated that the nomogram (AUC = 0.703) exhibited stable performance in predicting occult SLN metastasis in breast cancer patients.
Lymph node metastasis is the primary independent prognostic factor for breast cancer, and the lymphatic system serves as the principal pathway for tumor dissemination. D2-40, a monoclonal antibody, can be used to detect the transmembrane glycoprotein (mucin) present on lymphatic endothelial cells. This marker has been employed in numerous studies involving various malignant tumors. 21 Teel 22 first described the significance of LVI in breast cancer in 1964. Current studies21,23 have confirmed that D2-40 can be utilized to evaluate LVI in breast cancer. Wu et al 8 developed a model based on MRI radiomics to investigate the diagnostic value of D2-40 for LVI in patients with invasive breast cancer. Furthermore, Gudl et al 24 posited that D2-40-labeled LVI is an independent prognostic factor for patients with lymph node-negative breast cancer. This study developed an MRI radiomics model to assess the positive expression of D2-40 in patients with ALN-negative breast cancer. In the validation set, the AUC of the clinical-radiomics nomogram, which incorporated clinical factors and the Rad-score, was superior to that of both the clinical model and the radiomics model (AUC = 0.810 vs 0.631 vs 0.601). DCA supported the clinical utility of the nomogram, and it remained the best-performing model in external validation, although with more modest discrimination (AUC = 0.645).
Previous studies 25 have demonstrated that D2-40, a marker for lymphatic vessels, is correlated with lymph node metastasis in breast cancer. This correlation may arise because the primary tumor can induce lymphangiogenesis and alter the local microenvironment in ways that facilitate metastasis. During this pathological process, tumor cells undergo epithelial-mesenchymal transition (EMT), detach from the primary tumor, invade the lymphatic endothelial barrier to enter the lymphatic system, and subsequently migrate to the SLN through lymphatic circulation, ultimately resulting in lymph node metastasis. 26 Our study also found a significant correlation between D2-40 expression and SLN status (P < .001). This finding not only supports the potential of D2-40 as a predictor of SLN status but also helps to elucidate the mechanism of lymph node metastasis.
In recent years, the development of radiomics has prompted a growing body of research into the decision-making processes of radiomics models. The radiomics features that drive these models provide a representation of the microscopic characteristics of tumors and offer notable benefits for the noninvasive evaluation of tumor heterogeneity. As a result, radiomics has become a prominent area of investigation for identifying biomarkers that can predict cancer prognosis. 27 Therefore, this study investigated the multidimensional relationship between D2-40 expression, SLN status, and relevant radiomics features. Based on the previously established Model 1 and Model 2, we identified key predictive feature sets and assessed their correlations using Spearman's rank-order analysis. The results demonstrated a certain degree of association between the two feature sets, further supporting the role of D2-40-labeled lymphatic vessels as a pathway for SLN metastasis. Notably, the Maximum series of morphological features from the two models, which reflect geometric characteristics such as the maximum diameter of the tumor, and the LDLGLE series of texture features, which characterize the heterogeneity of the gray-level distribution within the tumor, exhibited a strong positive correlation. We posit that the LDLGLE texture feature directly quantifies tumor heterogeneity, a critical hallmark of malignancy that reflects intratumoral genetic variability and phenotypic diversity across distinct tumor subregions. 28 Higher tumor heterogeneity indicates intratumoral density variations, potentially reflecting necrotic or fibrotic regions, which are pathological alterations known to be closely associated with lymphangiogenesis. 29 The Maximum morphological feature likely characterizes the biological behavior at the tumor invasive frontier.
Larger tumor volume with irregular margins is generally associated with a higher propensity for local tissue infiltration and lymphatic invasion. This robust macro-micro feature correlation not only validates the unique value of radiomics in elucidating tumor biological characteristics, but also provides a theoretical foundation for constructing multimodal feature-integrated predictive models of lymph node metastasis. Furthermore, although we identified a correlation between D2-40 expression and both the Maximum morphological features and the LDLGLE texture features, the association was relatively weak. In this study, D2-40 positivity was significantly higher in SLN-positive patients (48.5%) than in SLN-negative patients (12%); we speculate, however, that the limited absolute number of D2-40-positive cases in our cohort restricted the statistical power of this analysis. Radiomics features also have certain limitations in MRI-based studies, where feature stability can be affected by inter-scanner variability, differences in magnetic field strength, variations in slice thickness, and inconsistencies in imaging protocols. 10 To enhance the reliability of research findings, future multicenter studies should establish standardized image acquisition protocols and correlate radiomic features with histopathological results to validate their biological significance and clinical applicability.
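The rank-order analysis discussed above can be sketched as follows. This is a minimal illustration that computes Spearman's coefficient as the Pearson correlation of average ranks. The feature values are hypothetical and are not taken from this study, and the helper names (`average_ranks`, `pearson`, `spearman`) are introduced here for illustration only.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-patient feature values (NOT data from this study).
maximum_feature = [12.0, 18.5, 25.1, 30.2, 9.8, 21.0]   # morphological feature
ldlgle_feature = [0.10, 0.90, 0.30, 1.20, 0.05, 0.20]   # texture feature
rho = spearman(maximum_feature, ldlgle_feature)
print(f"Spearman rho = {rho:.3f}")
```

The same `pearson` helper could be applied to a binary D2-40 indicator (0/1) and a continuous feature, which corresponds to a point-biserial correlation.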
This study has several limitations that should be acknowledged. First, the sample size was limited. Although the primary cohort (n = 141) and external validation cohort (n = 40) meet the minimum requirements for radiomics model development, the lack of an a priori sample size calculation, particularly given the low D2-40 positivity rate, may compromise statistical power, especially for subgroup analyses. Future prospective studies with larger cohorts are needed to validate the model's generalizability. Second, while this study adjusted for known clinically significant factors through multivariate regression, residual confounding may persist. Future studies should employ advanced methods such as instrumental variable analysis or propensity score matching to further mitigate selection bias. Third, this study focused exclusively on invasive breast cancer and excluded other breast cancer subtypes. Future research should include more diverse histological types to enhance the generalizability of the findings. Fourth, although D2-40 expression correlated with radiomics features, the biological mechanisms remain unclear, and molecular profiling is required to elucidate the relationship between lymphatic markers and imaging phenotypes. Finally, the absence of long-term follow-up data limits the comprehensive evaluation of the predictive model's prognostic value.
Conclusion
In summary, the radiomics model based on DCE-MRI demonstrated strong performance in the noninvasive, preoperative prediction of occult SLN metastasis and positive D2-40 expression in breast cancer patients with MRI-negative ALNs. The nomogram, which incorporates clinical indicators and radiomics features, exhibited enhanced performance and can serve as a noninvasive imaging biomarker for detecting occult SLN metastasis. Furthermore, this study identified a correlation between D2-40 expression and SLN status, as well as between their related radiomics features. Certain morphological and texture features identified through radiomics were found to be critical for predicting lymph node metastasis.
Footnotes
Abbreviations
Acknowledgments
We thank all the participants who were enrolled in this study. We also thank the Radcloud Radiomics Platform for its help with this research.
Ethics Statement
This retrospective study, carried out across the Center Campus and East Campus of Jinan Central Hospital, was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Ethics Committee of Jinan Central Hospital Affiliated to Shandong First Medical University (Approval No. 20250120011). As this committee provides ethical oversight for all campuses of the hospital, a single approval covers the entire study. The requirement for informed consent was waived due to the retrospective nature of the research.
Funding
This work was supported by the Science and Technology Development Program of Jinan Municipal Health Commission (Grant No. 2023-2-29).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Materials
Supplemental materials are available from the corresponding author upon reasonable request.
