Abstract
Objective
Few studies have systematically developed predictive models for clinical evaluation of the malignancy risk of solid breast nodules. We performed a retrospective review of female patients who underwent breast surgery or puncture, aiming to establish a predictive model for evaluating the clinical malignancy risk of solid breast nodules.
Method
Multivariable logistic regression was used to identify independent variables and establish a predictive model based on a model group (207 nodules). The regression model was further validated using a validation group (112 nodules).
Results
We identified six independent risk factors (X3, boundary; X4, margin; X6, resistive index; X7, S/L ratio; X9, increase of maximum sectional area; and X14, microcalcification) using multivariable analysis. The combined predictive formula for our model was: Z=−5.937 + 1.435X3 + 1.820X4 + 1.760X6 + 2.312X7 + 3.018X9 + 2.494X14. The accuracy, sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, positive likelihood ratio, and negative likelihood ratio of the model were 88.39%, 90.00%, 87.80%, 10.00%, 12.20%, 7.38, and 0.11, respectively.
Conclusion
This predictive model is simple, practical, and effective for evaluation of the malignancy risk of solid breast nodules in clinical settings.
Introduction
The incidence of breast nodules is high worldwide, but optimal strategies for clinical assessment and treatment remain controversial. 1 The consensus of many studies has been that 20% to 30% of breast nodules will ultimately develop into cancer. 2 Early diagnosis and intervention are crucial for improving the prognosis of patients with breast cancer. Therefore, differentiating malignant nodules from benign lesions is of critical importance.
Only 1% to 2% of detected nodules are determined to be cancerous. However, the discovery of breast nodules often causes patients to worry because clinical evaluation of malignancy without pathological assessment is uncertain, and performing biopsies on some nodules can be difficult. 3 A bulletin published by the American College of Obstetricians and Gynecologists on Diagnosis and Management of Benign Breast Disorders 4 suggests that biopsy should be performed for women aged >30 years with low-risk nodules (Breast Imaging Reporting and Data System [BI-RADS] category 1–3) if cancerous properties are suspected based on imaging examination. However, BI-RADS grading of some nodules can be ambiguous.
Ultrasound is the most widely used imaging examination in the diagnosis of breast diseases. 5 The rate of early identification has markedly improved 6 with the application of novel ultrasonic techniques including contrast-enhanced ultrasound7,8 and ultrasound elastography.9,10 Irregular spiculate or foliar margins, S/L ratio > 1, hypoechoic mass, microcalcification, and arterial flow with resistive index (RI) > 0.7 are considered typical ultrasonic manifestations associated with malignant nodules.11–13 Mammography is an effective and clinically recognised method for early detection of microcalcification. 14 Magnetic resonance imaging (MRI) can detect microvessels around early lesions. Mortality rates associated with breast cancer have been significantly reduced by early identification of malignant nodules using these imaging techniques.15,16 However, each technique has limitations and may not be accurate when applied independently. Thus, a combination of these technologies may improve diagnostic accuracy.
Studies17,18 have pointed out that age should be considered a risk factor for the occurrence of breast cancer because patients over 40 years old have higher incidence. Currently, cancer antigen 15-3 (CA153) produced by breast cancer cells is the most specific biomarker.19,20 Carcinoembryonic antigen (CEA), another specific marker, has been implicated as a diagnostic and prognostic indicator in breast cancer for nearly 30 years.19–21 Additionally, family history is also an important risk factor.22,23 Thus, multidisciplinary analysis to evaluate malignancy risk of breast nodules can increase the accuracy of diagnosis.
In the present study, we established a logistic regression model based on multidisciplinary analysis of data from imaging, serological, and clinical diagnostic techniques to predict the malignancy risk of breast nodules.
Methods
Patients
This was a retrospective study based on clinical data obtained from female patients aged 20 to 78 years admitted to Zhejiang Provincial People’s Hospital with solid breast nodules from January 2016 to June 2019. Ethical approval for this study was obtained from the Institutional Review Board of Zhejiang Provincial People’s Hospital. As a retrospective study of anonymised imaging and clinical data, the requirement for informed consent was waived.
The inclusion criteria were as follows: 1) solid breast nodules were detected via ultrasound examination in our department using a GE Logiq E9 instrument (GE, Boston, MA, USA); 2) clinical information (ultrasonic, serological, clinical and mammography data) was complete; 3) patients underwent breast surgery or puncture in our hospital; and 4) pathological results were available. The exclusion criteria were: 1) nodules with unclear ultrasonic images or incomplete patient information; and 2) nodules without pathological results.
Variable Acquisition
The variables in the model were grouped as shown in Table 1. Ultrasonic variables were grouped as follows: a) nodule size (maximum diameter); b) internal echo (hyperecho, isoecho, or hypoecho); c) boundary (well-defined or poorly-defined); d) margin (regular or irregular); e) colour Doppler flow imaging (without blood flow, dotted blood flow, or abundant blood flow); f) RI >0.7 or <0.7; g) S/L ratio >1.0 or <1.0; h) elastography score (1–5); and i) increase or no increase in the maximum sectional area of the nodule using contrast-enhanced ultrasound (Figure 1). Serological variables included levels of the tumour markers CA153 and CEA (µg/L). Clinical variables included age (>40 years or <40 years) and family history of breast cancer (present or absent). Mammography variables included microcalcification features based on molybdenum target mammography (present or absent) (Figure 2).
Table 1. Variable names and assignment of values.
CA153, cancer antigen 15-3; CDFI, colour Doppler flow imaging; CEA, carcinoembryonic antigen; RI, resistive index.

Figure 1. Contrast-enhanced ultrasonography of a solid breast nodule. (a) Using two-dimensional ultrasonography, the size of the nodule was 12.2 mm by 5.7 mm. (b) Using contrast-enhanced ultrasonography, the size of the nodule was 14.9 mm by 6.6 mm, and thus the maximum sectional area increased.

Figure 2. Microcalcification of a solid breast nodule. (a) Microcalcifications shown by ultrasonography. (b) Microcalcifications shown by mammography.
Variable Assignment
Table 1 shows the method of variable assignment. Pathological diagnosis (benign or malignant) of solid breast nodules was used as the dependent variable, and the above 13 multidisciplinary variables were used as independent variables. All variables except nodule size, RI, S/L ratio, CA153, and CEA were categorical variables.
To reduce bias, the X2, X3, X4, X5 and X8 ultrasound variables were assessed independently by two physicians with more than 5 years of experience in ultrasound diagnosis of breast diseases. Kappa values were calculated and variables with poor consistency were eliminated.
Data Processing and Statistical Analysis
To estimate malignancy risk, 207 nodules were randomly selected from the 319 nodules and were defined as the model group. The remaining 112 nodules were defined as the validation group. The model was evaluated based on various performance measures (accuracy, sensitivity, specificity, Youden index, missed diagnosis rate, negative likelihood ratio, and positive likelihood ratio).
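The performance measures listed above all derive from a 2×2 table of model predictions against pathology. The following Python sketch is illustrative only and is not the authors' analysis code; the function name and the counts are assumptions. The counts (27 true positives, 3 false negatives, 72 true negatives, 10 false positives) are not reported in the text; they are one combination of 112 nodules that is consistent with the percentages given in the Abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return common diagnostic performance measures from a 2x2 table.

    tp/fn: malignant nodules predicted malignant/benign.
    tn/fp: benign nodules predicted benign/malignant.
    Raises ZeroDivisionError if specificity is exactly 0 or 1.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_index": sensitivity + specificity - 1,
        "missed_diagnosis_rate": 1 - sensitivity,  # false-negative rate
        "misdiagnosis_rate": 1 - specificity,      # false-positive rate
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
    }


# Assumed counts (not reported in the text), totalling 112 nodules.
metrics = diagnostic_metrics(tp=27, fn=3, tn=72, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, the positive likelihood ratio is about 7.38 and the negative likelihood ratio about 0.11, which shows how the two ratios follow directly from sensitivity and specificity.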
Variables in the benign and malignant groups were first investigated by univariate analysis. Variables showing no significant difference were excluded, and logistic regression analysis was performed using the remaining variables.
To establish the predictive model, multivariable logistic regression was performed on the 207 nodules in the model group. The model took the form Z = Logit(P) = ln[P/(1−P)] = β0 + β1X1 + β2X2 + ··· + β14X14, and the odds ratio (OR) of each variable was calculated as OR = e^β, where β is the regression coefficient. The predictive probability of malignancy was determined using P = e^Z/(1 + e^Z).
To validate the predictive model, values of regression parameters were estimated using likelihood ratio tests. The goodness of fit of the entire model was evaluated using the Hosmer–Lemeshow (H-L) test. Predictions of malignant and benign solid breast nodules were made using thresholds of
To further assess the diagnostic value of the model, the diagnostic efficiency of each variable was evaluated via receiver operating characteristic (ROC) curve analysis. The model was compared with the gold standard, pathological assessment.
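As a reminder of what the area under the ROC curve measures, it equals the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are hypothetical and are not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for malignant, 0 for benign.
    scores: predicted probability or any ordinal score, same length as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

The same function applies to a single dichotomous variable coded 0/1, in which case most pairs are ties or clear wins and the result reflects how well that one feature separates the groups.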
Results
Patients
A total of 319 female patients aged 20 to 78 years (mean ± standard deviation [SD]: 41.46 ± 12.97 years) admitted to Zhejiang Provincial People’s Hospital with solid breast nodules from January 2016 to June 2019 were enrolled in the study. The 319 nodules ranged in length from 3.5 to 84.0 mm (mean ± SD: 19.50 ± 14.77 mm); on pathological examination, 232 nodules were benign and 87 were malignant.
Kappa Analysis
Table 2 shows the Kappa values for the X2, X3, X4, X5, and X8 ultrasound variables. All were ≥0.75, indicating high consistency.
Table 2. Consistency analysis of subjective variables.
*Kappa values >0.75 were considered to indicate substantial consistency, values of 0.40–0.75 indicated moderate consistency, and values <0.40 indicated poor consistency. For the identity of variables and variable name assignment, refer to Table 1.
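For readers unfamiliar with the Kappa statistic, the Python sketch below shows how agreement between two observers can be computed and graded with the thresholds given in the Table 2 footnote. It is a minimal illustration, not the authors' code; the function names and the toy ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def grade_consistency(kappa):
    """Grade agreement using the thresholds in the Table 2 footnote."""
    if kappa > 0.75:
        return "substantial"
    if kappa >= 0.40:
        return "moderate"
    return "poor"


# Toy ratings for one dichotomous variable (illustrative, not study data).
physician_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
physician_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(physician_1, physician_2)
print(f"kappa = {kappa:.3f} ({grade_consistency(kappa)})")
```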
Univariate Analysis (Independent Sample T-Test and Chi-Square Test)
Table 3 shows the univariate statistical analysis of all 13 variables. The X1, X2 and X8 variables did not reach statistical significance and were removed. We analysed the remaining 10 variables using multivariable logistic regression to further identify variables of clinical significance.
Table 3. Univariate statistical analysis.
Data are presented as n (%) or mean ± standard deviation.
Establishment of Predictive Model
Table 4 shows the independent variables (X3, X4, X6, X7, X9, and X14) that were ultimately included in the model. The combined predictive formula of the model was as follows: Z = −5.937 + 1.435X3 + 1.820X4 + 1.760X6 + 2.312X7 + 3.018X9 + 2.494X14.
Table 4. Results of logistic regression.
*B is the regression coefficient.
§Exp(B) is the odds ratio of the corresponding variable.
df, degrees of freedom; SE, standard error. For the identity of variables and variable name assignment, refer to Table 1.
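To illustrate how the combined predictive formula reported above could be applied to a single nodule, the Python sketch below converts Z to a probability with the standard inverse of the logit, P = 1/(1 + e^(−Z)). It is a sketch under stated assumptions, not the authors' implementation: each variable is assumed to be coded 1 when the suspicious feature is present and 0 otherwise (the actual coding is defined in Table 1), and the function and dictionary names are hypothetical.

```python
import math

# Coefficients copied from the combined predictive formula in the text.
INTERCEPT = -5.937
COEFFICIENTS = {
    "X3": 1.435,   # boundary
    "X4": 1.820,   # margin
    "X6": 1.760,   # resistive index
    "X7": 2.312,   # S/L ratio
    "X9": 3.018,   # increase of maximum sectional area
    "X14": 2.494,  # microcalcification
}


def malignancy_probability(features):
    """Return P = 1 / (1 + e^(-Z)) for one nodule.

    features maps each variable name to 0 or 1 (assumed coding).
    """
    z = INTERCEPT + sum(
        coef * features[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Odds ratio of each variable, OR = e^B.
odds_ratios = {name: math.exp(coef) for name, coef in COEFFICIENTS.items()}

# Two hypothetical nodules: no suspicious features, and all six present.
no_features = {name: 0 for name in COEFFICIENTS}
all_features = {name: 1 for name in COEFFICIENTS}
p_low = malignancy_probability(no_features)
p_high = malignancy_probability(all_features)
print(f"P(no features) = {p_low:.4f}, P(all features) = {p_high:.4f}")
```

Because every coefficient is positive, each additional suspicious feature raises Z and therefore the predicted probability of malignancy.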
Validation of Predictive Model
The H-L test yielded a
Evaluation of Diagnostic Efficiency
The results of ROC curve analysis of the model and each selected variable are summarised in Figure 3. The area under the curve values for the X3, X4, X6, X7, X9, and X14 variables and for the whole model were 0.782, 0.692, 0.827, 0.825, 0.832, 0.845, and 0.971, respectively. These data indicated that the diagnostic accuracy of the overall model was higher than that of each single variable.

Figure 3. Receiver operating characteristic curve analysis of the whole predictive model and the selected model variables. Pre-1 represents the whole model. The areas under the curve for X3, X4, X6, X7, X9, X14, and the whole model were 0.782, 0.692, 0.827, 0.825, 0.832, 0.845, and 0.971, respectively. For the identity of variables and variable name assignment, refer to Table 1.
Discussion
The morbidity and mortality of breast cancer have increased worldwide in recent years. 24 Breast cancer is one of the most frequent malignant tumours in women. Therefore, early detection and diagnosis are crucial for effective treatment and satisfactory prognosis. Previous studies investigating the early identification and diagnosis of breast cancer25–27 did not assess multiple variables. The clinical significance of comprehensive interdisciplinary and multivariate diagnostic criteria for differential diagnosis of benign and malignant breast nodules has not been widely investigated.
In the present study, differences among multiple variables were analysed using logistic regression. According to previous studies,19–21 ultrasound, mammography, CA153 and CEA, age, and family history are factors relevant for differential diagnosis. The objectivity and accuracy of prediction improved when these parameters were included. During the establishment of the logistic regression model, variables without predictive value were excluded to improve stability of the model. Screening of successive regression models identified the following significant predictors of malignancy: poorly defined boundary, irregular margin, RI >0.7, S/L ratio >1, increased maximum sectional area of nodules after contrast-enhanced ultrasound (relative to two-dimensional ultrasound), and microcalcification identified via mammography. All regression coefficients were >0 and
ORs for all variables in the model were compared to assess the relative importance of each indicator for risk assessment. Variable X9, increase of maximum sectional area, had the largest regression coefficient and therefore the highest OR, indicating greater value for risk assessment than the other variables. As shown by goodness-of-fit tests, our predictive model exhibited high accuracy and a good fitting effect. These data demonstrate that comprehensive assessment of multidisciplinary variables is more effective for assessment of malignancy risk in breast nodules compared with analysis of single variables.
At present, needle biopsy is the preferred method for differential diagnosis of breast nodules before surgery.28,29 However, invasive biopsy may cause complications such as bleeding, infection, and nerve damage, particularly in elderly or sick patients.
Our regression model could identify predictive indicators associated with malignancy risk of solid breast nodules, thereby improving the accuracy of differential diagnosis. The model may assist physicians in making more accurate diagnoses, avoid unnecessary needle biopsy, assist clinicians in formulating better treatment decisions, and guide postoperative follow-up.
Our study had several limitations. First, several eliminated variables may still be closely related to the occurrence and development of breast cancer. Second, to integrate patient information and to maintain a large sample size, we had to remove microvessels around early lesions in MRI scans, serum ferritin, procalcitonin, and other features potentially associated with malignancy. Last, as a retrospective analysis, the reliability of our conclusions depended on the sample size. In future work, we plan to further enlarge the sample size to verify the clinical applicability of this model.
Conclusion
A predictive model for evaluation of the malignancy risk of solid breast nodules was established. Based on multivariable logistic regression analysis, the model included multiple indicators of malignant nodules. This model may provide social and economic benefits via improved accuracy of diagnosis.
Footnotes
Acknowledgements
A warm thanks to all participating patients and investigators of the study.
Declaration of conflicting interest
The authors declare that there is no conflict of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
