Abstract
Seroma is a common complication after mastectomy. To the best of our knowledge, no prediction model has been developed for it. We therefore retrospectively reviewed the medical records of patients who underwent total mastectomy. Data from 120 subjects were divided into a training-validation data set (96 subjects) and a testing data set (24 subjects). A 9-layer artificial neural network (ANN) was trained on the data, and the model was validated using 10-fold cross-validation. Model performance was assessed with a confusion matrix in the 24-subject hold-out data set. The receiver operating characteristic curve was constructed, and the area under the curve (AUC) was calculated. Pathology type, presence of hypertension, presence of diabetes, receipt of neoadjuvant chemotherapy, body mass index, and axillary lymph node (LN) management (i.e., sentinel LN biopsy or axillary LN dissection) were selected as predictive factors in the model developed with the neural network algorithm. The model yielded an AUC of 0.760, corresponding to acceptable discrimination. Sensitivity, specificity, accuracy, and positive and negative predictive values were 100%, 52.9%, 66.7%, 46.7%, and 100%, respectively. Our ANN-based model can predict seroma after total mastectomy with high sensitivity. Nevertheless, external validation is still needed to confirm its performance.
Introduction
Breast cancer is a significant cause of illness in Thailand and around the world. In the United States, breast cancer is the most common cancer in women and the second leading cause of cancer death after lung cancer [1]. According to 2019 estimates, 271,270 people in the United States were diagnosed with breast cancer and 42,260 died of it [2]. In Thailand, breast cancer is the most common cancer in women [3]. The age-standardized incidence rate was approximately 27.9 per 100,000 population per year in 2012 and is expected to reach 30.3 per 100,000 population per year in 2025 [4].
The main treatments for breast cancer are surgery, chemotherapy, hormone therapy, and radiotherapy [1]. Among surgical options, mastectomy with or without axillary lymph node dissection (ALD) remains a frequently performed operation, especially for locally advanced tumors. Complications after mastectomy include seroma, bleeding, hematoma, surgical site infection, lymphedema, and skin flap necrosis [5–9].

Fig. 1. Example of artificial neural network structure.
Seroma is a common complication after breast surgery, with a reported incidence of 15% to 81% [9–11]. Seroma can affect subsequent treatment; for instance, chemotherapy or radiotherapy may be delayed. It may also delay wound healing, increase the infection rate, and add to the financial burden [11,12]. Many potential risk factors for seroma have been identified, and they can be divided into three groups: patient factors, disease factors, and operative factors.
Patient factors associated with seroma include older age [11–13], hypertension [9,14], obesity [12,15], and smoking [16]. The presence of malignant axillary lymph nodes, the number of malignant nodes [14,17,18], and receipt of neoadjuvant chemotherapy [19] have been identified as key disease-related factors. Operative risk factors include axillary lymph node dissection [11,12,15], dissection with conventional electrocautery [11,20], prolonged operative time [15], and extensive chest wall dissection or increased dead space [11,14].
Several studies have addressed seroma prevention [11,14,15], many of them investigating quilting stitches [15,21], fibrin glue [22,23], or sclerotherapy [11,15] to obliterate dead space. Some studies have also reported that dissection with ultrasonic scissors can prevent seroma [15,24,25]. However, routine use of these strategies increases operative time and cost.
To the best of our knowledge, no prediction model is available for identifying patients who would benefit from additional procedures aimed at seroma prevention. We therefore used our data to develop a model that predicts seroma occurrence with the artificial neural network (ANN) machine learning algorithm.
Demographic data and baseline characteristics
†Chi-square test, ∗Mann–Whitney test. ALD axillary lymph node dissection, BMI body mass index, CIS carcinoma in situ, DCIS ductal carcinoma in situ, IDC invasive ductal carcinoma, ILC invasive lobular carcinoma, LN lymph node, SLB sentinel lymph node biopsy.
Data on patients who underwent mastectomy from December 2016 to December 2019 were collected retrospectively. The variables collected were age, body mass index (BMI), hypertension and diabetes status, pathology and pathological grade, lymphovascular invasion status, tumor and node stage, neoadjuvant chemotherapy status, axillary lymph node (LN) management, and the number of LNs removed. Seroma was defined as fluid collection in the mastectomy area after drain removal.

Fig. 2. Receiver operating characteristic curve of the artificial neural network model.
Variables were described as mean and standard deviation (SD) or frequency and percentage. These variables were compared between seroma and non-seroma cases using an independent
Data from 120 mastectomy patients were split into two groups of 96 (80%) and 24 (20%) patients. The model was trained and tested, with 10-fold cross-validation, in the first group (96 patients) and validated in the second, held-out group (24 patients). Because the model is intended for prevention, only pre- and intra-operative factors were selected as predictors. Surgeons can therefore use the model's prediction to add seroma-prevention procedures during the operation.
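A minimal sketch of an 80/20 hold-out split with scikit-learn's `train_test_split` follows. The predictor matrix is random placeholder data (only the group sizes and the 33 seroma cases come from the text), and stratifying by outcome is an assumption, since the text does not say whether the split was stratified.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 120 patients, 6 predictors, binary seroma label.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
y = np.array([1] * 33 + [0] * 87)  # 33 seroma cases, as in the Results
rng.shuffle(y)

# 80/20 hold-out split; stratify=y (an assumption) keeps the
# seroma proportion similar in the two groups.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(len(y_dev), len(y_val))  # 96 24
```

The 96-patient development set is the one in which cross-validation would be run; the 24-patient set is kept aside for the final evaluation.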
The ANN used here, a multilayer perceptron, is a type of supervised machine learning algorithm. It comprises an input layer, hidden layers, and an output layer (see Fig. 1). Each layer usually contains multiple units called “nodes”, and the numbers of nodes and layers vary with the algorithm and task. Data enter the input layer and are passed to the hidden layers, which learn to capture important features of the data and pass them on to the output layer. The output is then compared with the ground truth, and the back-propagation algorithm is used to adjust the model's weights. The goal is to reduce the error relative to the ground truth without overfitting the training data.
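To make the forward and backward passes concrete, the following is a minimal NumPy sketch of a perceptron with one tanh hidden layer and a sigmoid output, trained by back-propagation with plain gradient descent on binary cross-entropy. It is illustrative only: the study's network had more layers and was fitted with L-BFGS in scikit-learn, and the layer width, learning rate, epoch count, and toy data here are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron by back-propagation.

    Returns the learned weights and the loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1).astype(float)
    eps = 1e-12  # avoids log(0)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden layer (tanh) -> output layer (sigmoid).
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Compare the output with the ground truth (binary cross-entropy).
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        # Backward pass: propagate the error gradient from output to input.
        dz2 = (p - y) / n          # gradient w.r.t. output pre-activation
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # tanh'(x) = 1 - tanh(x)^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)
        # Gradient-descent weight update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Usage on hypothetical toy data (two predictors, interaction-driven label).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
params, losses = train_mlp(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```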
The model was trained as a 9-layer neural network with the hyperbolic tangent activation function, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm, a regularization parameter α = 10⁻⁵, and a differential learning rate. The model was then validated by 10-fold cross-validation.
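The configuration described above can be approximated with scikit-learn's `MLPClassifier`, as sketched below on placeholder data. Several details are assumptions: the 9 layers are taken to include the input and output layers (as scikit-learn's `n_layers_` attribute counts them), leaving seven hidden layers whose width of 10 nodes is hypothetical; predictors are standardized first; and the folds are stratified. scikit-learn's `learning_rate` parameter is used only by the `sgd` solver, so it is left at its default here; the text does not specify how a differential learning rate was applied with L-BFGS.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder development set: 96 patients, 6 predictors, binary label.
rng = np.random.default_rng(0)
X_dev = rng.normal(size=(96, 6))
y_dev = (X_dev[:, 0] + 0.5 * rng.normal(size=96) > 0.6).astype(int)

# 7 hidden layers + input + output = 9 layers (hidden widths hypothetical).
mlp = MLPClassifier(
    hidden_layer_sizes=(10,) * 7,
    activation="tanh",
    solver="lbfgs",
    alpha=1e-5,
    max_iter=1000,
    random_state=1,
)
# Standardizing predictors is an assumption, not stated in the text.
model = make_pipeline(StandardScaler(), mlp)

# 10-fold cross-validation scored by ROC AUC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="roc_auc")
print(f"10-fold CV AUC: {scores.mean():.3f} ± {scores.std():.3f}")

# Refit on the full development set before hold-out evaluation.
model.fit(X_dev, y_dev)
print(model[-1].n_layers_)  # 9
```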
Model performance evaluation
Model performance was evaluated in the validating data set of 24 patients. The receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) was calculated. Sensitivity, specificity, accuracy, and positive and negative predictive values (PPV and NPV) were also reported. All steps of model development and validation were conducted with scikit-learn 0.22.1.
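The evaluation step can be sketched with `sklearn.metrics`: the AUC is computed from predicted probabilities, while the other five metrics come from a confusion matrix at a single probability cut-off. The 0.5 threshold and the eight example patients are hypothetical; the text does not state the cut-off used.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

def evaluate(y_true, y_prob, threshold=0.5):
    """Threshold-free AUC plus confusion-matrix metrics at one cut-off."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical predicted probabilities for 8 validation patients.
y_true = np.array([1, 1, 0, 0, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.7, 0.6, 0.2, 0.4, 0.55, 0.1, 0.3])
fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points for the ROC plot
print(evaluate(y_true, y_prob))
```

For these toy values the function returns an AUC of 14/15 ≈ 0.933, sensitivity 1.0, specificity 0.8, accuracy 0.875, PPV 0.75, and NPV 1.0.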
Results
Seroma occurred in 33 of the 120 patients (27.5%). Only hypertension status differed significantly between the seroma and non-seroma groups (72.3% versus 47.1%, respectively;
Pathology, hypertension, diabetes, neoadjuvant chemotherapy status, and BMI were selected as predictive factors. Axillary LN management was also included in the model because several previous studies have reported a higher risk of seroma after axillary LN dissection [11,12,15]. In the validating data set, the AUC (95% CI) of the ANN model was 0.760 (0.676, 0.844) (see Fig. 2). Sensitivity, specificity, accuracy, PPV, and NPV were 100%, 52.9%, 66.7%, 46.7%, and 100%, respectively.
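The five reported percentages can be cross-checked against a confusion matrix. The counts below are not given in the text; they are back-calculated, and they are the only split of 24 patients that reproduces all five values (7 true positives, 0 false negatives, 8 false positives, 9 true negatives).

```python
# Counts back-calculated from the reported percentages (not stated in the text):
# 24 validation patients; sensitivity = 100% implies no false negatives.
tp, fn, fp, tn = 7, 0, 8, 9

metrics = {
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "accuracy": (tp + tn) / (tp + fn + fp + tn),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
}
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 100.0%
# specificity: 52.9%
# accuracy: 66.7%
# ppv: 46.7%
# npv: 100.0%
```

Read this way, the model flagged 15 of the 24 patients, capturing all 7 seroma cases at the cost of 8 false positives, which is the trade-off behind the high-sensitivity, low-PPV profile.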
Discussion
Seroma is a common complication after mastectomy and can prolong drainage. Many surgical techniques have been used to avoid it, but no predictive model has been developed to identify patients at risk of seroma. To the best of our knowledge, the present ANN-derived model is the first for this task. It yielded an AUC (95% CI) of 0.760 (0.676, 0.844), and its high sensitivity (100%) makes it suitable for screening.
Machine learning is increasingly applied in medical research. For instance, in the study of Hedén et al. [26], an ANN diagnosed anterior and inferior myocardial infarction better than conventional electrocardiographic criteria. In another study, an ANN predicted intensive care unit outcomes of trauma patients, including length of hospital stay, with satisfactory performance [27]. One advantage of ANNs is that they can handle multicollinearity, so more factors can be included in the model. They can also model non-linear relationships and correlations among factors [28].
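As a toy illustration of the non-linearity point (not data from this study), the sketch below compares a logistic regression with a small scikit-learn `MLPClassifier` on an XOR-like problem, where the outcome depends on the interaction of two factors. The sample size, hidden-layer width, and random seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Toy XOR-like problem: the label depends on the interaction of two factors,
# so no single straight line separates the classes.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

linear = LogisticRegression().fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0).fit(X, y)

print(f"logistic regression accuracy: {linear.score(X, y):.2f}")
print(f"MLP accuracy: {mlp.score(X, y):.2f}")
```

A linear decision boundary cannot separate the quadrants, so the logistic regression stays well below perfect accuracy, whereas the tanh network can typically fit the interaction closely.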
Nonetheless, the present model has some limitations. It was derived from a small data set, which could affect its accuracy and resulted in the straight-line appearance of the ROC curve. Only pre- and intra-operative predictive factors were used, on the expectation that the model could guide additional intra-operative procedures, although adding other factors (e.g., patient age, number of nodes removed, extra-nodal spread, or breast size) might improve predictive accuracy. In addition, an ANN does not provide exposure effects such as odds ratios or relative risks. Without this information, some physicians may be reluctant to use the ANN model in practice.
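The contrast between regression coefficients and ANN weights can be sketched on simulated data, assuming for illustration a single binary exposure with a true odds ratio of e: logistic regression yields an odds ratio as exp(coefficient), whereas a fitted scikit-learn `MLPClassifier` exposes only per-layer weight matrices in `coefs_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Simulated cohort: one binary exposure with true log-odds ratio 1.0 (OR = e).
rng = np.random.default_rng(0)
n = 20000
exposure = rng.integers(0, 2, size=n)
log_odds = -1.0 + 1.0 * exposure
outcome = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)
X = exposure.reshape(-1, 1)

# Logistic regression: exp(coefficient) is directly an odds ratio.
# A very large C makes the default L2 penalty negligible.
logit = LogisticRegression(C=1e6).fit(X, outcome)
odds_ratio = np.exp(logit.coef_[0, 0])
print(f"odds ratio: {odds_ratio:.2f}")  # close to e ≈ 2.72

# MLP: the exposure's effect is spread over weight matrices in each layer,
# so no single number plays the role of an odds ratio.
mlp = MLPClassifier(hidden_layer_sizes=(5,), solver="lbfgs",
                    max_iter=500, random_state=0).fit(X, outcome)
print([w.shape for w in mlp.coefs_])  # [(1, 5), (5, 1)]
```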
In conclusion, this study presents an ANN model for identifying patients who might benefit from additional seroma prevention techniques. The model's performance is satisfactory. A web application of the model is available at https://seroma.herokuapp.com/. The impact of implementing the model should be explored in future studies.
Footnotes
Acknowledgements
This manuscript was proofread and edited by Nattakrit Tongpoonsakdi, and a comprehensive English language review was conducted by Stephen Pinder, a medical English specialist.
Statement of ethics
This study involved retrospective data collection and was approved by the ethics committee before data retrieval (Registration number 070/62).
Conflict of interest
All authors declare no conflict of interest.
Funding source
None.
Author contributions
Data were collected by Panupong Nakchuai and validated by Pakkapol Sukhvibul. Basic statistical analysis was performed by Amarit Tansawet. Artificial neural network model development and model deployment were performed by Sermkiat Lolak. The manuscript was drafted by Panupong Nakchuai and Pakkapol Sukhvibul. Amarit Tansawet designed the study under Suphakarn Techapongsatorn’s supervision.
