Abstract
Breast cancer is one of the leading causes of death among women across the world, but early diagnosis can help to reduce the mortality rate. In this work, an efficient multi-task learning approach is proposed for the automatic segmentation and classification of breast tumors from ultrasound images. The proposed approach consists of encoder, decoder, and bridge blocks for segmentation and a dense branch for the classification of tumors. For efficient classification, multi-scale features from different levels of the network are used. Experimental results show that the proposed approach is able to enhance the accuracy and recall of segmentation by
Introduction
Breast cancer is the deadliest type of cancer found in women and is the leading cause of death in young women across the world. According to a study by the International Agency for Research on Cancer,1 approximately 2.1 million new breast cancer cases and 0.6 million deaths were reported worldwide in 2018. In 2020, the number of breast cancer cases exceeded that of lung cancer, making it more prevalent than other forms of cancer.2 Earlier diagnosis and treatment of breast cancer can reduce the mortality rate and improve the patient's survival chances and quality of life.3–5 Digital mammography is the primary imaging examination for the diagnosis of breast cancer, but repeated exposure to ionizing radiation can increase the risk of breast cancer.6
Ultrasound Imaging (UI) is an alternative to mammography for the diagnosis of breast cancer. UI is safer, faster, cheaper, and more reproducible than digital mammography.7–9 UI has also reported a higher detection rate of breast cancer in women with dense breasts than mammography.10 However, because of the high sensitivity of ultrasound instruments, UI images are susceptible to the influence of the environment and of other tissues of the human body. This results in large amounts of speckle noise, which makes diagnosis difficult for medical professionals. Efficient computerized systems that assist medical professionals in diagnosing breast cancer and in planning surgery and treatment are therefore essential. Designing such systems for UI-based diagnosis is a complex and challenging task because of non-uniform tumor boundaries, variations in tumor size and shape, and the low signal-to-noise ratio of ultrasound images.
Classification and segmentation are the two primary tasks of a computerized system in medical imaging. Malignant and benign tumors have different shape characteristics. For example, malignant tumors have spiculated and irregular shapes, whereas benign tumors have oval or round shapes with smooth boundaries.11 Radiologists use these shape characteristics in clinical diagnosis to categorize a tumor as benign or malignant. Because the same properties are useful for both tumor classification and segmentation, training a single network to solve both tasks through feature sharing is a promising direction to explore.
In this paper, a novel multi-task learning approach is proposed to jointly perform the segmentation and classification of breast tumors in ultrasound images in an end-to-end model. The proposed model consists of an encoder-decoder network for segmentation and a multi-scale feature network branch for tumor classification. The encoder-decoder network is built by modifying the U-Net architecture with a residual module. The classification and segmentation tasks share the common features extracted by the encoder block. To overcome the problem of varying tumor characteristics, we use multi-scale features extracted from different convolutional layers of the encoder block for tumor classification. The rest of the paper is organized as follows: Section 2 describes the existing literature, Section 3 describes the materials and methods used, Section 4 describes the results reported, and Section 5 concludes the proposed work.
Related Works
Ultrasound Image Segmentation
Previous studies proposed several methods for the segmentation of ultrasound images using conventional image-processing techniques such as active contours,12,13 region growing,14,15 and the watershed transform.16–18 Gomez et al.18 proposed a robust method to segment various objects with closed contours in contrast-enhanced ultrasound images using marker-controlled watershed transformations. Kozegar et al.15 proposed a two-stage segmentation method in which the mass boundary is estimated using an adaptive region growing algorithm in the first stage. In the second stage, this estimate is used as the initial contour by a geometric edge-based model for further refinement. The performance of this model is heavily dependent on the initial seed selection.
Over the years, many Convolutional Neural Network (CNN) based models have been designed for ultrasound image segmentation.19–22 Xing et al.19 developed a semi-pixel-wise cycle model using a Generative Adversarial Network (GAN) and a CNN for tumor segmentation. Anatomy-based image segmentation is also important in ultrasound image analysis because it reduces false positives. Lei et al.23 used a boundary-regularization-based encoder-decoder network to segment the breast anatomy in ultrasound images. Lei et al.24 used a self-co-attention mechanism to further improve the segmentation of breast anatomy. Kumar et al.25 designed a method named Multi-UNet, based on the popular U-Net architecture, for the segmentation of masses from ultrasound images. In another study,26 custom-designed attention blocks were added to the existing U-Net for tumor segmentation. In that work, salient feature maps are integrated with the deep learning attention U-Net for better segmentation.
Ultrasound Image Classification
Traditional methods for ultrasound image classification rely on manual extraction of posterior acoustic features, echo patterns, lesion boundary, margin, orientation, shape, and texture features. Moon et al.27 used texture, descriptor, and morphological features to classify tumors as malignant or benign. Gómez Flores et al.28 enhanced the diagnostic accuracy for tumors in ultrasound imaging by analyzing distinct morphological and textural features. Gomez et al.29 used the watershed transformation technique to segment the tumor area in ultrasound images. From the segmented images, 22 morphological features were computed, and the minimum-redundancy-maximal-relevance criterion was used to rank them. Later in that study, n-dimensional feature subsets were created using the ranked feature space and a Fisher discriminant analysis classifier. Uniyal et al.30 proposed a novel approach in which tumor malignancy maps are generated from an estimate of cancer likelihood based on ultrasound radio-frequency time series for the classification of malignant tumors. The performance of the methods discussed above depends largely on the accuracy of the manually extracted features.
Deep learning based methods can overcome the limitations of traditional methods with their powerful feature extraction capability. Zeimarin et al.31 reported enhanced performance in tumor classification using a custom CNN model with a regularization technique. As an alternative to custom-built CNNs, several architectures that are trained on large datasets can easily be fine-tuned for the classification of tumors. This methodology is known as transfer learning. Transfer learning methods have shown satisfactory improvements in performance for the classification of breast tumors.32 In one study,33 the transfer learning performance of the pre-trained models SD300 + ZFNet, YOLO, and VGG16 was analyzed, and SD300 + ZFNet reported higher performance than the other two models. Hijab et al.34 proposed a transfer learning model built by fine-tuning the pre-trained VGG16 model. To overcome the problem of overfitting, image augmentation techniques were used, and the model was reported to have an accuracy of 97%. Another work35 proposed a transfer learning model that uses the pre-trained VGG19 model as the base model. In that work, the ultrasound images are converted to an RGB representation before training the model. Another study36 proposed a transfer learning model using pre-trained InceptionV3 for the classification of breast lesions. In a further study,37 the authors compared the transfer learning performance of InceptionV3, Xception, and ResNet50 for classifying tumors from ultrasound datasets. Han et al.38 proposed a transfer learning approach based on GoogLeNet to diagnose breast cancer from ultrasound images. In that work, histogram equalization and image cropping are used to augment the training images. Tanaka et al.39 employed an ensemble of VGG19 and ResNet152 for the diagnosis of malignant and benign tumors.
Joint Segmentation and Classification (Multi-Task Learning)
Multi-task learning is an approach that focuses on solving two or more different tasks in parallel. In medical imaging, multi-task learning is applied to perform segmentation and classification simultaneously on the same image. For ultrasound images, Wang et al.40 modified the structure of U-Net by adding a classification branch for the classification and segmentation of bone surfaces. Xie et al.41 proposed a dual-stage multi-task approach using pre-trained models for the segmentation and classification of tumors from breast ultrasound images. In this approach, ResNet is used for the extraction and classification of candidate regions in the first stage, and a modified Mask R-CNN is used for tumor segmentation in the second stage.
Materials and Methods
In our work, the residual learning approach was used to ease the training of the deep neural networks and to overcome the problem of degradation.42 The architecture of the proposed residual module differs from those of Diakogiannis et al.43 and Zhang et al.44 in the number of layers. The proposed module uses three convolution layers, three batch-normalization layers, three ReLU activations, and one concatenation layer. The residual network consists of a stack of residual units, where each unit consists of a convolutional layer, a batch normalization layer, and a ReLU activation. Figure 1 shows the difference between the neural layers module in the standard U-Net and our proposed residual module.

Figure 1. Difference between the neural layers module in the standard U-Net and the proposed residual module: (a) neural layers used in U-Net and (b) proposed residual module.
As shown in Figure 1(b), shortcut connections or skip connections are the ones that skip one or more layers in the neural network. The proposed residual unit shown in Figure 1(b) can be represented by equation (1).
In equation (1),
Residual-U-Net
The proposed multi-task learning approach for segmentation and classification is shown in Figure 2. The proposed network is built by combining the strengths of the residual and U-Net architectures. This combination has two advantages: (1) the residual unit eases the training of the network, and (2) the skip connection between the high and low levels in the residual unit helps information propagate without degradation. The proposed multi-task learning approach can perform segmentation and classification simultaneously.

Figure 2. The proposed multi-task learning approach for segmentation and classification.
The Residual-U-Net proposed in this work is a 9-level architecture consisting of three parts, namely an encoder, a decoder, and a bridge. The encoder converts the input image into a compact representation, and the decoder recovers this representation into a pixel-wise classification. The bridge acts as a connection between the encoder and decoder parts. All these parts are built using residual units shown in Figure 1(b), each block consists of
In the existing approach,40 the feature map from the last layer of the encoder block is passed through a single dense layer for classification. In contrast, our classification branch uses the features extracted from the last block of the encoder, the bridge, and the first block of the decoder (Figure 2). The branch extracts and concatenates the feature maps from stages 4, 5, and 6. Since these feature maps have dimensions that cannot be concatenated directly, a Global Average Pooling (GAP) layer is connected to the end of each block to make them suitable for multi-scale feature concatenation. Fully connected (FC) layers and Global Max Pooling (GMP) layers could serve the same purpose, but they have disadvantages. FC layers increase the training time and the number of parameters, whereas GMP uses only the maximum value to represent the whole feature map and neglects a lot of useful spatial information.
These features are then concatenated and passed to a series of dense and dropout layers for classification. The first dense layer, which receives the fused features, consists of 256 units with ReLU activation. The last dense layer contains 2 units with softmax activation to predict the class of the input ultrasound image as malignant or benign. In comparison to Wang et al.,40 the proposed classification branch has two dense layers. Between these two dense layers, a dropout layer with a dropout rate of 0.5 is added to regularize the network and prevent overfitting.
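The pooling-and-concatenation step of the classification branch can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `global_average_pool` and `fuse_multi_scale`, the `[channel][row][col]` nested-list layout, and the toy feature values are all assumptions made for illustration. The sketch only shows why GAP makes feature maps of different spatial sizes concatenable: each channel is reduced to a single number, so the fused vector length depends only on the channel counts.

```python
from typing import List

# Hypothetical layout: a feature map is a nested list indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Reduce each channel of a feature map to the mean of its spatial values."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_multi_scale(feature_maps: List[FeatureMap]) -> List[float]:
    """Pool every feature map with GAP and concatenate the pooled vectors."""
    fused: List[float] = []
    for fmap in feature_maps:
        fused.extend(global_average_pool(fmap))
    return fused


# Toy stand-ins for the outputs of stages 4, 5, and 6 (values are made up).
stage4 = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]]  # 2 channels, 2x2
stage5 = [[[2.0]], [[6.0]], [[8.0]]]                            # 3 channels, 1x1
stage6 = [[[1.0, 1.0], [1.0, 1.0]]]                             # 1 channel, 2x2

fused = fuse_multi_scale([stage4, stage5, stage6])
print(fused)  # [4.0, 1.0, 2.0, 6.0, 8.0, 1.0]
```

In a deep learning framework the same idea would normally be expressed with the framework's own pooling and concatenation layers; the fused vector is what the first dense layer of the branch would receive.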
Loss Function
During segmentation tasks, the variance between the background and foreground may result in segmentation bias. To solve this issue, a segmentation loss based on the Dice coefficient is used in this work. This was defined as
In the above equation,
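As a rough illustration of a Dice-based segmentation loss, the sketch below uses the commonly used form 1 - (2|P ∩ G| + s) / (|P| + |G| + s) on flattened masks. This is an assumption about the general shape of such a loss, not a reproduction of the paper's equation: the function name `dice_loss`, the smoothing constant `smooth`, and the toy masks are hypothetical.

```python
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float], smooth: float = 1.0) -> float:
    """Dice loss on flattened masks.

    pred   : predicted foreground probabilities in [0, 1]
    target : ground-truth mask values (0 or 1)
    smooth : assumed smoothing constant that avoids division by zero
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice


# A perfect prediction gives zero loss; a disjoint one gives a large loss.
perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
disjoint = dice_loss([1.0, 0.0, 0.0, 0.0], [0, 1, 0, 0])
print(perfect, disjoint)
```

Because the loss is computed from the overlap relative to the total foreground in both masks, a large background region does not dominate it, which is the motivation given above for using a Dice-based loss.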
In medical imaging, class imbalance is one of the most challenging problems, as it biases the model toward one class if it is not addressed. In this work, the BUSI dataset considered contains more malignant instances than benign ones, which may bias the model toward malignant cases. To deal with this imbalance, the weighted focal loss45 is used for the classification task, as shown in equation (3).
In the above equation,
In equation (4),
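For readers unfamiliar with focal loss, the following sketch shows the standard class-weighted form, -α_t (1 - p_t)^γ log(p_t), for a single sample. It is a generic illustration rather than the paper's equations (3) and (4): the function name `weighted_focal_loss`, the class weights in `alpha`, the focusing parameter `gamma = 2.0`, and the clipping constant `eps` are all assumed values.

```python
import math
from typing import Sequence


def weighted_focal_loss(
    probs: Sequence[float],
    label: int,
    alpha: Sequence[float],
    gamma: float = 2.0,
    eps: float = 1e-7,
) -> float:
    """Class-weighted focal loss for one sample.

    probs : predicted class probabilities (e.g. a softmax output)
    label : index of the true class
    alpha : per-class weights (assumed values, typically larger for the rarer class)
    gamma : focusing parameter; gamma = 0 gives weighted cross-entropy
    """
    p_t = min(max(probs[label], eps), 1.0 - eps)  # clip for numerical stability
    return -alpha[label] * (1.0 - p_t) ** gamma * math.log(p_t)


alpha = [0.6, 0.4]  # hypothetical weights for the two tumor classes
easy = weighted_focal_loss([0.9, 0.1], label=0, alpha=alpha)  # confident, correct
hard = weighted_focal_loss([0.3, 0.7], label=0, alpha=alpha)  # wrong prediction
print(easy, hard)
```

The factor (1 - p_t)^γ shrinks the contribution of well-classified samples, so training focuses on hard examples, while the per-class weights compensate for the imbalance between classes.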
Dataset
The proposed multi-task model is evaluated using the benchmark ultrasound dataset BUSI.48 This dataset consists of a total of 780 2D breast ultrasound images collected from 600 women aged between 25 and 75. Among the 780 images, 133 are benign masses, 437 are normal cases, and 210 are malignant masses. Although these images look like single-channel grayscale images, they are stored as three-channel images. BUSI also contains corresponding masks for all the images. The main idea behind performing tumor segmentation in addition to classification is to track tumor changes and to assess the seriousness of the tumor. Since normal ultrasound images do not contain any tumor mass to segment, they are not used in this work.
Experimental Setting
To compare our proposed network with the existing networks, the same experimental set-up is followed, and only benign and malignant cases are considered for training. Before being fed into the network, the images are enhanced by the Multi-peak Generated Histogram Equalization (GHE) method.49 This method increases the intensity difference between the tumor and the background. During training, the training images are augmented by flipping horizontally, flipping vertically, and rotating by 30°. Initially, the hyper-parameter values are kept the same as reported in the existing work.45–47 The hyper-parameters are then tuned, based on extensive evaluation, to yield the optimum performance. TensorFlow is used as the deep learning framework, and all the experiments in this work are carried out in Google Colab. The proposed model is compiled using the Adam optimizer and the proposed loss function. For validation, the standard 5-fold cross-validation protocol is used. For each fold, the model is trained for 500 epochs with a batch size of 16, and the learning rate is set to 0.0001, which further decreases by one tenth after every 20 epochs. The transfer learning models are pre-trained on ImageNet and fine-tuned with the BUSI dataset to make them suitable for our work.
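The learning-rate schedule described above can be sketched as a step decay. The wording "decreases by one tenth" admits two readings, so the sketch exposes the multiplier as a parameter: `drop_factor = 0.1` assumes the rate is scaled to one tenth of its value at each step, while `drop_factor = 0.9` would model a 10% reduction. The function name `step_decay_lr` and its parameter names are hypothetical and are not taken from the paper.

```python
def step_decay_lr(
    epoch: int,
    initial_lr: float = 1e-4,
    drop_factor: float = 0.1,
    epochs_per_drop: int = 20,
) -> float:
    """Learning rate for a zero-based epoch index under a step-decay schedule."""
    num_drops = epoch // epochs_per_drop  # completed 20-epoch blocks so far
    return initial_lr * drop_factor ** num_drops


# Learning rate at the start of the first three 20-epoch blocks.
for epoch in (0, 20, 40):
    print(epoch, step_decay_lr(epoch))

# Alternative reading: the rate loses 10% of its value every 20 epochs.
print(step_decay_lr(20, drop_factor=0.9))
```

A function of this shape can usually be passed to a framework's per-epoch learning-rate callback; which reading of the schedule was actually used cannot be determined from the text.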
Evaluation Metrics
In this work, eight popular metrics, namely Jaccard Similarity Index (JSI), Dice Coefficient (DC), Accuracy (ACC), Precision (PRE), Recall (REC), Specificity (SPE), F1-score (F1), and Area Under the ROC Curve (AUC), are used for the quantitative evaluation of the proposed model. These metrics are explained below.
In equations (6) and (7),
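As an illustration, the count-based metrics listed above can be computed from true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) using their standard textbook definitions. The sketch below assumes binary labels (1 = tumor or malignant, 0 = background or benign); the helper names `confusion_counts`, `safe_div`, and `evaluate` and the toy label lists are hypothetical. AUC is omitted because it needs continuous scores rather than hard labels.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    return tp, fp, fn, tn


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(pred, target)
    pre = safe_div(tp, tp + fp)
    rec = safe_div(tp, tp + fn)
    return {
        "JSI": safe_div(tp, tp + fp + fn),
        "DC": safe_div(2 * tp, 2 * tp + fp + fn),
        "ACC": safe_div(tp + tn, tp + fp + fn + tn),
        "PRE": pre,
        "REC": rec,
        "SPE": safe_div(tn, tn + fp),
        "F1": safe_div(2 * pre * rec, pre + rec),
    }


# Toy example: 2 TP, 1 FP, 1 FN, 4 TN.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
scores = evaluate(pred, target)
print(scores)
```

For segmentation the two sequences would be flattened pixel masks; for classification they would be per-image class labels.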
Results
The results reported by the proposed multi-task learning method for segmentation and classification during each fold are shown in Tables 1 and 2.
Table 1. Segmentation Results Reported by the Proposed Model.
Table 2. Classification Results Reported by the Proposed Model.
Comparison With Current State-of-the-Art Methods
The performance of the proposed model is compared with other existing methods50–54 for tumor segmentation and classification. For a fair comparison, the public implementations of all these methods are downloaded and retrained using the BUSI dataset. During retraining, optimum hyper-parameters are chosen based on extensive experiments in which the batch size, the optimizer, and the learning rate are varied. The hyper-parameters for which the existing methods reported optimum performance are shown in Table 3.
Table 3. Hyper-Parameters for Which the Existing Segmentation Methods Reported Optimum Performance.
For segmentation, we compare our work with pre-trained DeepLabV3+,50 UNet++,51 UNet,52 and feature pyramid network (FPN),53 and also with attention-based methods such as DAF.54 This comparison is shown in Table 4.
Table 4. Comparison With Segmentation Models From the Literature.
The segmentation performance reported by the proposed model and the other segmentation methods on sample images is shown in Figure 3. When normal images are passed to the network, the segmentation branch outputs the entire image without segmenting any region.

Figure 3. Comparison of segmentation performance reported by the proposed model with other segmentation methods.
For classification, the proposed model is compared with traditional feature-based classification methods27 and with transfer learning methods.30,34 For a fair comparison, these existing methods are retrained using the same data that is used by the proposed model. During retraining, the hyper-parameters of the transfer learning methods30,34 were changed to obtain the optimum performance on the BUSI dataset. The hyper-parameters used for these methods are shown in Table 5.
Table 5. Hyper-Parameters for Which the Existing Classification Methods Reported Optimum Performance.
These parameters were chosen through extensive experiments in which the optimizer, the batch size, and the learning rate were varied. The comparison with the existing classification methods is shown in Table 6.
Table 6. Comparison With Classification Models From the Literature.
The proposed model outperformed the models proposed in the literature.
Along with the comparison with the existing literature, the classification performance of the proposed model is also compared with existing transfer learning models namely ResNet50, VGG19, DenseNet201, InceptionV3, MobileNet, and InceptionResNetV2. For this work, the number of units in the last dense layer of these models is changed (n_units = 2) to make these models suitable for tumor classification. This comparison is shown in Table 7.
Table 7. Comparison of Classification Results With Pre-Trained Models.
When the three classes, namely Benign, Malignant, and Normal, are considered, the proposed model reported a classification accuracy of 97.5%, precision of 99%, recall of 98.12%, specificity of 93.59%, F1-score of 99.32%, and AUC of 0.983.
Discussion
Ultrasound imaging is extensively used for the diagnosis of breast cancer as it is safe, fast, and has better reproducibility.7–9 However, ultrasound images are susceptible to speckle noise, which makes diagnosis difficult for medical professionals. Designing an efficient system for UI is therefore an important task. Varying tumor characteristics such as shape and size, unclear tumor boundaries, and a low signal-to-noise ratio make tumor segmentation and classification more challenging. In this paper, a multi-task learning approach is proposed for the efficient segmentation and classification of breast tumors from ultrasound images.
In this work, the features from multiple layers of the proposed Residual-U-Net are pooled using GAP, concatenated, and used for the classification of tumors. This multi-scale feature concatenation helps the classification method to cope with varying tumor characteristics and with compromised UI environments. As shown in Table 6, the proposed classification method reported higher performance than single classification models. An example of a complex benign and malignant tumor is shown in the last row of Figure 3; it was misclassified by the methods based on UNet,52 UNet++,51 and DeepLabV3,50 but correctly segmented and classified by the proposed model.
The proposed Residual-U-Net reported better segmentation performance than single segmentation models, as shown in Table 4. Besides accuracy, recall is a crucial metric in medical informatics, as it reflects how many malignant tumors are correctly classified or segmented in this work. Among the existing related research works, the methods proposed in Hijab et al.34 and Wang et al.54 reported higher performance in terms of accuracy and recall on segmentation and classification tasks. But the proposed multi-task approach is able to enhance the recall and accuracy of the segmentation and classification tasks by
The current limitation of the reported work is the absence of Breast Imaging Reporting and Data System (BIRADS)55 information. BIRADS is a standardized system for reporting breast cancer risk and is mainly used in mammogram, breast ultrasound, and breast magnetic resonance imaging (MRI) reports. BIRADS places abnormal findings into categories that range from 0 to 6 (Table 8). A BIRADS score is part of a breast imaging report and helps to quantify how concerning a finding is. A higher number indicates a higher risk. A change in the BIRADS score from test to test helps to clearly detect a difference between the results.
Table 8. BIRADS Classification Categories.
The literature indicates that it is relatively easy to classify BIRADS 3 and 5 lesions, whereas BIRADS 4 lesions are difficult to classify. As a result, a lack of BIRADS 4 lesions in the dataset used could skew the results favorably. Since the dataset used in this study is not broken down into BIRADS categories, the results may be skewed. Other studies using the BUSI dataset have had their images annotated manually for the BIRADS descriptors and categories with the help of a radiologist. We do not have the resources to perform the BIRADS classification as of now. In a future study, we will include this information and additionally test our proposed methodology on other datasets that have BIRADS classification.
Conclusion
In this paper, a multi-task learning approach is proposed for the segmentation and classification of tumors in breast ultrasound images. In this work, the multi-task model is built by modifying the U-Net architecture by using residual units and by adding a classification branch to the network. Multi-scale features are extracted from different layers of the proposed Residual-U-Net for efficient classification of the tumor. Experimental results show that the proposed model is more efficient than the existing segmentation and classification methods in terms of accuracy and recall. This model enhanced the recall of breast image segmentation and classification tasks by
In future work, we will study the different networks to extract more discriminative information and will evaluate the proposed model on a larger dataset with more complex samples for understanding the robustness of the model.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
