Sage Journals: Discover world-class research

Abstract

Objectives

The primary goal is to address the challenges in brain tumor segmentation (BraTS), such as limited accuracy and high computational costs, by developing a more precise and efficient segmentation technique. The study aims to improve the diagnosis and treatment planning of brain tumors by enabling clinicians to accurately localize and assess tumor regions from multimodal magnetic resonance imaging (MRI) scans.

Methods

We proposed self-attention U-Net (SAU-Net), a novel model that integrated self-attention mechanisms with the U-Net convolutional architecture. This design allowed the model to preserve spatial context while selectively concentrating on pertinent features, thereby enhancing tumor (ET) boundary delineation and overall segmentation accuracy. Extensive experiments were conducted on the BraTS 2018 and BraTS 2020 datasets using thorough cross-validation and testing protocols. The performance of SAU-Net was evaluated and compared against other attention-based U-Net models, including adaptive attention U-Net, multi-head attention U-Net, and group query attention U-Net.

Results

On the BraTS 2018 dataset, SAU-Net achieved Dice scores of 98.16% (whole tumor (WT)), 98.87% (tumor core (TC)), and 98.23% (ET), with an average Dice score of 98.23%. For the BraTS 2020 dataset, the model recorded Dice scores of 98.99% (WT), 98.70% (TC), and 99.18% (ET), with an average Dice score of 98.62%. In addition to superior segmentation performance, the model demonstrated reduced computational complexity in both training and prediction times, along with optimized memory usage.

Conclusion

SAU-Net is a highly effective and computationally efficient model for BraTS. Its superior performance, as evidenced by the high Dice scores on two benchmark datasets, combined with its reduced computational requirements, underscores its potential for practical and impactful clinical applications.

Keywords

Brain tumor segmentation deep learning self-attention mechanism U-Net multimodal magnetic resonance imaging

Introduction

Brain tumor segmentation (BraTS) is crucial in medical image analysis, as it provides vital information for the identification, monitoring, and treatment of brain tumors.¹ Accurately delineating tumor regions in medical images, particularly in neuroimaging, is essential for effective diagnosis, treatment planning, and patient monitoring.² The task involves segmenting brain tumors in images such as magnetic resonance imaging (MRI) and multimodal MRI scans, enabling clinicians to pinpoint regions of interest and make informed decisions on intervention strategies.³ Recent advances in deep learning (DL) have shown strong potential for extracting clinically relevant biomarkers and gene-level signatures from medical imaging data, further highlighting the importance of accurate automated segmentation in oncology and neuro-oncology.^4,5 Brain tumors can grow undetected to substantial sizes and exhibit varying shapes and sizes, making segmentation a challenging yet vital task.⁶ Tumors are clusters of abnormal cells that proliferate within the brain and, if left untreated, can be life-threatening. These tumors are categorized into malignant (cancerous) and benign (non-cancerous) forms, with gliomas being the most common type of malignant brain tumor among adults.⁷ Gliomas are further classified into low-grade gliomas, which are slow-growing, and high-grade gliomas, which are aggressive and fast-growing.⁸ The segmentation of glioma regions often involves identifying three key subregions: the tumor core (TC), the enhancing core (EC), and the whole tumor (WT).⁹ Gliomas account for a significant proportion of brain cancer cases diagnosed worldwide.⁹ Approximately 17,200 people die each year from malignant brain tumors, while more than 90,000 patients receive an initial brain tumor diagnosis. Epidemiological analyses indicate that 25 individuals out of every 100,000 are affected by brain tumors, with approximately 33% classified as malignant.¹⁰

Medical imaging, particularly in neurology, places strong emphasis on brain segmentation due to its critical role in diagnosing and managing neurological disorders, including brain tumors.¹¹ Accurate segmentation supports diagnostic and therapeutic strategies, surgical planning, disease progression tracking, and personalized treatment development.¹² However, manual segmentation remains labor-intensive, time-consuming, and prone to human error, limiting its scalability and precision, especially for large datasets.¹³ Traditional segmentation tools, while useful, often fail to capture complex tumor boundary characteristics, resulting in inconsistent outcomes across patients and tumor types.

In contrast, DL-based approaches, particularly those based on the U-Net architecture, have transformed BraTS by automating segmentation and achieving high accuracy through learning from large-scale datasets.¹⁴ U-Net and its variants utilize convolutional neural networks (CNNs) to extract complex patterns from MRI images, significantly improving tumor boundary delineation accuracy.¹⁵ Prior studies demonstrated that architectures such as Deep U-Net and diffusion-based models outperformed manual techniques and enhanced clinical decision-making.^16,17 Despite these advances, existing methods continued to face limitations. Tabassum et al.¹⁸ showed that advanced meta-transfer learning strategies improved U-Net performance for tumor segmentation, particularly for underrepresented tumor types such as meningioma and metastasis. Saeed et al.¹⁹ proposed RMU-Net, which achieved high accuracy on the BraTS 2018 dataset but struggled with heterogeneous tumor subregions. Similarly, Ali et al.²⁰ introduced an ensemble of U-Net and three-dimensional (3D) CNN models, although the approach was constrained by high-computational complexity, limiting clinical applicability. Tataei et al.²¹ employed CNN engineering with ResNet-50, yet the fixed convolutional layers did not fully capture intricate tumor boundary variations. Moreover, commonly used evaluation metrics such as the Dice similarity coefficient (DSC) were not always sufficient to capture segmentation quality comprehensively, motivating the need for more robust assessment methodologies.

Recent developments in DL addressed these challenges through the incorporation of attention mechanisms into segmentation models. Attention-based architectures enabled networks to selectively focus on salient image features, improving segmentation accuracy by prioritizing relevant spatial information. These mechanisms also facilitated improved detection of complex tumor structures while reducing computational overhead, supporting practical clinical deployment.

While previous studies have provided extensive background on brain tumor segmentation, the key novelty of this study lies in introducing a streamlined self-attention mechanism integrated directly into the U-Net architecture. Unlike existing attention-based U-Net models, self-attention U-Net (SAU-Net) was designed to enhance feature selectivity while minimizing computational complexity, thereby enabling more accurate delineation of heterogeneous tumor boundaries.

To overcome the limitations of existing segmentation techniques, this study proposed SAU-Net, a novel architecture that incorporated self-attention mechanisms into the U-Net framework. By selectively emphasizing diagnostically relevant tumor features while preserving spatial coherence, SAU-Net achieved more precise segmentation of diverse brain tumor subregions. Experimental evaluations on the BraTS 2018 and BraTS 2020 datasets demonstrated that the proposed model outperformed state-of-the-art (SOA) methods in terms of DSC, sensitivity, and specificity. Furthermore, SAU-Net exhibited reduced memory consumption and computational complexity, making it well-suited for practical clinical applications.

Contributions

This paper makes the following significant contributions:

Novel architecture for the model: We proposed a novel SAU-Net specifically designed for highly precise BraTS from multimodal MRI data. By integrating self-attention mechanisms into the U-Net architecture, the proposed model improved segmentation accuracy and computational efficiency, enabling precise delineation of complex tumor boundaries and subregions, including the WT, TC, and ET.

Enhanced feature representation: The attention mechanisms incorporated into SAU-Net were designed to selectively emphasize diagnostically relevant features in MRI images while preserving essential spatial context. This design resulted in improved segmentation of heterogeneous tumor types, particularly in challenging regions where conventional models often underperformed.

Demonstrated SOA performance: The proposed model was rigorously evaluated on benchmark datasets and demonstrated SOA segmentation performance. SAU-Net achieved superior accuracy compared to existing BraTS methods, as evidenced by improvements in DSC, sensitivity, and specificity.

Improved computational efficiency: The model addressed the computational demands of medical image segmentation by incorporating lightweight attentional strategies that effectively focused on salient features. This design reduced memory consumption and processing overhead, making SAU-Net a viable solution for real-time clinical applications.

Research questions

What architectural advancements in SAU-Net enable higher segmentation accuracy for brain tumors compared to traditional U-Net and modern DL techniques?

How does SAU-Net’s performance in segmenting WT, TC, and ET from MRI data compare with SOA models on heterogeneous tumor datasets?

How does the integration of self-attention mechanisms contribute to reduced computational complexity while maintaining high segmentation accuracy in clinical applications?

Hypothesis

This study hypothesized that the proposed SAU-Net would outperform existing SOA BraTS techniques, including conventional U-Net models. It was anticipated that integrating self-attention mechanisms into a convolutional U-Net framework would enhance the discrimination of complex tumor subregions, such as the ET, TC, and WT. The model was expected to achieve improved performance across key evaluation metrics, including DSC, sensitivity, and specificity. Additionally, the attention mechanism was hypothesized to reduce computational complexity, thereby enabling efficient and accurate clinical deployment.

This paper is organized as follows: Section “Related works” reviews the relevant literature. Section “Methodology” details the proposed methodology. Section “Performance analysis” presents and discusses the experimental results along with the performance analysis. Finally, Section “Discussion” compares the proposed approach with existing works, and Section “Conclusion” summarizes the study and outlines the key findings.

Related works

The identification and treatment planning of brain tumors depend heavily on accurate segmentation using multimodal MRI data. Numerous DL frameworks have been developed to address this challenging problem by leveraging the rich information contained in MRI images. This section reviews recent advancements in BraTS, with particular emphasis on studies conducted using the BraTS 2018 and BraTS 2020 datasets, highlighting key methodologies and performance outcomes.

Related works on BraTS 2018

Saeed et al.¹⁹ proposed a hybrid model, RMU-Net, which combined MobileNetV2 and U-Net architectures for end-to-end brain tumor segmentation. Using the BraTS 2018 dataset, the model achieved Dice coefficients of 90.80, 86.75, and 79.36 for different tumor subregions. Ali et al.²⁰ presented an ensemble approach integrating U-Net and 3D CNN models, which yielded Dice scores of 0.750, 0.906, and 0.846 for the ET, WT, and TC, respectively, on the BraTS 2018 test set. Tataei et al.²¹ employed CNN-based feature engineering with a ResNet-50 backbone and obtained competitive Dice scores across multiple tumor regions using the BraTS 2018 dataset. Gull et al.²² developed a CNN-based framework for both tumor segmentation and classification, achieving average accuracies of 96.50% for segmentation and 96.49% for classification.

Ullah et al.²³ demonstrated the effectiveness of a 3D U-Net model for automatic tumor segmentation and MRI image enhancement. Their method produced mean Dice scores of 0.70 for the ET, 0.86 for the TC, and 0.91 for the WT, while testing yielded Dice scores of 0.83 for the ET, 0.90 for the TC, and 0.71 for the WT. MBANet, a multi-branch attention-based 3D CNN, was also evaluated on the BraTS 2018 dataset and achieved Dice scores of 78.21, 89.79, and 83.04 for the ET, WT, and TC, respectively.

Zia et al.²⁴ proposed a context-aware attentional residual dropout U-Net (ARDUNet), which achieved Dice scores of approximately 0.90 across tumor subregions on the BraTS 2018 dataset, including 0.92 for the ET. Sun et al.²⁵ developed a three-dimensional fully convolutional network that obtained DSCs of 0.90, 0.79, and 0.77 for the WT, TC, and ET segmentation, respectively. Pedada et al.⁹ introduced a residual U-Net architecture and reported a segmentation accuracy of 92.20% on the BraTS 2018 challenge dataset, with enhanced classification performance for the TC, EC, and WT regions.

Related works on BraTS 2020

Isensee et al.²⁶ utilized the nnU-Net framework for the BraTS 2020 challenge and demonstrated that the baseline configuration achieved strong segmentation performance without manual architectural tuning. Through extensive pipeline optimization, including aggressive data augmentation, region-based training, and post-processing strategies, their approach won the BraTS 2020 challenge, achieving Dice scores of 82.03% for the ET, 85.06% for the TC, and 88.95% for the WT.

Kataria et al.²⁷ introduced HybriCSF, a hybrid segmentation framework combining CNNs, support vector machines, and fuzzy C-means clustering. Evaluation on the BraTS 2020 dataset showed improved segmentation performance, with Dice scores of 0.63 for the ET, 0.87 for the WT, and 0.81 for the TC. Magadza et al.²⁸ extended the nnU-Net architecture by replacing standard convolutional blocks with depthwise-separable bottleneck units and incorporating shuffle attention in skip connections. Their optimized network achieved Dice scores of 84.8% for the TC, 91.2% for the WT, and 79.2% for the ET.

Susanto et al.²⁹ proposed a spatial transformation-based data augmentation pipeline that improved segmentation accuracy on the BraTS 2020 dataset, yielding Dice scores of 87.10 for the ET, 86.89 for the TC, and 90.91 for the WT. Zhang et al.³⁰ developed a multi-encoder 3D MRI segmentation framework and introduced a categorical Dice loss function to address voxel imbalance. Their method achieved Dice scores of 70.24% for the WT, 88.26% for the TC, and 73.86% for the ET in the BraTS 2020 challenge.

Angona et al.³¹ proposed a hybrid 3D ResAttU-Net-Swin model integrating residual learning, attention-based skip connections, and a Swin Transformer. When evaluated on the BraTS 2020 dataset, the model achieved Dice scores of 92.50% for the WT, 89.70% for the TC, and 82.60% for the ET, resulting in an average DSC of 88.27%. Li et al.³² introduced DPF-Unet, a dual-path 3D segmentation framework that combined CNN-based local feature extraction with transformer-based global modeling. Their approach achieved Dice scores of 91.74% for the WT, 89.44% for the TC, and 84.88% for the ET on the BraTS 2020 dataset.

Overall, these studies demonstrated the steady progress of DL-based methods, particularly U-Net-derived architectures, in improving brain tumor segmentation accuracy and robustness across benchmark datasets.

Methodology

During this investigation, we developed a new SAU-Net for BraTS using MRI images. The proposed methodology followed a clear and sequential process. First, the MRI image data were collected and preprocessed. Next, the SAU-Net architecture was constructed as a modified U-Net incorporating self-attention mechanisms. Finally, the model was trained on the prepared datasets and subsequently used to predict and segment brain tumors in unseen MRI images. The detailed architecture of the proposed SAU-Net is illustrated in Figure 1.

Figure 1.

Overview of the proposed SAU-Net architecture for brain tumor segmentation using MRI images. The pipeline consists of three main components: (1) Image preprocessing, including image resizing, wavelet denoising, volume slicing, image scaling, and one-hot encoding; (2) the SAU-Net encoder–decoder architecture, where Blocks 1–4 form the encoder, Block 5 integrates a lightweight self-attention mechanism at the bottleneck, and Blocks 6–9 form the decoder with skip connections; and (3) the segmentation output for WT, TC, and ET. SAU-Net: self-attention U-Net; MRI: magnetic resonance imaging; WT: whole tumor; TC: tumor core; ET: enhancing tumor.

Brain multimodal MRI data collection

The BraTS 2018 and 2020 challenge datasets served as the primary benchmark datasets used in this study. All attention-based models were developed, trained, and evaluated using these datasets.

The BraTS 2018 dataset³³ is a comprehensive resource for segmenting both high-grade and low-grade gliomas. It includes multi-institutional MRI scans with expert-annotated tumor regions. Each patient scan contained four MRI modalities: T1-weighted (T1), T1-weighted with contrast enhancement (T1CE), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR). These modalities enabled the identification of the ET, TC, and WT, including edema. The dataset comprised 285 samples, distributed into 205 training cases, 43 validation cases, and 37 testing cases. All images were preprocessed to maintain a consistent spatial resolution, and skull stripping was applied to isolate brain tissue.

The BraTS 2020 dataset³⁴ extended previous releases by providing a larger and more diverse collection of MRI scans. It contained 368 samples, partitioned into 265 training cases, 65 validation cases, and 47 testing cases. Similar to BraTS 2018, each subject included four MRI modalities (T1, T1CE, T2, and FLAIR), enabling accurate segmentation of glioma subregions. BraTS 2020 further incorporated improved annotation quality and a broader range of clinical cases while preserving its focus on glioma subregion segmentation.

Table 1 summarizes the training, validation, and testing splits for both datasets. Representative sample images from the two datasets, illustrating the different imaging modalities used for BraTS, are presented in Figure 2. Specifically, Figure 2(a) shows samples from BraTS 2018, whereas Figure 2(b) shows samples from BraTS 2020.

Figure 2.

Brain tumor segmentation (BraTS) 2018 and BraTS 2020 dataset sample images. (a) BraTS 2018 and (b) BraTS 2020.

Table 1.

Dataset distribution for brain tumor segmentation (BraTS) 2018 and BraTS 2020 datasets.

Dataset	Train	Validation	Test	Total
BraTS 2018	205	43	37	285
BraTS 2020	265	65	47	368

Preprocessing

A comprehensive preprocessing workflow was employed to maximize data quality and model performance in preparing the BraTS MRI images for tumor segmentation. This systematic procedure improved image uniformity and enhanced segmentation accuracy.

The preprocessing pipeline consisted of the following key steps:

Image resizing: All input images were resized to a fixed resolution of $128 \times 128$ pixels to ensure processing consistency. This uniformity facilitated efficient batch processing and analysis by maintaining consistent input dimensions for the model. Additionally, this step reduced computational overhead and accelerated the training process.

Wavelet denoising: Wavelet denoising was applied to suppress noise in the MRI images while preserving essential structural information. This technique decomposed images into multiple frequency components, enabling targeted noise reduction without significant loss of diagnostically relevant details. By improving image quality, this step enhanced the model’s ability to accurately identify and segment tumor regions.

Volume slicing: Since MRI data are three-dimensional, an adaptive volume slicing strategy was employed. Rather than relying on a fixed, manually defined slice range, the method uniformly extracted a predefined number of slices distributed across the volume. This approach ensured comprehensive representation of anatomical structures while avoiding the computational burden associated with extensive pre-slice experimentation.

Image scaling: The intensity values of the MRI images were normalized to a fixed range of 0 to 1. This normalization improved feature comparability and facilitated stable model convergence during training. Moreover, scaling mitigated the impact of intensity variations arising from differences in imaging devices or acquisition protocols.

One-hot encoding: One-hot encoding was employed to transform categorical labels representing different tumor subregions or tissue types into binary vectors. This representation provided precise segmentation masks and facilitated effective learning of complex relationships between image features and their corresponding labels within the attention-based model.

By implementing this preprocessing pipeline, a consistent and reliable dataset was established for brain tumor segmentation, thereby contributing to improved robustness and accuracy of the segmentation outcomes.

Volume slicing

To prepare the three-dimensional MRI volumes for analysis, an adaptive slicing strategy was employed to extract representative cross-sectional slices. Unlike conventional methods that relied on a fixed slice range, this approach dynamically and uniformly selected a predefined number of slices from each volume, irrespective of its depth. This ensured comprehensive coverage of anatomical structures and enabled the segmentation model to consistently capture salient spatial patterns throughout the brain.

The adaptive slicing strategy also eliminated the need for computationally intensive experimentation with multiple slice resolutions and orientations, which often varied across datasets. By ensuring that critical anatomical regions were consistently included, the method enhanced segmentation performance while substantially reducing data preparation time and computational effort.

Furthermore, the adaptive approach prioritized slices intersecting regions with higher tumor probability, increasing the likelihood that each selected slice contained clinically meaningful pathological information. Consequently, the segmentation model was consistently exposed to tumor-related features across varying brain depths, improving reproducibility and reducing the risk of omitting critical tumor regions due to arbitrary slice selection.

Attention-based U-Net architecture

An attention-based U-Net architecture was employed in this study to efficiently identify and segment tumors in MRI images. This design incorporated several essential components, each contributing to improved segmentation performance. The proposed model was informed by a range of attention-based U-Net variants, including adaptive attention U-Net (ADAU-Net), SAU-Net, multi-head attention U-Net (MHAU-Net), and group query attention U-Net (GQAU-Net), each developed to address specific challenges in medical image segmentation. The key mechanisms underlying these attention models are described below:

Self-attention mechanism (SAU-Net): The self-attention mechanism in SAU-Net enabled the model to suppress noisy or irrelevant regions of the input while emphasizing diagnostically relevant areas. This mechanism adaptively weighted feature representations according to their importance, thereby improving segmentation accuracy, particularly along tumor boundaries. By capturing long-range dependencies between spatial locations, the model leveraged global contextual information, which was essential for handling complex tumor shapes and size variations.

Multi-head attention (MHAU-Net): The MHAU-Net employed multiple parallel attention heads to enhance the conventional self-attention process. Each head was trained to focus on different spatial regions or feature characteristics within the MRI data, allowing the model to capture diverse spatial and contextual patterns simultaneously. This architecture modeled both fine-grained details and broader anatomical structures, resulting in improved segmentation performance and enhanced computational robustness compared to single-head attention mechanisms.

Adaptive attention (ADAU-Net): In ADAU-Net, an adaptive attention mechanism dynamically adjusted attention weights during segmentation. Unlike static attention schemes, this approach allowed the model to shift its focus based on input characteristics such as tumor intensity, texture, and shape. This adaptability yielded more accurate segmentation outcomes, particularly for highly heterogeneous tumors and datasets with varying image quality.

Group query attention (GQAU-Net): The GQAU-Net introduced a mechanism in which multiple query groups were generated to extract information from different image regions. This strategy was particularly effective for segmenting complex tumor subregions, including the WT, TC, and ET. By attending to multiple tumor aspects concurrently, the model improved segmentation accuracy in challenging cases involving overlapping or intertwined regions.

U-Net architecture: The underlying U-Net architecture followed a symmetric encoder–decoder design augmented with skip connections. The encoder progressively compressed input images into hierarchical feature representations, while the decoder reconstructed high-resolution segmentation maps by combining fine-grained spatial information from the encoder with high-level semantic features through skip connections. This multi-scale fusion preserved critical structural information and enabled precise localization and delineation of tumor boundaries and subregions.

In the proposed architecture, a self-attention module was incorporated to enhance feature representation and capture long-range spatial correlations within the input data. This design choice was motivated by the need to improve discriminative capability by selectively emphasizing diagnostically relevant features while suppressing noise and irrelevant anatomical structures. Unlike conventional attention mechanisms that relied on multi-head formulations and introduced additional computational complexity, the implemented self-attention module employed three linear transformations—query, key, and value—to compute attention scores directly within the feature space. This formulation enabled efficient computation of attention weights and dynamic feature reweighting without requiring multiple attention heads.

Furthermore, the proposed approach preserved the spatial integrity of feature maps, allowing the attention mechanism to adaptively enhance segmentation performance by focusing on the most informative image regions. Empirical evaluation demonstrated improved identification of critical tumor areas, particularly in challenging brain tumor segmentation scenarios. Overall, integrating self-attention within the U-Net framework facilitated a more nuanced understanding of spatial dependencies in the data, which was essential for achieving precise and reliable tumor segmentation.

Mechanisms of attention enhancement

The proposed SAU-Net architecture incorporated a streamlined self-attention mechanism designed to enhance feature representation and emphasize diagnostically relevant spatial regions in the input data. To clarify the integration of this module within the U-Net framework, a concise description is provided below, followed by Algorithm 1, which outlines the forward pass of SAU-Net and illustrates how the attention block operates at the bottleneck stage before being propagated through the decoder via skip connections.

The self-attention module computed attention scores by applying linear transformations to the input feature maps to generate query ( $Q$ ), key ( $K$ ), and value ( $V$ ) representations. These components enabled the model to dynamically calculate region-specific attention weights, thereby prioritizing the most relevant tumor regions for accurate segmentation. By adaptively focusing on diagnostically meaningful features while suppressing background noise, the model improved its ability to delineate tumor boundaries, even in challenging scenarios involving subtle or overlapping anatomical structures.

The computed attention scores facilitated long-range dependency modeling and contextual feature aggregation. When reintegrated into the U-Net architecture, the attention-weighted features enhanced the decoder’s capacity to reconstruct accurate segmentation maps. Unlike multi-head attention mechanisms, which often introduce additional architectural and computational complexity, SAU-Net employed a lightweight single-head self-attention formulation. This design preserved computational efficiency while maintaining strong representational capability, making the model well-suited for medical imaging applications such as brain tumor segmentation.

Overall, the incorporation of the self-attention mechanism significantly contributed to the segmentation performance of SAU-Net by enabling the network to adaptively focus on the most informative regions of the input data, resulting in improved accuracy and robustness.

Algorithm 1

Forward pass of SAU-Net (Blocks 1--9) with integrated self-attention.

Require: Preprocessed MRI slice $x$
Ensure: Predicted segmentation mask $\hat{y}$
1:	▷ Encoder pathway (Blocks 1–4)
2: $E_{1} \leftarrow {Block}_{1} (x)$
3: $E_{2} \leftarrow {Block}_{2} (E_{1})$
4: $E_{3} \leftarrow {Block}_{3} (E_{2})$
5: $E_{4} \leftarrow {Block}_{4} (E_{3})$
6:	▷ Bottleneck with self-attention (Block 5)
7: $B \leftarrow {ConvBlock}_{5} (E_{4})$
8: $Q \leftarrow W_{q} B$
9: $K \leftarrow W_{k} B$
10: $V \leftarrow W_{v} B$
11: $α \leftarrow Softmax (\frac{Q K^{⊤}}{\sqrt{d_{k}}})$
12: $B_{sa} \leftarrow α V$
13:	▷ Decoder with skip connections (Blocks 6–9)
14: $D_{6} \leftarrow {Block}_{6} (B_{sa}, E_{4})$
15: $D_{7} \leftarrow {Block}_{7} (D_{6}, E_{3})$
16: $D_{8} \leftarrow {Block}_{8} (D_{7}, E_{2})$
17: $D_{9} \leftarrow {Block}_{9} (D_{8}, E_{1})$
18:	▷ Output layer
19: $\hat{y} \leftarrow Softmax ({Conv}_{1 \times 1} (D_{9}))$

Brain tumor segmentation (BraTS)

In this study, SAU-Net was employed to segment MRI brain images into three clinically significant tumor subregions. This segmentation strategy provided a comprehensive characterization of tumor pathology and morphology, which was essential for accurate clinical analysis and treatment planning. The three segmented subregions included the WT, the TC, and the ET.

The WT encompassed the entire tumor region, including surrounding edema, thereby offering a complete representation of overall tumor burden. The TC represented the central, actively growing component of the tumor and was critical for assessing tumor aggressiveness. The ET highlighted regions exhibiting contrast enhancement, which were essential for distinguishing viable tumor tissue from non-tumorous regions.

By providing a detailed delineation of these tumor subregions, the proposed segmentation approach enabled more precise treatment planning and facilitated effective monitoring of tumor progression and therapeutic response.

Results analysis

In this study, we employed a range of attention-based U-Net architectures to address the task of brain tumor segmentation. The proposed SAU-Net was rigorously evaluated on the BraTS 2018 and BraTS 2020 datasets, and its performance was systematically compared with existing approaches, demonstrating notable advantages in segmentation accuracy and robustness.

Evaluation setup

The experimental setup was designed to ensure robust and reproducible evaluation. All experiments were conducted in a high-performance computing environment equipped with an NVIDIA Tesla P100 GPU (30 GB memory), 30 GB of system RAM, and 70 GB of dedicated disk space. For model implementation and analysis, widely adopted frameworks and libraries were utilized, including TensorFlow, Keras, Matplotlib, Pandas, and NumPy. The key performance metrics used to assess the segmentation capability of attention U-Net variants are summarized in Table 2.

Table 2.

Performance metrics and formulas.

Metric	Formula
Dice (Dice similarity coefficient)	$Dice = \frac{2 \times ‖ P \cap G ‖}{‖ P + G ‖}$
IoU (Jaccard index)	$Jaccard = \frac{‖ P \cap G ‖}{‖ P \cup G ‖}$
Sensitivity (TP rate)	$Sensitivity = \frac{TP}{TP + FN}$
Specificity (TN rate)	$Specificity = \frac{TN}{TN + FP}$
Youden index (J statistic)	$J = Sensitivity + Specificity - 1$

IoU: intersection of union; TP: true positive; TN: true negative; FN: false negative; FP: false positive.

Performance analysis

The performance of the proposed SAU-Net model was evaluated on the BraTS 2018 and BraTS 2020 datasets through a comprehensive segmentation analysis. These benchmark datasets enabled a systematic assessment of the model’s ability to distinguish different tumor subregions under diverse imaging conditions. Segmentation performance was quantified using standard evaluation metrics, including the Dice coefficient, Jaccard index, sensitivity, and specificity.

Table 3 shows that SAU-Net consistently outperformed the comparative models across all tumor classes. On the BraTS 2018 dataset, SAU-Net achieved a Dice coefficient of 98.35% for the ET, exceeding the performance of ADAU-Net (98.30%) and MHAU-Net (98.23%). The model also demonstrated superior accuracy for the WT and TC categories, with Dice coefficients of 98.16% and 98.87%, respectively. This improved performance was attributed to the effective integration of self-attention mechanisms, which enabled the model to emphasize diagnostically relevant features within MRI images. Such feature selectivity was particularly important in medical imaging, where subtle variations in tumor characteristics can substantially influence segmentation accuracy.

Table 3.

Performance analysis of BraTS on MRI image.

Dataset	Model	Class	Dice	Jaccard	Sensitivity	Specificity	IoU
BraTS 2018	ADAU-Net	ET	98.30	96.75	98.30	99.43	96.75
		WT	97.91	96.01	97.91	99.30	96.01
		TC	98.71	97.50	98.71	99.57	97.50
	SAU-Net	ET	98.35	96.84	98.35	99.45	96.84
		WT	98.16	96.48	98.16	99.39	96.48
		TC	98.87	97.80	98.87	99.62	97.80
	MHAU-Net	ET	98.23	96.61	98.23	99.41	96.61
		WT	97.42	95.10	97.42	99.14	95.10
		TC	98.36	96.83	98.36	99.45	96.83
	GQAU-Net	ET	98.13	96.43	98.13	99.38	96.43
		WT	97.75	95.72	97.75	99.25	95.72
		TC	98.53	97.16	98.53	99.51	97.16
BraTS 2020	ADAU-Net	ET	98.81	97.69	98.80	99.60	97.69
		WT	98.35	96.83	98.35	99.45	96.83
		TC	99.02	98.09	99.02	99.67	98.09
	SAU-Net	ET	98.99	98.04	98.99	99.66	98.04
		WT	98.70	97.49	98.70	99.57	97.49
		TC	99.18	98.40	99.18	99.73	98.40
	MHAU-Net	ET	98.61	97.31	98.61	99.54	97.31
		WT	97.92	95.99	97.92	99.31	95.99
		TC	98.93	97.92	98.93	99.64	97.91
	GQAU-Net	ET	98.00	96.15	98.00	99.33	96.15
		WT	97.31	94.88	97.31	99.10	94.88
		TC	98.16	96.45	98.16	99.39	96.45

The bold values highlight the highest performance scores and improve readability for comparison purposes.

BraTS: brain tumor segmentation; MRI: magnetic resonance imaging; IoU: intersection of union; ADAU-Net: adaptive attention U-Net; ET: enhancing tumor; WT: whole tumor; TC: tumor core; SAU-Net: self-attention U-Net; MHAU-Net: multi-head attention U-Net; GQAU-Net: group query attention U-Net.

On the BraTS 2020 dataset, SAU-Net maintained strong performance, achieving Dice coefficients of 98.99% for the ET and 99.18% for the TC. These results demonstrated the robustness of the model and its ability to generalize across datasets with varying levels of complexity. In addition, the sensitivity and specificity metrics further confirmed the reliability of the proposed approach. Specifically, SAU-Net achieved sensitivity values of 98.99% for the ET, 98.70% for the WT, and 99.18% for the TC, along with corresponding specificity values of 99.66%, 99.57%, and 99.73%, respectively. Overall, these findings indicated that SAU-Net provided accurate and consistent tumor segmentation performance, supporting its suitability for automated brain tumor analysis.

The analysis of average performance metrics provided a concise and comparative perspective on the effectiveness of the proposed SAU-Net model relative to other SOA BraTS approaches. By averaging performance across the BraTS 2018 and BraTS 2020 datasets, this evaluation enabled a clearer understanding of the overall strengths and limitations of the SAU-Net architecture. The analysis also emphasized the role of individual performance metrics in assessing model reliability and practical applicability.

As summarized in Table 4, SAU-Net achieved the highest average Dice coefficients on both the BraTS 2018 and BraTS 2020 datasets, with values of 98.23% and 98.62%, respectively. These results indicated a strong capability to accurately delineate tumor boundaries. The Dice coefficient, which quantifies the overlap between ground truth annotations and predicted segmentations, remains a critical metric in medical image analysis. The high Dice scores demonstrated that SAU-Net consistently identified tumor regions with high precision, thereby reducing the likelihood of missing clinically significant areas during segmentation. In addition, SAU-Net attained Jaccard index values of 96.63% for BraTS 2018 and 97.34% for BraTS 2020, reflecting strong agreement with the ground truth annotations.

Table 4.

Average performance analysis of BraTS on MRI image.

Model	Metric	BraTS 2018	BraTS 2020
ADAU-Net	Dice coefficient	97.92	98.44
	Jaccard score	96.04	97.00
	Sensitivity	97.92	98.44
	Specificity	99.31	99.48
	IoU	96.04	97.00
	J Statistic	96.23	96.92
SAU-Net	Dice coefficient	98.23	98.62
	Jaccard score	96.63	97.34
	Sensitivity	98.23	98.62
	Specificity	99.41	99.54
	IoU	96.63	97.34
	J Statistic	96.64	97.16
MHAU-Net	Dice coefficient	96.92	97.70
	Jaccard score	94.22	95.62
	Sensitivity	96.92	97.70
	Specificity	98.97	99.23
	IoU	94.22	95.62
	J Statistic	94.89	95.93
GQAU-Net	Dice coefficient	98.01	97.40
	Jaccard score	96.21	95.05
	Sensitivity	98.01	97.40
	Specificity	99.34	99.13
	IoU	96.21	95.05
	J Statistic	96.35	95.53

The bold values highlight the highest performance scores and improve readability for comparison purposes.

BraTS: brain tumor segmentation; MRI: magnetic resonance imaging; ADAU-Net: adaptive attention U-Net; IoU: intersection of union; SAU-Net: self-attention U-Net; MHAU-Net: multi-head attention U-Net; GQAU-Net: group query attention U-Net.

Furthermore, sensitivity and specificity metrics reinforced the robustness of the proposed model. SAU-Net recorded average sensitivity values of 98.23% on BraTS 2018 and 98.62% on BraTS 2020, indicating effective detection of true positive tumor cases. The corresponding average specificity values were 99.41% and 99.54%, respectively, demonstrating the model’s ability to accurately identify non-tumorous regions and minimize false positive predictions. Such balanced performance is particularly important in clinical contexts, where misclassification may have significant consequences.

In comparison, although alternative models such as MHAU-Net and GQAU-Net demonstrated competitive performance, their evaluation metrics remained consistently lower than those achieved by SAU-Net. For instance, MHAU-Net reported Dice coefficients of 96.92% for BraTS 2018 and 97.70% for BraTS 2020, whereas GQAU-Net achieved values of 98.01% and 97.40%, respectively. Overall, this comparative analysis confirmed that SAU-Net outperformed competing methods across key evaluation metrics, highlighting its strong potential for reliable deployment in automated brain tumor segmentation tasks.

The tradeoff between sensitivity (true positive rate) and specificity (true negative rate) was examined using the Youden index on BraTS datasets. The values calculated for the different models, as summarized in Table 4, demonstrated each model’s ability to balance false positive and false negative rates while accurately identifying tumor regions. These results supported the statistical significance of the findings and indicated that the proposed SAU-Net model achieved superior segmentation effectiveness compared to the other evaluated approaches. This analysis enabled a quantitative comparison of the models within a clinically relevant framework and underscored the importance of the Youden index as a robust statistical measure in performance evaluation.

SAU-Net incorporated self-attention mechanisms that allowed the model to emphasize diagnostically relevant regions of the input images, thereby capturing fine-grained details of irregular structures and poorly defined tumor boundaries. This capability enhanced sensitivity to subtle variations in tumor morphology and facilitated more accurate segmentation of heterogeneous tumor subregions, including WT, TC, and ET. In addition, extensive experiments were conducted to validate the performance of SAU-Net on multiple benchmark datasets, including BraTS 2018 and BraTS 2020, as reported in Tables 3 and 4. The results demonstrated the robustness of SAU-Net in addressing challenging segmentation scenarios and confirmed its superior performance in terms of Dice coefficient, sensitivity, and specificity relative to existing models.

Performance visualization

Figure 3 illustrates the effectiveness of the proposed SAU-Net model in segmenting brain tumors from MRI images. The close correspondence between the ground truth annotations and the predicted segmentation regions demonstrated the robustness of the proposed approach.

Figure 3.

The prediction of BraTS using SAU-Net model. BraTS: brain tumor segmentation; SAU-Net: self-attention U-Net.

The observed performance improvements were attributed to the architectural design of SAU-Net, which incorporated enhanced contextual awareness through the integration of attention mechanisms and context aggregation units. This design enabled the model to capture complex spatial relationships and subtle feature variations that were critical for accurate tumor segmentation. Consequently, tumor boundaries were delineated more precisely, and small or irregular anomalies were detected more reliably, supporting accurate segmentation outcomes.

Benefits and limitations of SAU-Net

SAU-Net offers several practical advantages over traditional U-Net architectures. The lightweight attention block positioned at the bottleneck enables the network to capture broader contextual information while emphasizing clinically relevant tumor regions. This design leads to more precise segmentation of the WT, TC, and ET, while maintaining a relatively low model complexity and computational cost. In addition, SAU-Net retains the conventional encoder–decoder structure, which facilitates its integration into clinical workflows and allows straightforward fine-tuning for other MRI-based segmentation tasks.

Despite these advantages, the model also presents certain limitations. Because SAU-Net was trained and evaluated primarily on BraTS multimodal MRI datasets, its performance on other imaging modalities or data acquired from different institutions may vary unless appropriate domain adaptation techniques are applied. Although the single-head attention mechanism is computationally lighter than multi-head attention variants, it still introduces additional overhead compared to a standard U-Net, which may pose challenges in resource-constrained environments. Furthermore, the approach relies on high-quality manual annotations for supervised training, which are time-consuming and costly to obtain at a large scale.

Discussion

The comparative analysis presented in Table 5 showed that the proposed SAU-Net model achieved superior performance in BraTS, particularly on the BraTS 2018 dataset. The model consistently outperformed other SOA methods in segmenting the three primary tumor subregions—WT, TC, and ET—as measured by the Dice coefficient. Specifically, SAU-Net surpassed competing models such as HTTU-Net (91.50), achieving a Dice coefficient of 98.16 for the WT. This performance can be attributed to the integrated self-attention mechanisms, which enable the model to preserve critical spatial context while selectively emphasizing diagnostically relevant features. Such capability is essential for effectively handling the heterogeneous and complex nature of brain tumors.

Table 5.

Performance analysis (Dice coefficient) of our proposed SAU-Net model with state-of-the-art methods works.

Author	Model	Dataset	WT	TC	ET
Ahmad et al.³⁵	3D-UNet	BraTS 2018	91.17	84.10	77.00
Aboelenein et al.³⁶	HTTU-Net	BraTS 2018	91.50	92.30	88.70
Saeed et al.¹⁹	RMU-Net	BraTS 2018	90.80	86.75	79.36
Ali et al.²⁰	UNet-3D-CNN	BraTS 2018	90.60	84.60	75.00
Tataei et al.²¹	CNN	BraTS 2018	89.93	92.11	92.23
Gull et al.²²	CNN	BraTS 2018	91.20	88.34	81.84
Ullah et al.²³	3D-UNet	BraTS 2018	90.00	83.00	71.00
Cao et al.³⁷	3D-CNN	BraTS 2018	89.79	83.04	78.21
Zia et al.²⁴	3D-ARDUNet	BraTS 2018	90.00	90.00	92.00
Sun et al.²⁵	3D-FCN	BraTS 2018	90.00	79.00	77.00
Proposed	SAU-Net	BraTS 2018	98.16	98.87	98.23

SAU-Net: self-attention U-Net; 3D: three-dimensional; BraTS: brain tumor segmentation; ARDUNet: attentional residual dropout U-Net; CNN: convolutional neural network; WT: whole tumor; TC: tumor core; ET: enhancing tumor.

Similarly, SAU-Net achieved a Dice coefficient of 98.87 for the TC, outperforming existing approaches and underscoring its clinical relevance, as accurate delineation of the TC is crucial for treatment planning. The model also demonstrated strong performance in segmenting the ET, attaining a Dice coefficient of 98.23 and exceeding the results of methods such as 3D-ARDUNet (92.00). This ability to capture subtle variations in ET regions is particularly important for monitoring treatment response and disease progression.

Furthermore, the high segmentation accuracy was supported by the proposed preprocessing pipeline, which incorporated adaptive volume slicing and wavelet denoising to improve image quality and consistency. The carefully designed architecture, combining attention mechanisms with contextual feature aggregation, allows SAU-Net to effectively address challenges such as irregular tumor shapes and poorly defined boundaries, thereby contributing to its robust segmentation performance.

Table 6 presents a comparative sensitivity analysis of the proposed SAU-Net model on the BraTS 2018 dataset against other SOA segmentation approaches. Sensitivity, a critical metric for accurately identifying true positive tumor regions, highlighted notable performance differences among the evaluated models. The 3D-UNet achieved sensitivity values of 91.83% for the WT, 82.05% for the TC, and 84.36% for the ET, indicating relatively limited effectiveness, particularly in TC detection.

Table 6.

Performance analysis (sensitivity) of our proposed SAU-Net model with state-of-the-art methods works.

Author	Model	Dataset	WT	TC	ET
Ahmad et al.³⁵	3D-UNet	BraTS 2018	91.83	82.05	84.36
Proposed	SAU-Net	BraTS 2018	98.16	98.87	98.23

SAU-Net: self-attention U-Net; WT: whole tumor; TC: tumor core; ET: enhancing tumor; 3D: three-dimensional; BraTS: brain tumor segmentation.

HTTU-Net demonstrated improved sensitivity performance, achieving scores of 94.10% for the WT, 97.20% for the TC, and 94.30% for the ET, thereby showing enhanced capability in identifying TC regions. In contrast, the proposed SAU-Net model achieved substantially higher sensitivity values of 98.16% for the WT, 98.87% for the TC, and 98.23% for the ET, reflecting its strong ability to accurately detect all tumor subregions. This marked improvement can be attributed to the effective integration of self-attention mechanisms within SAU-Net, which enhances feature discrimination and contributes to more reliable BraTS performance. Consequently, the improved sensitivity supports more precise tumor delineation and has meaningful implications for clinical decision-making.

Table 7 presents a comparative performance analysis of the proposed SAU-Net model on the BraTS 2020 dataset against several SOA brain tumor segmentation approaches based on the Dice coefficient. The Dice coefficient, which quantifies the overlap between predicted tumor regions and ground truth annotations, serves as a critical metric for evaluating segmentation accuracy. The nnU-Net model proposed by Isensee et al.²⁶ achieved Dice coefficients of 88.95% for the WT, 85.06% for the TC, and 82.03% for the ET, indicating solid overall performance while revealing limitations in accurately segmenting ET regions.

Table 7.

Performance analysis (Dice coefficient) of our proposed SAU-Net model with SOA works.

Author	Model	Dataset	WT	TC	ET
Isensee et al.²⁶	nnU-Net	BraTS 2020	88.95	85.06	82.03
Kataria et al.²⁷	HybriCSF	BraTS 2020	87.00	81.00	63.00
Magadza et al.²⁸	nnU-Net	BraTS 2020	91.20	84.80	79.20
Susanto et al.²⁹	Spatial transformer	BraTS 2020	90.91	86.89	87.10
Zhang et al.³⁰	Multiple Encoders	BraTS 2020	70.24	88.26	73.86
Aboelenein et al.³⁶	HTTU-Net	BraTS 2018	94.10	97.20	94.30
Angona et al.³¹	3D ResAttU-Net-Swin	BraTS 2020	92.50	89.70	82.60
Li et al.³²	DPF-Unet	BraTS 2020	91.74	89.44	84.88
Proposed	SAU-Net	BraTS 2020	98.99	98.70	99.18

SAU-Net: self-attention U-Net; SOA: state-of-the-art; WT: whole tumor; TC: tumor core; ET: enhancing tumor; BraTS: brain tumor segmentation.

The HybriCSF model introduced by Kataria et al.²⁷ yielded comparatively lower Dice scores, achieving 87% for the WT and only 63% for the ET, highlighting challenges in ET segmentation. Magadza et al.²⁸ reported a Dice coefficient of 91.2% for the WT, demonstrating improvement over several earlier approaches, whereas the spatial transformer-based model proposed by Susanto et al.²⁹ achieved a Dice score of 90.91% for the WT, reflecting reliable segmentation capability.

In comparison, the 3D ResAttU-Net-Swin model proposed by Angona et al.³¹ achieved Dice coefficients of 92.50% for the WT, 89.70% for the TC, and 82.60% for the ET on the BraTS 2020 dataset, indicating competitive performance but comparatively lower accuracy in ET segmentation. Similarly, the DPF-Unet model introduced by Li et al. reported Dice scores of 91.74% for the WT, 89.44% for the TC, and 84.88% for the ET, reflecting strong overall segmentation performance. However, both approaches were consistently outperformed by the proposed SAU-Net across all tumor subregions.

The SAU-Net model achieved Dice coefficients of 98.99% for the WT, 98.70% for the TC, and 99.18% for the ET, surpassing all compared methods. These results demonstrate the effectiveness of the self-attention mechanisms integrated into the SAU-Net architecture and position it as a leading approach for brain tumor segmentation. Furthermore, the performance analysis indicates that SAU-Net provides robust and reliable tumor delineation, particularly in accurately identifying complex tumor regions, which is essential for supporting informed clinical decision-making and improving patient outcomes.

The proposed SAU-Net model demonstrated substantially improved performance in segmenting ETs compared to previous studies, as shown in Tables 5 and 7. Historically, ET segmentation has been particularly challenging due to the heterogeneous and irregular nature of enhancing tumor regions, which has often resulted in lower Dice coefficients for earlier models. In contrast, SAU-Net incorporated advanced attention mechanisms that enabled the model to more effectively focus on the complex characteristics of ET regions. This targeted attention enhanced segmentation accuracy by strengthening the model’s ability to distinguish tumor tissue from surrounding non-tumorous areas.

On the BraTS 2018 dataset, SAU-Net achieved a Dice coefficient of 98.23% for ET segmentation, significantly outperforming established methods such as HTTU-Net (88.70%) and RMU-Net (79.36%). Similarly, on the BraTS 2020 dataset, SAU-Net attained a Dice coefficient of 99.18% for the ET, surpassing previously reported SOA results, including nnU-Net (82.03%) and HybriCSF (63%). The superior ET segmentation performance of SAU-Net can be attributed to the effective integration of attention mechanisms and robust feature extraction within its architecture. This design enabled the model to adaptively learn and emphasize the most informative features for ET identification, resulting in consistently enhanced segmentation outcomes relative to contemporary approaches.

Complexity analysis

This section evaluated the computational efficiency of different BraTS segmentation models applied to MRI images, with a focus on build time, prediction time, and memory usage. The analysis included four attention-based U-Net variants—ADAU-Net, SAU-Net, MHAU-Net, and GQAU-Net—and was conducted using the BraTS 2018 and BraTS 2020 datasets.

Table 8 summarizes the computational complexity metrics for each model across both datasets. On the BraTS 2018 dataset, ADAU-Net exhibited the longest build time at 225.25 s, whereas SAU-Net achieved a more efficient build time of 147.74 s. MHAU-Net and GQAU-Net followed with build times of 157.62 s and 152.00 s, respectively. Regarding prediction time, SAU-Net outperformed all other models, requiring only 54.85 s, while ADAU-Net recorded the longest prediction time at 70.20 s. Memory usage remained relatively consistent across models, with ADAU-Net consuming 11,253.04 MB and SAU-Net requiring slightly less at 11,145.25 MB. In contrast, MHAU-Net and GQAU-Net exhibited higher memory consumption, using 11,986.01 MB and 12,606.97 MB, respectively.

Table 8.

Complexity analysis of BraTS model on MRI image.

Experiment	Dataset	Model	Build time (s)	Prediction time (s)	Memory usage (MB)
1	BraTS 2018	ADAU-Net	225.25	70.2	11253.04
		SAU-Net	147.74	54.85	11145.25
		MHAU-Net	157.62	55.38	11986.01
		GQAU-Net	152.00	55.39	12606.97
2	BraTS 2020	ADAU-Net	315.98	89.92	15628.25
		SAU-Net	313.87	91.39	14961.56
		MHAU-Net	325.60	90.64	16151.36
		GQAU-Net	331.42	88.67	16461.01

BraTS: brain tumor segmentation; MRI: magnetic resonance imaging; ADAU-Net: adaptive attention U-Net; SAU-Net: self-attention U-Net; MHAU-Net: multi-head attention U-Net; GQAU-Net: group query attention U-Net.

A similar trend was observed on the BraTS 2020 dataset. ADAU-Net again recorded the longest build time at 315.98 s, while SAU-Net maintained competitive efficiency with a build time of 313.87 s. MHAU-Net and GQAU-Net required 325.60 s and 331.42 s, respectively. Prediction times increased for all models due to higher dataset complexity; SAU-Net required 91.39 s, whereas ADAU-Net required 89.92 s.

The memory usage analysis further revealed notable differences among the evaluated architectures. For BraTS 2018, SAU-Net demonstrated the lowest memory usage at 11,145.25 MB, followed closely by ADAU-Net at 11,253.04 MB. In contrast, MHAU-Net and GQAU-Net required substantially higher memory. For BraTS 2020, memory consumption increased across all models, with ADAU-Net using 15,628.25 MB and SAU-Net requiring 14,961.56 MB. MHAU-Net and GQAU-Net showed the highest memory demands at 16,151.36 MB and 16,461.01 MB, respectively. These findings indicated that although memory requirements increased with higher-resolution datasets, SAU-Net consistently maintained lower memory usage relative to the other attention-based models.

In terms of training complexity, SAU-Net demonstrated favorable efficiency when compared with existing approaches such as RMU-Net and HTTU-Net. The analysis showed that SAU-Net required 313.87 s for training, which was substantially lower than HTTU-Net (approximately 36,000 s) and RMU-Net (approximately 10,020 s). This comparison highlighted SAU-Net as a computationally efficient alternative while preserving strong segmentation performance.

Overall, the complexity analysis demonstrated that although all evaluated models exhibited considerable memory usage, SAU-Net consistently achieved lower build and prediction times compared to its counterparts. These results emphasize the importance of balancing segmentation performance with computational efficiency, particularly for clinical applications where rapid and resource-efficient processing is critical. The findings provide valuable insights into the operational characteristics of the evaluated models and inform future research and deployment decisions in brain tumor segmentation.

Model complexity versus accuracy

To further analyze the computational efficiency of the proposed SAU-Net model, segmentation accuracy was examined in relation to the complexity metrics reported earlier, including build time and memory usage (Table 8). Table 9 summarizes this relationship. SAU-Net exhibited the most favorable tradeoff between accuracy and computational complexity. Although its memory usage was slightly higher than that of ADAU-Net, SAU-Net achieved the highest Dice accuracy on both the BraTS 2018 and BraTS 2020 datasets while maintaining among the lowest build and prediction times.

Table 9.

Comparison of model complexity and segmentation accuracy for BraTS 2018 and 2020.

Model	Avg. Dice	Build time	Memory usage	Efficiency
SAU-Net	98.42	Low	Medium	Best accuracy-efficiency tradeoff
ADAU-Net	98.18	High	Medium–high	Slower with lower accuracy
MHAU-Net	97.31	Medium–high	High	Higher cost with lower accuracy
GQAU-Net	97.70	Medium–high	Highest	Most expensive with lower accuracy

BraTS: brain tumor segmentation; SAU-Net: self-attention U-Net; ADAU-Net: adaptive attention U-Net; MHAU-Net: multi-head attention U-Net; GQAU-Net: group query attention U-Net.

In contrast, MHAU-Net and GQAU-Net incurred higher computational costs while delivering comparatively lower segmentation performance. These findings demonstrate that SAU-Net provides an effective balance between segmentation accuracy and resource consumption, making it well-suited for clinical environments where both computational efficiency and high precision are critical.

Research question validation

RQ1: Effectiveness of SAU-Net compared to U-Net variants

The analysis presented in Table 3 confirmed that the proposed SAU-Net model significantly outperformed traditional U-Net variants, including ADAU-Net, MHAU-Net, and GQAU-Net. SAU-Net consistently achieved higher values across all key evaluation metrics, such as the Dice coefficient, Jaccard index, sensitivity, and specificity. For instance, SAU-Net achieved Dice coefficients of 98.23% on the BraTS 2018 dataset and 98.62% on the BraTS 2020 dataset, exceeding the corresponding scores of ADAU-Net (97.92% and 98.44%), MHAU-Net (96.92% and 97.70%), and GQAU-Net (98.00% and 97.40%). These results indicated that incorporating self-attention mechanisms into the U-Net architecture substantially improved segmentation accuracy. Consequently, the first research question was successfully validated.

RQ2: Comparison with SOA models

The comparative analyses presented in Tables 5 and 6 demonstrated that SAU-Net outperformed SOA segmentation methods in accurately delineating tumor subregions, including the WT, TC, and ET. On the BraTS 2018 dataset, SAU-Net achieved Dice coefficients of 98.16% for the WT, 98.87% for the TC, and 98.23% for the ET, which were substantially higher than those reported for leading approaches such as HTTU-Net and RMU-Net. Similarly, on the BraTS 2020 dataset, SAU-Net attained Dice coefficients of 98.99% for the WT, 98.70% for the TC, and 99.18% for the ET, significantly surpassing models including nnU-Net and HybriCSF. These consistent improvements across both datasets confirmed that SAU-Net effectively and accurately segmented critical tumor subregions, thereby validating the second research question.

RQ3: Impact of self-attention integration

The complexity analysis summarized in Table 8 demonstrated that SAU-Net achieved a favorable balance between segmentation accuracy and computational efficiency. On the BraTS 2018 dataset, SAU-Net recorded the most efficient build time (147.74 s) and prediction time (54.85 s) among the evaluated models, including ADAU-Net (225.25 s build time, 70.20 s prediction time), MHAU-Net (157.62 s build time, 55.38 s prediction time), and GQAU-Net (152.00 s build time, 55.39 s prediction time). Although SAU-Net exhibited memory usage comparable to other models, its superior processing speed combined with high segmentation performance indicated that the integrated self-attention mechanisms effectively enhanced computational efficiency without sacrificing accuracy. These findings therefore validated the third research question.

Hypothesis validation

The experimental findings provided compelling evidence that the proposed SAU-Net outperformed conventional U-Net models as well as other SOA techniques in BraTS. The integration of self-attention mechanisms within the U-Net architecture resulted in substantial improvements in segmentation accuracy, as demonstrated in Tables 3, 5, and 7. On the BraTS 2020 dataset, SAU-Net significantly surpassed existing models such as nnU-Net and HybriCSF, achieving Dice coefficients of 98.99% for the WT, 98.70% for the TC, and 99.18% for the ET.

Similarly, on the BraTS 2018 dataset, SAU-Net demonstrated consistently superior performance, attaining Dice coefficients of 98.16% for the WT, 98.87% for the TC, and 98.23% for the ET. The observed improvements across all evaluation metrics, including Dice coefficient, sensitivity, and specificity, were primarily attributed to the self-attention mechanism’s ability to emphasize diagnostically relevant features while preserving essential spatial context.

Furthermore, the complexity analysis presented in Table 8 supported the second component of the proposed hypothesis concerning computational efficiency. SAU-Net achieved lower build time (147.74 s) and prediction time (54.85 s) compared with alternative models such as ADAU-Net. This combination of high segmentation accuracy and reduced computational complexity confirmed the suitability of SAU-Net for clinical applications requiring both efficiency and precision.

Scientific implications for clinical practice

SAU-Net has important implications for clinical practice, particularly in the field of neuro-oncology. Accurate BraTS is critical for surgical planning, diagnosis, and therapeutic monitoring. By incorporating self-attention mechanisms, SAU-Net enables more precise delineation of tumor boundaries, especially in challenging regions such as the TC and ET. This increased precision can enhance radiologists’ confidence in tumor identification, leading to more informed and accurate treatment decisions.

Moreover, the computational efficiency of SAU-Net supports its potential integration into real-time clinical workflows, including resource-constrained healthcare environments. The utilization of multimodal MRI data provides anatomical insights, which further facilitate personalized treatment strategies and support preoperative and postoperative planning. By reducing manual intervention and accelerating the segmentation process, SAU-Net has the potential to improve patient outcomes and streamline clinical decision-making in brain tumor management, offering a practical and scalable solution for real-world deployment.

Conclusion

This study evaluated the performance of the proposed SAU-Net model for brain tumor segmentation using the BraTS 2018 and BraTS 2020 datasets and demonstrated that it consistently outperformed existing approaches. By effectively integrating context aggregation with advanced attention mechanisms, the model accurately identified tumor regions across diverse and complex cases.

On the BraTS 2018 dataset, SAU-Net achieved high sensitivity and specificity values of 98.87% and 99.05%, respectively, along with a Jaccard index of 96.31% and an average Dice coefficient of 98.16%. On the more challenging BraTS 2020 dataset, the model maintained strong performance, achieving a Jaccard index of 97.04% and an average Dice coefficient of 98.99%. These results indicated that SAU-Net provided superior tumor identification and segmentation accuracy compared to existing SOA methods.

The observed performance improvements suggest that SAU-Net can significantly enhance clinical diagnosis and treatment planning in neuro-oncology, thereby contributing to improved patient outcomes.

Despite these promising results, certain limitations remain, including the lack of investigation into advanced feature fusion and hybrid segmentation strategies. Future research will focus on developing hybrid segmentation frameworks and exploring feature fusion techniques to further enhance segmentation accuracy and model generalization.

Footnotes

ORCID iDs

Md. Alamin Talukder

Mehnaz Tabassum

Majdi Khalid

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent to publish

Not applicable.

Author contributions

Md. Alamin Talukder: Conceptualization, data curation, methodology, software, resource, visualization, formal analysis, supervision, validation, and writing—original draft, review, and editing. Mehnaz Tabassum: Methodology, software, resource, visualization, validation, investigation, and writing—review and editing. Majdi Khalid: Methodology, software, resource, visualization, validation, and investigation.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of data and materials

The data that support the findings of this study are openly available in the BraTS 2018: https://www.med.upenn.edu/sbia/brats2018/data.html and BraTS 2020:

Guarantor

Md. Alamin Talukder.

References

Solanki

Singh

Chouhan

, et al. A systematic analysis of magnetic resonance images and deep learning methods used for diagnosis of brain tumor. Multimed Tools Appl 2024; 83: 23929–23966.

Batool

Byun

. Brain tumor detection with integrating traditional and computational intelligence approaches across diverse imaging modalities-challenges and future directions. Comput Biol Med 2024; 175: 108412.

Sowrirajan

Karumanan Srinivasan

Kalluri

, et al. Improved brain tumor segmentation using UNet-LSTM architecture. SN Comput Sci 2024; 5: 496.

Talukder

Islam

Uddin

, et al. A deep ensemble learning framework for brain tumor classification using data balancing and fine-tuning. Sci Rep 2025; 15: 35251.

Zuo

Fan

Tang

, et al. Deep learning-powered 3D segmentation derives factors associated with lymphovascular invasion and prognosis in clinical T1 stage non-small cell lung cancer. Heliyon 2023; 9: e15147.

Guo

. Novel robust automatic brain-tumor detection and segmentation using magnetic resonance imaging. IEEE Sens J 2024; 24: 10957–10964.

Priya

Vasudevan

. Brain tumor classification and detection via hybrid AlexNet-GRU based on deep learning. Biomed Signal Process Control 2024; 89: 105716.

Talukder

Layek

Hossain

, et al. ACU-Net: Attention-based convolutional U-Net model for segmenting brain tumors in fMRI images. Digital Health 2025; 11: 20552076251320288.

Pedada

Rao

Patro

, et al. A novel approach for brain tumour detection using deep learning based technique. Biomed Signal Process Control 2023; 82: 104549.

10.

Zhang

Muscat

. Trends in malignant and benign brain tumor incidence and mobile phone use in the US (2000–2021): A SEER-based study. Int J Environ Res Public Health 2025; 22: 933.

11.

Sailunaz

Alhajj

Özyer

, et al. A survey on brain tumor image analysis. Med Biol Eng Comput 2024; 62: 1–45.

12.

Mathur

Jaiswal

. Demystifying the role of artificial intelligence in neurodegenerative diseases. In: AI and neuro-degenerative diseases: Insights and solutions, 2024, pp.1–33. Springer, Cham, Switzerland.

13.

Kumar

Parthasarathy

. Development of an enhanced U-Net model for brain tumor segmentation with optimized architecture. Biomed Signal Process Control 2023; 81: 104427.

14.

Rasool

Bhat

. Unveiling the complexity of medical imaging through deep learning approaches. Chaos Theory Appl 2023; 5: 267–280.

15.

Shreeharsha

. Detection of brain tumor using hybridized 3D U-Net model on MRI images. Multimed Tools Appl 2025; 84: 30385–30413.

16.

Uppal

Prakash

. MS2ADM-BTS: Multi-scale dual attention guided diffusion model for volumetric brain tumor segmentation. Pattern Recognit Lett 2025; 198: 115–122.

17.

Nie

Yang

, et al. DiffBTS: A lightweight diffusion model for 3D multimodal brain tumor segmentation. Sensors 2025; 25: 2985.

18.

Tabassum

Di Ieva

Liu

. Meta transfer learning for brain tumor segmentation using nnUNet in meningioma and metastasis cases. Sci Rep 2025; 15: 37599.

19.

Saeed

Ali

Bin

, et al. RMU-Net: a novel residual mobile U-Net model for brain tumor segmentation from MR images. Electronics 2021; 10: 1962.

20.

Ali

Gilani

Waris

, et al. Brain tumour image segmentation using deep networks. IEEE Access 2020; 8: 153589.

21.

Tataei Sarshar

Ranjbarzadeh

Jafarzadeh Ghoushchi

, et al. Glioma brain tumor segmentation in four MRI modalities using a convolutional neural network and based on a transfer learning method. In: Brazilian technology symposium, pp. 386–402. Springer, Cham, Switzerland.

22.

Gull

Akbar

Khan

. Automated detection of brain tumor through magnetic resonance images using convolutional neural network. Biomed Res Int 2021; 2021: 3365043.

23.

Ullah

Ansari

Hanif

, et al. Brain MR image enhancement for tumor segmentation using 3D U-Net. Sensors 2021; 21: 7528.

24.

Zia

Baig

Rehman

, et al. Contextual information extraction in brain tumour segmentation. IET Image Process 2023; 17: 3371–3391.

25.

Sun

Peng

Guo

, et al. Segmentation of the multimodal brain tumor image used the multi-pathway architecture method based on 3D FCN. Neurocomputing 2021; 423: 34–45.

26.

Isensee

Jäger

Full

, et al. NNU-Net for brain tumor segmentation. In: Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries: 6th international workshop, BrainLes 2020, held in conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, revised selected papers, part II 6, pp.118–132. Springer, Cham, Switzerland.

27.

Kataria

Panda

. HybridCSF model for magnetic resonance image based brain tumor segmentation. Indones J Electr Eng Comput Sci 2024; 35: 1845–1852.

28.

Magadza

Viriri

. Efficient NNU-Net for brain tumor segmentation. IEEE Access 2023; 11: 126386.

29.

Susanto

Tjandrasa

Fatichah

. Data augmentation using spatial transformation for brain tumor segmentation improvement. In: 2024 international seminar on intelligent technology and its applications (ISITIA), pp.639–644. IEEE, Piscataway, NJ, USA.

30.

Zhang

Yang

Huang

, et al. ME-Net: Multi-encoder Net framework for brain tumor segmentation. Int J Imag Syst Technol 2021; 31: 1834–1848.

31.

Angona

Mondal

MRH

. An attention based residual U-Net with Swin Transformer for brain MRI segmentation. Array 2025; 25: 100376.

32.

Lan

Sun

, et al. DPF-Unet: A CNN-swin transformer fusion network for 3D brain tumor segmentation in MRI images. J Supercomput 2025; 81: 832.

33.

Weninger

Rippel

Koppers

, et al. Segmentation of brain tumors and patient survival prediction: methods for the BraTS 2018 challenge. In: International MICCAI brainlesion workshop, pp.3–12. Springer, Cham, Switzerland.

34.

Henry

Carré

Lerousseau

, et al. Brain tumor segmentation with self-ensembled, deeply-supervised 3D U-Net neural networks: a BraTS 2020 challenge solution. In: Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries: 6th international workshop, BrainLes 2020, held in conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, revised selected papers, part I 6, pp.327–339. Springer, Cham, Switzerland.

35.

Ahmad

Qamar

Shen

, et al. Context aware 3D UNet for brain tumor segmentation. In: Brainlesion: glioma, multiple sclerosis, stroke and traumatic brain injuries: 6th international workshop, BrainLes 2020, held in conjunction with MICCAI 2020, Lima, Peru, October 4, 2020, revised selected papers, part I 6, pp.207–218. Springer, Cham, Switzerland.

36.

Aboelenein

Songhao

Koubaa

, et al. HTTU-Net: Hybrid two track U-Net for automatic brain tumor segmentation. IEEE Access 2020; 8: 101406.

37.

Cao

Zhou

Zang

, et al. MBANet: A 3D convolutional neural network with multi-branch attention for brain tumor segmentation from MRI images. Biomed Signal Process Control 2023; 80: 104296.

Self-attention U-Net (SAU-Net): An attention-driven U-Net framework for precise brain tumor segmentation using multimodal magnetic resonance imaging

Abstract

Objectives

Methods

Results

Conclusion

Keywords

Introduction

Contributions

Research questions

Hypothesis

Related works

Related works on BraTS 2018

Related works on BraTS 2020

Methodology

Brain multimodal MRI data collection

Preprocessing

Volume slicing

Attention-based U-Net architecture

Mechanisms of attention enhancement

Brain tumor segmentation (BraTS)

Results analysis

Evaluation setup

Performance analysis

Performance visualization

Benefits and limitations of SAU-Net

Discussion

Complexity analysis

Model complexity versus accuracy

Research question validation

RQ1: Effectiveness of SAU-Net compared to U-Net variants

RQ2: Comparison with SOA models

RQ3: Impact of self-attention integration

Hypothesis validation

Scientific implications for clinical practice

Conclusion

Footnotes

ORCID iDs

Ethics approval

Consent to participate

Consent to publish

Author contributions

Funding

Declaration of conflicting interests

Availability of data and materials

Guarantor

References