SDM-UNet: A medical image segmentation network integrating multi-module collaborative architecture and feature enhancement

Abstract

(1) Medical image segmentation is crucial for disease diagnosis, surgical planning, and therapeutic monitoring, but existing methods face significant challenges due to the complex structures of human organs, including substantial size variations, indistinct boundaries, and low inter-tissue contrast. (2) To address these challenges, we propose SDM-UNet, a hybrid network integrating CNN and Transformer modules to enhance segmentation performance. The architecture has two main components. A Multi-Attention Feature Refinement (MAFR) block replaces the Swin-UNet bottleneck and combines adaptive kernel convolution, enhanced convolution, and channel attention to improve local feature extraction. Multi-Fusion Dense Skip Connections facilitate multi-scale feature fusion between the encoder and decoder to mitigate spatial information loss during downsampling. (3) SDM-UNet was validated on the Synapse multi-organ CT and ACDC cardiac MRI datasets; it was trained using the PyTorch framework with ImageNet-pretrained weights and evaluated via the Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95). (4) Experimental results show that SDM-UNet achieves an average DSC of 80.51% and HD95 of 22.09 mm on Synapse, and an average DSC of 90.58% and HD95 of 1.12 mm on ACDC, outperforming state-of-the-art methods such as Swin-UNet and SCUNet++ and demonstrating its superiority in balancing global context understanding and local detail preservation.
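
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how DSC and HD95 can be computed for a single pair of 2D binary masks. It is not the authors' evaluation code. The function names (`dice`, `boundary`, `hd95`), the nested-list mask representation, the 4-neighbour boundary definition, the pooling of both directed distance sets before taking the percentile, and the `spacing` parameter (pixel size in mm) are all assumptions made for illustration; published pipelines typically work on 3D volumes with library routines and may differ in these details.

```python
import math

def dice(pred, target):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for 0/1 masks."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * inter / total

def boundary(mask):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    pts = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    pts.append((r, c))
                    break
    return pts

def directed(a, b, spacing):
    """Distance from each point in a to its nearest point in b (in mm)."""
    sy, sx = spacing
    return [min(math.hypot((ay - by) * sy, (ax - bx) * sx) for by, bx in b)
            for ay, ax in a]

def percentile(values, q):
    """q-th percentile with linear interpolation between sorted samples."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)

def hd95(pred, target, spacing=(1.0, 1.0)):
    """95th percentile of the pooled boundary-to-boundary distances."""
    a, b = boundary(pred), boundary(target)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")
    return percentile(directed(a, b, spacing) + directed(b, a, spacing), 95)

# Toy example: a 2x2 square and the same square shifted right by one pixel.
pred = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
target = [[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 0, 0]]
print(dice(pred, target))  # 0.5
print(hd95(pred, target))  # 1.0
```

Higher DSC indicates better region overlap, while lower HD95 indicates closer boundary agreement; taking the 95th percentile rather than the maximum makes the distance less sensitive to a few outlier pixels.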

Keywords

multi-fusion dense skip connection; Multi-Attention Feature Refinement Block; hybrid transformer; medical image segmentation

