Abstract
Accurate medical image segmentation models can efficiently assist healthcare professionals in diagnosis. Segmentation methods based on Convolutional Neural Networks (CNNs) are effective at extracting local features, but their inherently limited receptive fields hinder the integration of global dependencies and the extraction of multi-scale features. This limitation has prompted researchers to explore Transformer-based approaches, whose self-attention mechanism effectively models global dependencies and facilitates multi-scale feature extraction. In this study, we design FRISFormer, a medical image segmentation model based on a U-shaped architecture. FRISFormer is built entirely on Transformers and can be trained effectively from scratch, without relying on pre-trained weights. Its innovation lies in two key aspects: (1) FRISFormer refines the features extracted by the Efficient Self-Attention (ESA) module through a Feature Refinement Feed-forward Network (FRFN), achieving deeper deconstruction and enhancement of features; (2) FRISFormer replaces classic skip connections with a ReMixed Transformer Context Bridge, effectively strengthening the correlation between global dependencies and local context. We evaluated FRISFormer on a multi-organ segmentation dataset (Synapse) and a skin lesion segmentation dataset (ISIC 2018). FRISFormer improved the evaluation metric by 0.50 on the Synapse dataset and by 0.23 on the ISIC 2018 dataset. These results demonstrate the effectiveness and superiority of FRISFormer in feature representation and segmentation accuracy.
