Abstract
Remote sensing change detection (CD) has advanced significantly with the adoption of Convolutional Neural Networks (CNNs) and Transformers. CNNs offer robust feature extraction but are limited by their receptive field size, while Transformers incur quadratic computational complexity on long sequences, which limits scalability. The Mamba architecture offers a compelling alternative with linear complexity and high parallelism; however, its intrinsically 1D processing structure discards spatial information in 2D vision tasks. This paper proposes an efficient framework built on a Vision Mamba variant that strengthens the capture of 2D spatial information while preserving Mamba's hallmark linear complexity. The framework employs a 2DMamba encoder to learn global spatial contextual information from multi-temporal images. For feature fusion, we introduce a 2D-scan-based, channel-parallel scanning strategy coupled with a spatio-temporal feature fusion method; this captures both local and global change information and addresses spatial discontinuity during fusion. In the decoding phase, we present a feature change flow-based decoding method that improves the mapping of change information from low-resolution to high-resolution feature maps, mitigating feature shift and misalignment. Extensive experiments on the LEVIR-CD+ and WHU-CD benchmarks demonstrate competitive performance against state-of-the-art methods, highlighting the significant potential of Vision Mamba for efficient and accurate remote sensing change detection.
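The linear-complexity claim rests on the state-space recurrence underlying Mamba: a length-L sequence is processed by a single forward scan in O(L) time, in contrast to the O(L²) pairwise interactions of self-attention. A minimal, illustrative 1D scan is sketched below in plain NumPy; the function name and the per-channel parameters A, B, C are our own simplification for exposition, not the paper's implementation (which additionally makes the parameters input-dependent and operates over 2D spatial scans).

```python
import numpy as np

def selective_scan_1d(x, A, B, C):
    """Illustrative 1D state-space scan (simplified; not the paper's code).

    x: (L, D) input sequence of length L with D channels.
    A, B, C: (D,) per-channel transition, input, and readout parameters.
    Runs in O(L) time via a single linear recurrence.
    """
    L, D = x.shape
    h = np.zeros(D)           # hidden state, one scalar per channel
    y = np.empty((L, D))
    for t in range(L):
        h = A * h + B * x[t]  # linear recurrence: carry state forward
        y[t] = C * h          # per-step readout
    return y
```

With A = 1, B = 1, C = 1 the recurrence reduces to a running cumulative sum, which makes the O(L) single-pass nature of the scan easy to see.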