Automatic crack segmentation network based on large kernel pooling transformer

Abstract

Automatic crack segmentation is crucial for ensuring the safe and stable operation of civil concrete buildings. However, the irregularity of cracks, low image quality, and complex backgrounds still make automatic crack segmentation on concrete building surfaces challenging. To address these issues, an automatic segmentation network (LKT-Net) based on a large kernel pooling Transformer is proposed, aiming to improve the comprehensiveness and accuracy of crack feature extraction while maintaining a lightweight design. First, the large kernel pooling Transformer (LKT) is proposed as the fundamental building block of LKT-Net; it combines large kernel convolution with pooling layers and attention mechanisms to enhance global perception and capture local details at a lower computational cost. Second, to extract edge information accurately, the feed-forward network is improved by integrating the Laplacian operator with multi-scale convolutions, thereby enhancing multi-scale edge detection. Finally, to mitigate information loss during downsampling, a feature enhancement module (FEM) is proposed to replace traditional skip connections and strengthen cross-level feature interactions. Experiments on three public datasets (DeepCrack537, CrackLS315, and CrackTree260) show that, in comparison with eight advanced networks, LKT-Net achieved mean Intersection over Union (mIoU) scores of 86.23%, 70.82%, and 83.67%, respectively, demonstrating excellent segmentation performance. The code is available at: https://github.com/wjxcsust2024/LKT-Net.
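For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed for a crack mask. It assumes the common convention of averaging the per-class IoU over the background (0) and crack (1) classes; the abstract does not state the exact evaluation protocol, and the function name `mean_iou` and the toy masks are purely illustrative.

```python
def mean_iou(pred, target, num_classes=2):
    """Average the per-class IoU of two 2-D label maps given as nested lists.

    Assumption: classes are integer labels 0..num_classes-1, and a class that
    is absent from both maps is skipped rather than counted as 0 or 1.
    """
    # Pair up corresponding pixels of the prediction and the ground truth.
    pairs = [(p, t) for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union == 0:
            continue  # class does not occur in either map
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Hypothetical 4x4 masks: the prediction misses one crack pixel.
target = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

score = mean_iou(pred, target)  # crack IoU = 3/4, background IoU = 12/13
print(f"mIoU = {score:.4f}")
```

Because crack pixels are usually a small fraction of the image, the background IoU tends to be high, so the crack-class IoU largely determines how mIoU differs between methods.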

Keywords

crack segmentation; large kernel convolution; pyramid pooling; multi-scale edge-enhanced; feature enhancement module

