Semantic Segmentation Model for Road Cracks Based on Parallel Flatten Swin-VanillaNet Framework

Abstract

Crack segmentation models based on convolutional neural networks (CNNs) and Transformers have recently become a focal point of research on road damage identification. However, because CNN models have limited capacity for processing global information and Transformer models recognize local features poorly, such models perform suboptimally on crack recognition in complex environments. In addition, large parameter counts and low computational efficiency impede progress in crack recognition tasks. To address these issues, this paper proposes a framework named Parallel Flatten Swin-VanillaNet (PFSV), which integrates Flatten Swin Transformer and VanillaNet. The framework employs upsampling to extract multiscale features from the intermediate layers of the encoder for decoding. The results demonstrate that, compared with DeepLabV3+, PSPNet, FPN, SETR, SegFormer, and DeepCrack, the PFSV model achieves improvements across all evaluation metrics. In addition, the number of parameters is reduced by 35.56% to 50.19%, and the frames per second and floating-point operations per second values surpass those of the comparison models. The proposed PFSV model exhibits robust crack detection capabilities and superior computational efficiency.
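The decoding step mentioned in the abstract, in which multiscale features from intermediate encoder layers are upsampled, can be illustrated with a minimal sketch. This is not the PFSV implementation: the function names (`upsample_nearest`, `fuse_multiscale`), the nearest-neighbour interpolation, the toy single-channel feature maps, and the fusion by channel stacking are all assumptions made for illustration, since the abstract does not specify how the upsampled features are combined.

```python
# Illustrative sketch only: NOT the PFSV implementation.
# All names, sizes, and the fusion rule are hypothetical.
from typing import List

Grid = List[List[float]]  # one single-channel feature map (H x W)


def upsample_nearest(fmap: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    out: Grid = []
    for row in fmap:
        # Repeat every value along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...then repeat the widened row along the height.
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_multiscale(features: List[Grid]) -> List[Grid]:
    """Upsample every map to the largest resolution and stack them as channels.

    Assumes each map's size divides the largest map's size evenly.
    """
    target_h = max(len(f) for f in features)
    fused: List[Grid] = []
    for f in features:
        factor = target_h // len(f)
        fused.append(upsample_nearest(f, factor))
    return fused


# Hypothetical encoder outputs at three scales (4x4, 2x2, 1x1).
stage1 = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
stage2 = [[1.0, 2.0], [3.0, 4.0]]
stage3 = [[9.0]]

fused = fuse_multiscale([stage1, stage2, stage3])
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width: 3 4 4
```

In a trained model the same idea would typically operate on multi-channel tensors, with a deep learning framework's interpolation routine and learned convolutions applied after the stacked features.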

Keywords

road cracks, semantic segmentation, transformer, convolutional neural networks, Parallel Flatten Swin-VanillaNet

