Abstract
Crack segmentation models based on convolutional neural networks (CNNs) and Transformers have recently become a focal point of research on road damage identification. However, because CNN models have limited capacity for processing global information and Transformer models are weak at recognizing local features, such models perform suboptimally at crack recognition in complex environments. At the same time, large parameter counts and low computational efficiency impede progress in crack recognition tasks. To address these issues, this paper proposes a framework named Parallel Flatten Swin-VanillaNet (PFSV), which integrates Flatten Swin Transformer and VanillaNet. The framework employs upsampling to extract multiscale features from the intermediate layers of the encoder for decoding. The results demonstrate that, compared with DeepLabV3+, PSPNet, FPN, SETR, SegFormer, and DeepCrack, the PFSV model achieves improvements across all evaluation metrics. In addition, the number of parameters is reduced by 35.56% to 50.19%, and the frames per second and floating-point operations per second values surpass those of the comparative models. The proposed PFSV model exhibits robust crack detection capabilities and superior computational efficiency.
