Abstract
To improve the accuracy of surface damage classification for concrete bridges, this paper proposes an improved model, the convolutional neural network-vision transformer (CNN-ViT). First, the original image patch partitioning operation is replaced with a CNN, which strengthens the model’s feature extraction and allows it to retain more critical information from the image. Second, a local aggregation module is introduced to dynamically focus attention on the damaged area: by aggregating local features and fusing contextual information, it enhances feature learning in the damaged region, thereby improving the model’s accuracy and robustness in identifying fine damage against complex backgrounds. Finally, to verify the model’s effectiveness, ablation experiments were conducted and its performance was compared with that of other neural network models. Experimental results show that the model achieves an accuracy of 98.7% in real-world concrete bridge surface damage identification, which is 10% higher than that of the original model. Compared with other neural network models, the combination of the CNN and the local aggregation module effectively suppresses background noise interference and improves the model’s overall performance, with higher detection accuracy and robustness.
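The abstract does not specify the CNN that replaces the patch operation, so the following is only a minimal sketch of the general idea. It compares a standard non-overlapping 16×16 patch split with a hypothetical stem of four 3×3, stride-2 convolutions. The input size of 224 and both layer configurations are assumptions chosen for illustration, not the paper's architecture.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of one convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_summary(size, layers):
    """Return (grid side, receptive field, token stride) for stacked convs.

    `layers` is a list of (kernel, stride, padding) tuples applied in order.
    """
    rf, jump = 1, 1
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        rf += (kernel - 1) * jump   # each layer widens the receptive field
        jump *= stride              # spacing between adjacent outputs, in pixels
    return size, rf, jump


IMAGE = 224                   # assumed input resolution
patchify = [(16, 16, 0)]      # ViT-style non-overlapping patch split
cnn_stem = [(3, 2, 1)] * 4    # hypothetical convolutional stem

for name, layers in (("patchify", patchify), ("cnn_stem", cnn_stem)):
    side, rf, stride = stem_summary(IMAGE, layers)
    print(name, "tokens:", side * side, "receptive field:", rf, "stride:", stride)
```

Under these assumptions both variants produce the same 14×14 token grid, but each token from the convolutional stem covers a 31×31 pixel window that overlaps its neighbours, whereas each patch token covers a disjoint 16×16 block. Overlap of this kind is one plausible reason a convolutional stem could retain more local detail, such as fine cracks that straddle patch boundaries.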
