Research on Bridge Surface Damage Classification Based on Improved Vision Transformer Model

Abstract

To improve the accuracy of surface damage classification for concrete bridges, this paper proposes an improved model, the convolutional neural network-vision transformer (CNN-ViT). First, the original image patch partitioning operation is replaced with a CNN, which strengthens the model’s feature extraction and lets it retain more critical information from the image. Second, a local aggregation module is introduced to dynamically focus attention on the damaged area; by aggregating local features and fusing contextual information, it strengthens feature learning in the damaged region and improves the model’s accuracy and robustness in identifying fine damage against complex backgrounds. Finally, to verify the model’s effectiveness, ablation experiments were conducted and its performance was compared with that of other neural network models. Experimental results show that the model achieves an accuracy of 98.7% in real-world concrete bridge surface damage identification, 10% higher than the original model. Compared with other neural network models, the combination of the CNN and the local aggregation module effectively suppresses background noise interference and improves overall performance, yielding higher detection accuracy and robustness.
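One way to see what replacing patch partitioning with a CNN changes is to compare the token grid and the per-token receptive field of the two front ends. The sketch below is illustrative arithmetic only: the 224-pixel input, the 16-pixel patch size, and the four-layer 3×3 stride-2 stem are assumptions chosen for the example, not the layer configuration used in the paper.

```python
# Each layer is described as (kernel, stride, padding).
# Both configurations below are hypothetical, chosen only for illustration.
PATCH_EMBED = [(16, 16, 0)]      # standard ViT-style non-overlapping patches
CONV_STEM = [(3, 2, 1)] * 4      # assumed stack of four 3x3 stride-2 convolutions


def conv_out(size, kernel, stride, padding):
    """Spatial output size of one convolution layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_grid(size, layers):
    """Side length of the token grid after applying every layer in order."""
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size


def receptive_field(layers):
    """Return (receptive field, effective stride) of one output token, in pixels."""
    rf, jump = 1, 1
    for kernel, stride, _ in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) * current jump
        jump *= stride              # distance in input pixels between adjacent outputs
    return rf, jump


if __name__ == "__main__":
    for name, layers in (("patch embedding", PATCH_EMBED), ("conv stem", CONV_STEM)):
        side = token_grid(224, layers)
        rf, stride = receptive_field(layers)
        print(f"{name}: {side}x{side} tokens, receptive field {rf}px, stride {stride}px")
```

Under these assumptions both front ends produce a 14×14 grid of 196 tokens, but each patch token sees a disjoint 16×16 region, whereas each stem token sees a 31×31 region that overlaps its neighbours. That overlap is one plausible reading of how a convolutional front end can keep information that falls on patch boundaries.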

Keywords

deep learning, vision transformer, bridge damage, attention mechanism
