Abstract
Existing computer vision-based structural damage identification models achieve notable accuracy in categorizing and localizing damage, but several critical limitations hinder their practical application in structural health monitoring. First, the range of damage types they can recognize remains constrained, preventing comprehensive analysis of the highly varied and complex conditions encountered in real-world structures. Second, these models lack linguistic capabilities and therefore cannot articulate structural damage characteristics through natural language descriptions. With the continuous advancement of artificial intelligence, multimodal large language models (MLLMs) have emerged as a transformative solution, enabling the unified encoding and alignment of textual and visual data. These models can generate detailed descriptions of structural damage images while demonstrating robust generalization across diverse scenarios and tasks. This study introduces the General Language Model for Structural Damage Identification (SDIGLM), an innovative MLLM for structural damage identification built on the open-source VisualGLM-6B architecture. To address the challenge of adapting MLLMs to the intricate and varied operating conditions in civil engineering, this work integrates a U-Net-based semantic segmentation module that generates defect segmentation maps serving as a visual chain of thought (CoT). Additionally, a multi-round dialogue fine-tuning dataset is constructed to enhance logical reasoning, complemented by a language CoT formed through prompt engineering. By leveraging this multimodal CoT, SDIGLM surpasses general-purpose MLLMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types. Moreover, the model effectively describes damage characteristics such as hole size, crack direction, and corrosion severity.
