Abstract
Existing computer vision-based structural damage identification models achieve notable accuracy in categorizing and localizing damage, but several critical limitations hinder their practical application in structural health monitoring. First, the range of damage types they can recognize remains constrained, preventing comprehensive analysis of the highly varied and complex conditions encountered in real-world structures. Second, these models lack linguistic capabilities and therefore cannot articulate structural damage characteristics through natural language descriptions. With the continuous advancement of artificial intelligence, multimodal large language models (MLLMs) have emerged as a transformative solution, enabling the unified encoding and alignment of textual and visual data. These models can generate detailed descriptions of structural damage images while demonstrating robust generalization across diverse scenarios and tasks. This study introduces the General Language Model for Structural Damage Identification (SDIGLM), an innovative MLLM for structural damage identification built on the open-source VisualGLM-6B architecture. To address the challenge of adapting MLLMs to the intricate and varied operating conditions in civil engineering, this work integrates a U-Net-based semantic segmentation module that generates defect segmentation maps serving as a visual chain of thought (CoT). Additionally, a multi-round dialogue fine-tuning dataset is constructed to enhance logical reasoning, complemented by a language CoT formed through prompt engineering. By leveraging this multimodal CoT, SDIGLM surpasses general-purpose MLLMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types. Moreover, the model effectively describes damage characteristics such as hole size, crack direction, and corrosion severity.
