Abstract
Bridge inspection reports are valuable sources for assessing bridge condition and guiding maintenance, yet accurately extracting structured information from them remains a challenge. Although advanced general-domain models and large language models (LLMs) have shown promise, they often fail to fully exploit text features specific to inspection reports, such as specialized Chinese character properties and distinctive entity distribution patterns, and they do not account for missed or misclassified entities. These shortcomings can lead to inaccuracies in the extraction of critical entities. To address these challenges, this paper proposes a named entity recognition (NER) model, MF-Attention-BiLSTM-CRF, that integrates Multi-Features (MF) fusion and a domain-specific correction method to accurately extract key information from bridge inspection reports. Unlike generic pre-trained architectures, the model introduces a relative position feature to capture the standardized reporting structure and a correction method to guard against the misclassification of vital structural defects. Furthermore, based on the characteristics of report texts, an improved Easy Data Augmentation (EDA) method is proposed and used to construct a bridge inspection report dataset with 23,192 characters and 2,752 entities. Experimental results demonstrate that the proposed model outperforms prior models, general-domain models, and large language models, achieving a best F1 score of 89.3%. This work contributes to intelligent infrastructure management systems that support early warning, predictive maintenance, and decision-making applications.
