Abstract
This study develops a robust and accurate framework for classifying wood surface defects in industrial settings, where traditional methods struggle with variability and overlapping textures. The proposed approach, YOLO-GATFormer, combines YOLOv11 for precise defect detection, a Graph Attention Network (GAT) for capturing spatial relationships among detected regions, and a Transformer-based classification head for integrating global contextual information. Experimental evaluations on a real-world dataset containing over 21,258 wood surface images demonstrate the effectiveness of this hybrid architecture. The proposed model achieves a classification accuracy of 93.0% and an F1-score of 91.8%, outperforming YOLOv5-based baselines by up to 5.1% in F1-score. It also performs well at the class level, with F1-scores of 92.7% for Sound knot, 91.9% for Wormhole, and 91.8% for Knot with crack, and it remains robust across 10 defect classes. These results confirm the benefit of combining local detection, relational reasoning, and global aggregation, offering a scalable and interpretable solution for intelligent visual inspection in industrial applications.
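To make the relational-reasoning step concrete, the following is a minimal sketch of single-head graph attention applied to detected regions. It is not the authors' implementation: the abstract does not say how the region graph is built, so the k-nearest-neighbour rule on box centers, the function names `knn_adjacency` and `gat_layer`, the feature sizes, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np

def knn_adjacency(centers, k=2):
    """Hypothetical graph rule: link each region to its k nearest regions, plus itself."""
    n = len(centers)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # exclude self when ranking neighbours
    adj = np.eye(n, dtype=bool)             # self-loops keep every row non-empty
    for i in range(n):
        adj[i, np.argsort(dist[i])[:k]] = True
    return adj

def gat_layer(x, adj, W, a, slope=0.2):
    """Single-head graph attention. x: (n, d_in), W: (d_in, d_out), a: (2 * d_out,)."""
    h = x @ W                               # project region features
    d = h.shape[1]
    # e[i, j] = LeakyReLU(a^T [h_i || h_j]), split into source and target terms
    e = (h @ a[:d])[:, None] + (h @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)           # mask pairs that are not connected
    e = e - e.max(axis=1, keepdims=True)    # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ h, alpha                 # aggregated features, attention weights

# Toy stand-in for detector output: 5 regions with 2-D centers and 8-D features.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(5, 2))
feats = rng.normal(size=(5, 8))
adj = knn_adjacency(centers, k=2)
W = rng.normal(size=(8, 4)) * 0.1           # untrained, illustrative weights
a = rng.normal(size=(8,)) * 0.1
out, alpha = gat_layer(feats, adj, W, a)
```

In a pipeline of the kind described, the rows of `out` would be passed on to the Transformer-based classification head, while `alpha` shows how strongly each region attends to its neighbours.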
