Abstract
Traffic sign detection is a fundamental component of intelligent transportation systems, yet remains challenging due to the small size of signs, visual occlusions, and complex environmental conditions. In this paper, we propose a novel YOLO-based architecture enhanced with multi-scale attention and Transformer modules to address these limitations. Specifically, a Convolutional Block Attention Module (CBAM) is employed to refine spatial and channel-wise features, while a C3 Transformer (C3TR) module introduces multi-head self-attention to capture global contextual information. The proposed enhancements significantly improve the model's ability to detect small and visually degraded traffic signs. Evaluated on the German Traffic Sign Detection Benchmark (GTSDB), our model achieves a mAP@0.5 of 96.75%, mAP@0.5:0.95 of 81.18%, precision of 97.05%, and recall of 95.07%. Compared to YOLOv5s, this reflects relative gains of +11.2% in mAP@0.5, +26.6% in mAP@0.5:0.95, +1.6% in precision, and +20.0% in recall, with a 41.9% reduction in model size. It also outperforms YOLOv8, YOLOv7-tiny, and Faster R-CNN, particularly for degraded signs. For real-time deployment on embedded systems, the model is optimized using NVIDIA TensorRT. This optimization significantly reduces inference latency and computational load while preserving high detection accuracy, making the model well-suited for ADAS and autonomous driving applications.
