Abstract
As intelligent transportation systems increasingly rely on vehicle detection algorithms, their real-world deployment faces significant challenges. The high computational complexity of deep learning models conflicts with the resource constraints and real-time requirements of edge devices. A detection system must maintain a stable frame rate under low-latency conditions in high-traffic environments to ensure timely traffic monitoring and decision-making, and it must remain adaptable to complex environments: poor weather, changing lighting, and cluttered backgrounds can degrade detection accuracy, leading to false or missed detections. This study therefore proposes a hardware-oriented lightweight vehicle detection framework, MDE-YOLO, that optimizes computational efficiency while preserving detection accuracy. First, the YOLO backbone feature extraction network is rebuilt on MobileNetV3 to reduce computational redundancy. Second, conventional convolutions are replaced with depthwise separable convolutions to decouple spatial and channel feature learning. Third, the original C3 module (a cross-stage partial bottleneck structure with three convolutional layers) is reconfigured with redesigned GhostBottleneck blocks (lightweight residual blocks that stack two Ghost Modules, where the first expands channels via depthwise separable convolution and the second compresses them, while shortcut connections are retained). In addition, a novel dual-stream attention mechanism enhances prediction accuracy while maintaining detection performance. A paired-sample t-test was used to evaluate the effectiveness of the proposed algorithm. The P-values for all three key performance metrics (precision, recall, and mAP@0.5) fall below 0.05, confirming that MDE-YOLO achieves statistically significant improvements over YOLOv5s on these metrics.
Compared with YOLOv5, the proposed method reduces the parameter count, computational complexity, and model weight size by 69.9%, 77.5%, and 67.8%, respectively, while incurring only a 0.5% reduction in average recognition rate. This slight decrease in accuracy is an acceptable trade-off, and the approach offers a methodological framework for overcoming the “accuracy-resource” dilemma in traffic perception systems.
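The parameter savings behind the depthwise separable convolutions described above can be illustrated with a short parameter-count sketch. The layer sizes below are hypothetical examples, not values from the paper:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted):
    every output channel has one k x k filter per input channel."""
    return c_in * c_out * k * k

def dws_conv_params(c_in, c_out, k):
    """Depthwise separable convolution: one k x k filter per input
    channel (spatial step), then a 1 x 1 pointwise convolution that
    mixes channels (channel step)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer sizes (hypothetical, not taken from the paper):
c_in, c_out, k = 128, 256, 3
standard = conv_params(c_in, c_out, k)       # 294912
separable = dws_conv_params(c_in, c_out, k)  # 33920
print(f"reduction: {1 - separable / standard:.1%}")  # reduction: 88.5%
```

Decoupling the spatial and channel steps in this way is what makes large per-layer reductions possible, which is consistent in spirit with the model-level parameter reduction reported above.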
