Abstract
In complex industrial environments such as power plants, traditional methods for monitoring safety compliance are limited by severe occlusion, extreme lighting variations, and visual clutter. To make safety supervision systems more practical and robust under real-world conditions, this paper proposes DMD-DETR, a structurally optimized detection model based on the RT-DETR architecture, for identifying whether power plant workers are wearing personal protective equipment (PPE) correctly. Specifically, we integrate a Diverse Branch Block (DBB) to strengthen feature extraction against low-contrast backgrounds, introduce a Modulation Fusion Module (MFM) to suppress industrial noise and accentuate small PPE targets, and employ a lightweight DySample operator to preserve fine-grained spatial details during upsampling. Extensive experiments were conducted on a self-constructed dataset covering representative complex scenarios and on a public benchmark dataset. On the self-constructed dataset, the improved model achieves a mAP@0.5:0.95 of 62.94%, 1.53 percentage points higher than the baseline RT-DETR, and tests on the public dataset further confirm the model’s generalization capability. Meanwhile, the model’s computational cost decreases to 54.5 GFLOPs and its parameter count to 19.39 million, while real-time performance is maintained. These findings indicate that the proposed method improves detection accuracy under complex conditions and provides a reliable technical foundation for safety supervision in power plants and similar industrial settings.
