Enhanced RT-DETR for real-time transparent tape detection on fabric surfaces with reflection-resistant attention module

Abstract

In textile production, transparent tape is used for fabric fixing and quality marking, and its identification directly affects the accuracy of automated sorting. Because the tape is similar in color to the fabric, offers low contrast, and is highly reflective, traditional visual methods struggle to recognize it reliably. To address this, this paper introduces a high-precision transparent tape detection method based on the RT-DETR (Real-Time Detection Transformer) network. The efficient cross-stage partial Darknet53 (ECSPDarknet) is adopted as the backbone network, significantly reducing the number of model parameters while enhancing feature extraction. The reflection-resistant attention module (RRAM) is integrated into the feature fusion stage to strengthen multiscale feature fusion, addressing the recognition difficulties caused by the similarity between transparent tape and the background and by strong reflection. The dynamic group shuffle transformer (DGST) replaces the reparameterization convolutional C3 (RepC3) module, removing the high computational load and real-time bottleneck introduced by the latter’s multibranch structure. In addition, the bounding box regression loss is replaced with a weighted sum of the SmoothL1 and Focal-EIoU losses, which improves convergence efficiency and detection accuracy. Ablation experiments were conducted with three sets of random seeds. The results show that, compared with the baseline, the improved model reduces parameters by 43.8% and floating-point operations by 39.6% while increasing FPS by 13.2%. Precision, recall, F1-score, mAP@0.5, and mAP@[0.5:0.95] improve by 2.5%, 4.2%, 3.3%, 2.2%, and 1.0%, respectively. The algorithm also outperforms mainstream methods in detection accuracy, providing a foundation for high-precision transparent tape identification on fabrics.
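The combined regression loss mentioned above can be illustrated with a minimal sketch for a single predicted box and its ground truth. The abstract does not give the weighting coefficients, the SmoothL1 threshold, the focal exponent, or the box encoding, so `w_l1`, `w_eiou`, `beta`, `gamma`, and the `(x1, y1, x2, y2)` corner format below are assumptions chosen for illustration only; this is not the authors' implementation.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean SmoothL1 over the four box coordinates (beta is an assumed threshold)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def focal_eiou(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU for two (x1, y1, x2, y2) boxes: IoU**gamma * L_EIoU."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Penalties: center distance, width difference, height difference.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    eiou = 1.0 - iou + center_term + width_term + height_term
    # The focal factor down-weights low-IoU boxes; it is zero when boxes do not overlap.
    return (iou ** gamma) * eiou


def box_loss(pred, target, w_l1=0.5, w_eiou=0.5):
    """Weighted sum of SmoothL1 and Focal-EIoU (weights are hypothetical)."""
    return w_l1 * smooth_l1(pred, target) + w_eiou * focal_eiou(pred, target)


if __name__ == "__main__":
    print(box_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

In this sketch the Focal-EIoU term vanishes for non-overlapping boxes, so the SmoothL1 term is what still supplies a gradient in that case; this is one plausible reading of why the two losses are combined.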

Keywords

Transparent tape detection; textile inspection; SimAM attention; RT-DETR; reflection-robust detection
