Abstract
Knowledge distillation has shown strong potential in computer vision, substantially reducing model parameters and memory usage with minimal impact on model performance. However, current mainstream methods suffer from inconsistencies between distillation targets and real targets, as well as insufficient learning of teacher features. To address these issues, this paper presents a novel knowledge distillation framework, termed ADKD, designed to mitigate the limitations of current methodologies. It consists of two modules: AGD (Adaptive Generative Distillation) and RHD (Reused Head Distillation). Through AGD, the teacher guides the learning of the student network, enabling the student to achieve stronger representational power. Meanwhile, RHD addresses the discrepancy between real targets and distillation targets: by masking and reusing feature maps and utilizing the teacher's detection head, a more effective object detection model is obtained. The proposed approach is straightforward to implement and effective; extensive experiments on the PASCAL VOC and MS COCO datasets demonstrate its efficacy, and the results show that our method outperforms other knowledge distillation techniques.
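The abstract mentions masking and reusing feature maps together with a frozen teacher detection head. The following is only an illustrative sketch of that general idea, not the paper's actual ADKD/RHD implementation: all shapes, the 0.5 mask ratio, and the stand-in linear "head" are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature-map shape (batch, channels, height, width).
B, C, H, W = 2, 8, 4, 4
student_feat = rng.standard_normal((B, C, H, W))

# Random spatial mask, as in masked-feature distillation; the 0.5
# mask ratio is an assumption, not a value from the paper.
mask = (rng.random((B, 1, H, W)) > 0.5).astype(student_feat.dtype)
masked_feat = student_feat * mask

# Stand-in for a frozen, reused teacher head: a fixed linear map over
# channels (the real method would reuse the teacher's detection head).
W_head = rng.standard_normal((C, C)) * 0.1

def teacher_head(feat):
    # Apply the frozen channel-wise linear map at each spatial location.
    return np.einsum('oc,bchw->bohw', W_head, feat)

pred_masked = teacher_head(masked_feat)
pred_full = teacher_head(student_feat)

# Consistency-style loss: push the student toward features whose masked
# version yields the same head output as the unmasked one.
loss = float(np.mean((pred_masked - pred_full) ** 2))
print(loss)
```

In a real training loop, this loss term would be added to the standard detection loss, so the student learns features that remain informative to the teacher's head even under masking.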
