Abstract
Knowledge distillation has shown strong potential in computer vision, substantially reducing model parameters and memory usage with minimal impact on model performance. However, current mainstream methods suffer from inconsistencies between distillation targets and real targets, as well as insufficient learning of teacher features. To address these issues, this paper presents a novel knowledge distillation framework, termed ADKD, designed to mitigate the limitations of current methodologies. It consists of two modules: AGD (Adaptive Generative Distillation) and RHD (Reused Head Distillation). Through AGD, the teacher guides the learning of the student network, enabling the student to achieve stronger representational power. Meanwhile, RHD addresses the discrepancy between real targets and distillation targets: by masking and reusing feature maps and utilizing the teacher's detection head, a more effective object detection model is obtained. The proposed approach is straightforward to implement and effective; extensive experiments on the PASCAL VOC and MS COCO datasets demonstrate its efficacy, and the results show that our method outperforms other knowledge distillation techniques.
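The abstract mentions masking and reusing feature maps together with a frozen teacher detection head. The following is only an illustrative sketch of that general idea, not the paper's actual ADKD/RHD implementation: all shapes, the 0.5 mask ratio, and the stand-in linear "head" are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature-map shape (batch, channels, height, width).
B, C, H, W = 2, 8, 4, 4
student_feat = rng.standard_normal((B, C, H, W))

# Random spatial mask, as in masked-feature distillation; the 0.5
# mask ratio is an assumption, not a value from the paper.
mask = (rng.random((B, 1, H, W)) > 0.5).astype(student_feat.dtype)
masked_feat = student_feat * mask

# Stand-in for a frozen, reused teacher head: a fixed linear map over
# channels (the real method would reuse the teacher's detection head).
W_head = rng.standard_normal((C, C)) * 0.1

def teacher_head(feat):
    # Apply the frozen channel-wise linear map at each spatial location.
    return np.einsum('oc,bchw->bohw', W_head, feat)

pred_masked = teacher_head(masked_feat)
pred_full = teacher_head(student_feat)

# Consistency-style loss: push the student toward features whose masked
# version yields the same head output as the unmasked one.
loss = float(np.mean((pred_masked - pred_full) ** 2))
print(loss)
```

In a real training loop, this loss term would be added to the standard detection loss, so the student learns features that remain informative to the teacher's head even under masking.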
