Relation-Based Multi-Teacher Knowledge Distillation

Yutian Wang, Yi Xu
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650189
2024-01-01
Abstract: Traditional knowledge distillation uses a single teacher model to guide the training of a lightweight student model. To improve distillation performance, multi-teacher knowledge distillation was proposed, in which knowledge from several teachers guides the student's learning. However, existing methods mainly integrate this knowledge by directly summing the teacher models' outputs for each individual sample, ignoring the relational knowledge among samples. In this paper, we propose a novel multi-teacher knowledge distillation method that uses data relation knowledge to allocate weights to teachers adaptively. To obtain the data relation knowledge, we design an output-based strategy and a feature-based strategy, which help allocate more weight to the teacher that has better learned the data relation knowledge. Extensive experiments demonstrate the performance and efficiency of the proposed multi-teacher knowledge distillation method.
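The abstract does not spell out how the weights are computed, so the following is only a minimal sketch of one way an output-based strategy could look, not the authors' method. It assumes that a teacher's relation knowledge is measured by the pairwise cosine similarity of its softmax outputs within a batch, that this is compared against a label-derived same-class matrix, and that a softmax over the negative errors gives the teacher weights. The names `relation_matrix`, `teacher_weights`, `combined_soft_targets`, and the `temperature` parameter are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def relation_matrix(outputs):
    """Pairwise cosine similarity between the samples of one batch.

    outputs: array of shape (batch, classes).
    """
    norms = np.linalg.norm(outputs, axis=1, keepdims=True)
    unit = outputs / np.maximum(norms, 1e-12)
    return unit @ unit.T


def teacher_weights(teacher_logits, labels, temperature=1.0):
    """Assumed scoring rule: a teacher whose relation matrix is closer to
    the label-derived one receives a larger weight.

    teacher_logits: array of shape (teachers, batch, classes).
    labels: integer array of shape (batch,).
    """
    # Target relation: 1 where two samples share a class, else 0.
    target = (labels[:, None] == labels[None, :]).astype(float)
    scores = []
    for logits in teacher_logits:
        relation = relation_matrix(softmax(logits))
        # Negative mean squared error, so a higher score means a better teacher.
        scores.append(-np.mean((relation - target) ** 2))
    return softmax(np.array(scores) / temperature)


def combined_soft_targets(teacher_logits, labels, temperature=1.0):
    """Weighted average of the teachers' soft labels, shape (batch, classes)."""
    weights = teacher_weights(teacher_logits, labels, temperature)
    probs = softmax(teacher_logits, axis=-1)
    return np.tensordot(weights, probs, axes=1)


if __name__ == "__main__":
    labels = np.array([0, 1, 2, 0])
    confident = 10.0 * np.eye(3)[labels]   # teacher that separates the classes
    uninformed = np.zeros((4, 3))          # teacher that predicts uniformly
    logits = np.stack([confident, uninformed])
    print(teacher_weights(logits, labels))
```

In this toy setup the first teacher receives the larger weight, because its batch-level similarity structure matches the labels more closely. A feature-based variant could apply the same `relation_matrix` to intermediate feature vectors instead of softmax outputs, again as an assumption rather than a description of the paper.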