Knowledge Distillation on Multiple Experts for Long-Tailed Recognition

Daqing Ai, Yanyun Qu
DOI: https://doi.org/10.1109/cac59555.2023.10449969
2023-01-01
Abstract: Deep learning-based image recognition faces a major challenge because real-world data typically follows a long-tailed distribution: a few categories contain most of the examples, while the remaining categories have very few. This imbalance makes it difficult for models to achieve the expected recognition performance. Existing work on this problem mainly focuses on two strategies: re-balancing and ensemble learning. In this paper, we propose a new ensemble method, Multi-Expert Distillation for Long-tailed Recognition (MED), which consists of two training phases. First, we design a Multi-Expert Network containing multiple experts, each trained on a particular domain, so that the whole task is split into several sub-tasks. Second, we perform knowledge distillation to transfer knowledge from the Multi-Expert Network to a unified student network, ensembling the experts so that the student can recognize all categories. In addition, we construct a weight module that controls the impact of each expert on the student to maximize the effectiveness of knowledge transfer. Extensive experiments on three benchmarks show that MED achieves compelling performance with notable improvements.
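The abstract does not give the loss function, so the following is only a minimal sketch of what a weighted multi-expert distillation objective could look like, not the authors' formulation. It assumes the standard temperature-scaled KL distillation loss, assumes every expert emits logits over the full label set, and treats the per-expert weights as given inputs rather than as the output of a learned weight module. The function names (`softmax`, `kl_divergence`, `multi_expert_distillation_loss`) and the `temperature` parameter are hypothetical, introduced for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_expert_distillation_loss(student_logits, expert_logits,
                                   expert_weights, temperature=2.0):
    """Weighted sum of per-expert distillation terms (illustrative only).

    Assumptions, not taken from the paper:
      * each expert produces logits over the same classes as the student;
      * expert_weights are non-negative and are normalized here to sum to 1;
      * each term is the temperature-scaled KL divergence from expert to student.
    """
    if len(expert_logits) != len(expert_weights):
        raise ValueError("need exactly one weight per expert")
    weight_sum = sum(expert_weights)
    if weight_sum <= 0:
        raise ValueError("expert weights must have a positive sum")

    student_probs = softmax(student_logits, temperature)
    loss = 0.0
    for logits, weight in zip(expert_logits, expert_weights):
        teacher_probs = softmax(logits, temperature)
        loss += (weight / weight_sum) * kl_divergence(teacher_probs, student_probs)
    # The T^2 factor is the usual rescaling applied to softened distillation terms.
    return (temperature ** 2) * loss
```

In this sketch, raising the weight of one expert pulls the student's softened predictions toward that expert's distribution, which is one plausible way to "control the impact of each expert on the student"; how the paper's weight module actually computes these weights is not stated in the abstract.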