One‐stage self‐distillation guided knowledge transfer for long‐tailed visual recognition
Yuelong Xia, Shu Zhang, Jun Wang, Wei Zou, Juxiang Zhou, Bin Wen
DOI: https://doi.org/10.1002/int.23068
IF: 8.993
2022-09-11
International Journal of Intelligent Systems
Abstract: Deep learning has achieved remarkable progress in visual recognition on balanced data sets but still performs poorly on real‐world long‐tailed data distributions. Existing methods mainly either decouple the problem into two‐stage training, that is, representation learning followed by classifier training, or rely on multistage training based on knowledge distillation, both of which require many training steps and extra computation cost. In this paper, we propose a conceptually simple yet effective One‐stage Long‐tailed Self‐Distillation framework, called OLSD, which combines representation learning and classifier training in a single training stage. For representation learning, we draw samples from two different sampling distributions and mix them with mixup as input to two branches, where a collaborative consistency loss is introduced to train the network for consistency, and we theoretically show that the proposed mixup naturally generates a tail‐majority distribution mixup. For classifier training, we introduce balanced self‐distillation guided knowledge transfer to improve generalization performance, and we theoretically show that the proposed knowledge transfer implicitly minimizes not only the cross‐entropy but also the KL divergence between head‐to‐tail and tail‐to‐head. Extensive experiments on long‐tailed CIFAR10/100, ImageNet‐LT, and multilabel long‐tailed VOC‐LT demonstrate the effectiveness of the proposed method.
computer science, artificial intelligence
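
The abstract's idea of mixing samples drawn from two different sampling distributions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the samplers or the mixing rule, so the sketch assumes one uniform (instance-level) sampler, one class-balanced sampler, and standard mixup with a Beta-distributed coefficient. The function names `class_balanced_probs` and `two_sampler_mixup` and the parameter `alpha` are hypothetical.

```python
import numpy as np

def class_balanced_probs(labels):
    """Per-sample probabilities under which every class is equally likely to be drawn."""
    classes, counts = np.unique(labels, return_counts=True)
    count_of = dict(zip(classes.tolist(), counts.tolist()))
    # Each sample is weighted by the inverse frequency of its class.
    weights = np.array([1.0 / count_of[int(y)] for y in labels])
    return weights / weights.sum()

def two_sampler_mixup(x, labels, num_classes, batch_size, alpha=1.0, rng=None):
    """Mix a uniformly sampled batch with a class-balanced batch (assumed setup)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(labels)
    # Assumption: sampler A follows the long-tailed data (uniform over instances).
    idx_a = rng.choice(n, size=batch_size)
    # Assumption: sampler B is class-balanced, so tail classes appear more often.
    idx_b = rng.choice(n, size=batch_size, p=class_balanced_probs(labels))
    # Standard mixup: one coefficient per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    onehot = np.eye(num_classes)
    x_mix = lam * x[idx_a] + (1.0 - lam) * x[idx_b]
    y_mix = lam * onehot[labels[idx_a]] + (1.0 - lam) * onehot[labels[idx_b]]
    return x_mix, y_mix, lam

if __name__ == "__main__":
    # Toy long-tailed data: 90 head-class samples, 10 tail-class samples.
    labels = np.array([0] * 90 + [1] * 10)
    x = np.random.default_rng(0).normal(size=(100, 4))
    x_mix, y_mix, lam = two_sampler_mixup(x, labels, num_classes=2, batch_size=16)
    print(x_mix.shape, y_mix.shape)
```

Under these assumptions, the tail class receives half of the sampling mass in the second batch even though it makes up only 10% of the toy data, so the mixed soft labels carry more tail-class weight than uniform sampling alone would give.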