Many-objective evolutionary self-knowledge distillation with adaptive branch fusion method

Bai Jiayuan, Zhang Yi
DOI: https://doi.org/10.1016/j.ins.2024.120586
IF: 8.1
2024-04-10
Information Sciences
Abstract: As an emerging model-compression technique, self-knowledge distillation (SKD) avoids the large computational overhead of training a separate teacher model, which traditional knowledge distillation requires. However, existing SKD methods focus on knowledge transfer between the deep and shallow layers of the network and neglect mutual learning among the shallow branches. This paper proposes a many-objective evolutionary self-knowledge distillation framework (MaOESKD) that guides knowledge fusion between branches in an SKD neural network. The framework embeds an optimization module and a temporary branch into the multi-branch SKD network. The optimization module consists of a many-objective adaptive weight optimization model (MaAWOM) and a many-objective evolutionary algorithm based on a multi-strategy consensus mechanism (MaOEA-MCM), while the temporary branch fuses the branch outputs by linear weighting. In MaAWOM, the weights of the different branches in knowledge fusion are the decision variables, and the optimization objectives are the mutual information, covariance, and KL divergence between branch output features, together with the total information of each branch. MaOEA-MCM integrates several state-of-the-art individual selection strategies from the evolutionary algorithm literature: shift-based density estimation (SDE), penalty-based boundary intersection (PBI), balanced fitness estimation (BFE), and adaptive position transformation (APT). MaOESKD achieves accuracies of 99.70%, 95.74%, and 78.21% on MNIST, CIFAR-10, and CIFAR-100, respectively.
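The abstract says the temporary branch fuses branch outputs by linear weighting and that MaAWOM scores each candidate weight vector on four objectives, but it gives no formulas. The Python sketch below is a hedged illustration under stated assumptions, not the authors' method: weights are clipped to be non-negative and normalized to sum to 1; each objective compares every branch with the fused output (rather than branches pairwise) so that the objectives depend on the weights; mutual information is approximated by the Gaussian formula -0.5 log(1 - ρ²) on flattened probabilities; and "total information" is taken to be the mean Shannon entropy of the fused output. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax for one branch's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_weights(weights: Sequence[float]) -> Vector:
    """Assumption: branch weights are clipped at 0 and rescaled to sum to 1."""
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def fuse(branch_probs: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Linear weighted fusion for one sample: q = sum_k w_k * p_k."""
    w = normalize_weights(weights)
    n_classes = len(branch_probs[0])
    return [sum(w[k] * p[c] for k, p in enumerate(branch_probs))
            for c in range(n_classes)]


def kl(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions, in nats."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def entropy(p: Vector, eps: float = 1e-12) -> float:
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def covariance(x: Vector, y: Vector) -> float:
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def gaussian_mi(x: Vector, y: Vector, eps: float = 1e-12) -> float:
    """Assumed MI proxy: -0.5*log(1 - rho^2), exact only for jointly Gaussian data."""
    rho2 = covariance(x, y) ** 2 / (covariance(x, x) * covariance(y, y) + eps)
    rho2 = min(rho2, 1.0 - eps)  # keep the log argument strictly positive
    return -0.5 * math.log(1.0 - rho2)


def objectives(batch: Sequence[Sequence[Vector]], weights: Sequence[float]) -> Vector:
    """batch[i][k] is branch k's probability vector for sample i.

    Returns hypothetical [MI, covariance, KL, information] values; the first
    three compare each branch with the fused output and average over branches.
    """
    n_branches = len(batch[0])
    fused = [fuse(sample, weights) for sample in batch]
    flat_q = [v for q in fused for v in q]
    f_mi = f_cov = f_kl = 0.0
    for k in range(n_branches):
        flat_p = [v for sample in batch for v in sample[k]]
        f_mi += gaussian_mi(flat_p, flat_q)
        f_cov += covariance(flat_p, flat_q)
        f_kl += sum(kl(s[k], q) for s, q in zip(batch, fused)) / len(batch)
    f_info = sum(entropy(q) for q in fused) / len(batch)
    return [f_mi / n_branches, f_cov / n_branches, f_kl / n_branches, f_info]


if __name__ == "__main__":
    batch = [
        [softmax([2.0, 0.5, -1.0]), softmax([1.0, 1.0, 0.0])],
        [softmax([-0.5, 1.5, 0.2]), softmax([0.0, 2.0, -1.0])],
    ]
    print(objectives(batch, [0.7, 0.3]))
```

A many-objective optimizer would evaluate `objectives` once per candidate weight vector in its population. The abstract does not say which of the four values are minimized and which are maximized, so the sketch returns raw values and leaves the direction to the caller.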
computer science, information systems
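Of the four selection strategies the abstract lists, PBI and SDE have standard definitions in the many-objective literature (PBI as used in MOEA/D, SDE from shift-based density estimation for Pareto-based algorithms). The sketch below shows those textbook forms for a minimization problem, assuming the common default θ = 5 for PBI and a nearest-neighbour density for SDE. It is not the paper's multi-strategy consensus mechanism, BFE and APT are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pbi(f: Sequence[float], weight: Sequence[float],
        ideal: Sequence[float], theta: float = 5.0) -> float:
    """Penalty-based boundary intersection: d1 + theta * d2 (minimization)."""
    norm_w = math.sqrt(sum(w * w for w in weight))
    # d1: length of the projection of (f - ideal) onto the weight direction.
    d1 = abs(sum((fi - zi) * w for fi, zi, w in zip(f, ideal, weight))) / norm_w
    # d2: perpendicular distance from f to the weight direction through ideal.
    foot = [zi + d1 * w / norm_w for zi, w in zip(ideal, weight)]
    d2 = math.dist(f, foot)
    return d1 + theta * d2


def sde_min_distance(pop: Sequence[Sequence[float]]) -> List[float]:
    """Shift-based density estimation with a nearest-neighbour density.

    For individual p, every other q is shifted to max(q, p) componentwise,
    so only objectives where q is worse than p contribute. A larger result
    means p sits in a sparser region; 0 means some other individual is at
    least as good as p in every objective.
    """
    result = []
    for i, p in enumerate(pop):
        best = math.inf
        for j, q in enumerate(pop):
            if j == i:
                continue
            shifted = [max(qk, pk) for qk, pk in zip(q, p)]
            best = min(best, math.dist(p, shifted))
        result.append(best)
    return result


if __name__ == "__main__":
    print(pbi([1.0, 0.0], [1.0, 1.0], [0.0, 0.0]))
    print(sde_min_distance([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]))
```

In a selection step, a smaller PBI value (relative to an individual's weight vector) or a larger SDE distance would mark an individual as preferable; how MaOEA-MCM reconciles these criteria with BFE and APT is not described in the abstract.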