Knowledge Distillation with Attention for Deep Transfer Learning of Convolutional Networks
Xingjian Li, Haoyi Xiong, Zeyu Chen, Jun Huan, Ji Liu, Cheng-Zhong Xu, Dejing Dou
DOI: https://doi.org/10.1145/3473912
IF: 4.157
2022-06-30
ACM Transactions on Knowledge Discovery from Data
Abstract: Transfer learning through fine-tuning a neural network pre-trained on an extremely large dataset, such as ImageNet, can significantly improve and accelerate training. However, accuracy is frequently bottlenecked by the limited dataset size of the new target task. To address this problem, researchers have studied regularization methods that constrain the outer layer weights of the target network using the starting point as references (SPAR). In this article, we propose DELTA (DEep Learning Transfer using Feature Map with Attention), a novel regularized transfer learning framework. Instead of constraining the weights of the neural network, DELTA aims to preserve the outer layer outputs of the source network. Specifically, in addition to minimizing the empirical loss, DELTA aligns the outer layer outputs of the two networks. It does so by constraining a subset of feature maps that are precisely selected by attention learned in a supervised manner. We evaluate DELTA against state-of-the-art algorithms, including L2 and L2-SP. The experimental results show that our method outperforms these baselines, achieving higher accuracy on new tasks. Code has been made publicly available.
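The abstract contrasts two kinds of regularizer. SPAR methods such as L2-SP penalize how far the target network's weights drift from the pre-trained starting point. DELTA instead penalizes the distance between the feature maps that the two networks produce, weighted per channel by attention. The NumPy sketch below illustrates that contrast under stated assumptions, and it is not the authors' released code. The squared-distance forms, the coefficients `alpha` and `beta`, and the helper names `l2_sp_penalty`, `channel_attention`, `delta_feature_penalty` and `delta_objective` are all hypothetical. In particular, the sketch derives channel weights from a softmax over per-channel loss increases. This is an assumed stand-in for the supervised attention described above.

```python
import numpy as np


def l2_sp_penalty(shared_w, start_w, new_w, alpha=0.01, beta=0.01):
    """SPAR-style (L2-SP-like) penalty, assumed form: pull shared weights toward
    their pre-trained starting point and apply plain L2 decay to new layers."""
    sp = sum(np.sum((w - w0) ** 2) for w, w0 in zip(shared_w, start_w))
    l2 = sum(np.sum(w ** 2) for w in new_w)
    return float(0.5 * alpha * sp + 0.5 * beta * l2)


def channel_attention(loss_increase, temperature=1.0):
    """Hypothetical helper: softmax over the increase in target-task loss observed
    when each channel is disabled, so more useful channels get larger weights."""
    z = np.asarray(loss_increase, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def delta_feature_penalty(target_fm, source_fm, attn):
    """Attention-weighted feature-map distance (assumed form).

    target_fm, source_fm: arrays of shape (channels, H, W) taken from the same
    outer layer of the target and frozen source networks on the same input.
    attn: one non-negative weight per channel.
    """
    diff = (target_fm - source_fm).reshape(target_fm.shape[0], -1)
    per_channel = np.sum(diff ** 2, axis=1)  # squared L2 distance per channel
    return float(np.dot(attn, per_channel))


def delta_objective(empirical_loss, target_fm, source_fm, attn, alpha=0.01):
    """Empirical loss plus the weighted feature-map alignment term."""
    return empirical_loss + alpha * delta_feature_penalty(target_fm, source_fm, attn)


if __name__ == "__main__":
    src = np.zeros((2, 4, 4))
    tgt = np.zeros((2, 4, 4))
    tgt[0] = 0.1  # only channel 0 has drifted from the source network
    attn = channel_attention([2.0, 0.0])  # channel 0 assumed more important
    print(delta_objective(0.7, tgt, src, attn, alpha=1.0))
```

Both penalties leave the empirical loss untouched and only add a term. The difference lies in what they anchor. `l2_sp_penalty` fixes the parameters themselves. `delta_feature_penalty` lets the weights move freely, provided the attended channels keep producing outputs close to those of the source network. This reflects the shift from weights to outer layer outputs that the abstract describes.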
computer science, information systems, software engineering