A Cross-Modal View to Utilize Label Semantics for Enhancing Student Network in Multi-label Classification

Yuzhuo Qin, Hengwei Liu, Xiaodong Gu
DOI: https://doi.org/10.1007/978-3-031-44207-0_2
2023-01-01
Abstract: Knowledge transfer has become a promising approach for improving the performance and efficiency of relatively lightweight networks. Previous research has focused on identifying suitable knowledge and on enhancing network structures to obtain more valuable knowledge. However, the introduction of extra information such as semantics remains unexplored. In this study, we introduce a multi-label classifier that uses label embeddings to replace the traditional global average pooling (GAP) layer and to incorporate semantics. Our approach adopts a cross-modal view of classification and uses the correlation matrix between the visual and label modalities as the knowledge that enhances the student's performance. Furthermore, because the teacher and student share the same classification head, we initialize the student’s head with the trained teacher’s head, making the label embeddings more representative. Experimental results show that our proposed method outperforms typical existing methods, and further analysis confirms the effectiveness of our approach.
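To make the mechanism in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the correlation matrix is a cosine similarity between label embeddings and spatial visual features, that per-label scores come from softmax-weighted pooling over locations, and that the distillation term is a mean squared error between teacher and student correlation matrices. The abstract specifies none of these choices, and all function names, shapes, and random inputs are hypothetical.

```python
import numpy as np

def correlation_matrix(features, label_emb):
    """Cosine similarity between label embeddings and spatial features.

    features:  (num_locations, dim) flattened visual feature map (assumed layout)
    label_emb: (num_labels, dim) label embeddings
    returns:   (num_labels, num_locations)
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    e = label_emb / (np.linalg.norm(label_emb, axis=1, keepdims=True) + 1e-8)
    return e @ f.T

def label_logits(corr):
    """One score per label via softmax-weighted pooling over locations.

    This stands in for the GAP layer; the pooling rule is an assumption.
    """
    w = np.exp(corr - corr.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return (w * corr).sum(axis=1)

def correlation_distillation_loss(corr_student, corr_teacher):
    """Mean squared error between correlation matrices (assumed loss form)."""
    return float(np.mean((corr_student - corr_teacher) ** 2))

# Hypothetical sizes and random stand-ins for backbone outputs.
rng = np.random.default_rng(0)
num_labels, dim, num_locations = 5, 16, 49

teacher_label_emb = rng.normal(size=(num_labels, dim))
# Shared head design: the student's label embeddings start as a copy
# of the trained teacher's label embeddings.
student_label_emb = teacher_label_emb.copy()

teacher_feats = rng.normal(size=(num_locations, dim))
student_feats = rng.normal(size=(num_locations, dim))

corr_t = correlation_matrix(teacher_feats, teacher_label_emb)
corr_s = correlation_matrix(student_feats, student_label_emb)
logits_s = label_logits(corr_s)
loss = correlation_distillation_loss(corr_s, corr_t)
```

The sketch assumes teacher and student features are already projected to the same dimension, which is what would allow the classification head to be copied directly; in training, `loss` would be added to the usual multi-label classification loss.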