Long-tailed Multi-label Text Classification Via Label Co-occurrence-Aware Knowledge Transfer

Kai Li, Liping Jing
DOI: https://doi.org/10.1109/ecnlpir57021.2022.00024
2022-01-01
Abstract: Multi-label classification is an extension of traditional multi-class classification. Unlike multi-class classification, where only one label can be assigned to an instance, multi-label classification uses multiple labels to describe an instance in more detail. However, in real-world applications, training samples typically exhibit a long-tailed class distribution, where a small portion of classes have many samples while the others are associated with only a few. To address the challenge of insufficient training data for tail label classification, we propose Label Co-Occurrence-Aware Knowledge Transfer (LCOAKT), which uses label co-occurrence information to transfer knowledge learned on head classes to semantically similar tail classes. Extensive experiments show that LCOAKT significantly improves performance on rare classes while maintaining strong head-class performance, outperforming state-of-the-art methods.
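The abstract does not say how co-occurrence is computed or how knowledge is transferred, so the following is only a minimal sketch of the general idea, not the LCOAKT method. It counts label co-occurrences in a multi-hot label matrix and pairs each tail label with the head label it most often appears alongside. The function names, the frequency threshold used to split head from tail, and the "most frequent co-occurring head label" rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def label_frequencies(Y: List[List[int]]) -> List[int]:
    """Number of samples carrying each label (column sums of Y)."""
    return [sum(col) for col in zip(*Y)]


def cooccurrence_matrix(Y: List[List[int]]) -> List[List[int]]:
    """C[a][b] = number of samples tagged with both label a and label b.

    Y is a non-empty multi-hot matrix (n_samples x n_labels).
    The diagonal is left at zero so a label never matches itself.
    """
    n_labels = len(Y[0])
    C = [[0] * n_labels for _ in range(n_labels)]
    for row in Y:
        active = [j for j, v in enumerate(row) if v]
        for a in active:
            for b in active:
                if a != b:
                    C[a][b] += 1
    return C


def nearest_head_for_tail(
    Y: List[List[int]], head_threshold: int
) -> Dict[int, Optional[int]]:
    """Map each tail label to the head label it co-occurs with most often.

    Assumption (not from the paper): a label is "head" when its frequency
    is at least head_threshold, and "tail" otherwise. A tail label that
    never co-occurs with any head label maps to None.
    """
    freq = label_frequencies(Y)
    head = [j for j, f in enumerate(freq) if f >= head_threshold]
    tail = [j for j, f in enumerate(freq) if f < head_threshold]
    C = cooccurrence_matrix(Y)

    mapping: Dict[int, Optional[int]] = {}
    for t in tail:
        best = max(head, key=lambda h: C[t][h], default=None)
        if best is not None and C[t][best] == 0:
            best = None  # no co-occurrence evidence for this tail label
        mapping[t] = best
    return mapping


# Toy data: labels 0 and 1 are frequent, labels 2 and 3 are rare.
Y = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
pairs = nearest_head_for_tail(Y, head_threshold=3)
```

On this toy matrix, `pairs` maps tail label 2 to head label 0 and tail label 3 to head label 1. In a real system, such a pairing could decide which head-class parameters or representations inform each tail classifier. How LCOAKT actually does this is not stated in the abstract.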