XRR: Extreme Multi-label Text Classification with Candidate Retrieving and Deep Ranking

Jie Xiong, Li Yu, Xi Niu, Youfang Leng
DOI: https://doi.org/10.1016/j.ins.2022.11.158
IF: 8.1
2022-12-07
Information Sciences
Abstract: Extreme Multi-label Text Classification (XMTC) is the task of finding the most relevant labels for a document from a very large label set. Although some deep learning-based methods have shown great success in XMTC, they still suffer from two drawbacks. First, although several methods have improved precision by clustering labels and combining several sub-models to train and predict on a single dataset, they are computationally inefficient. Second, most of these methods need a low-dimensional bottleneck layer before the output layer to compress the feature representations so that they fit in GPU memory, which loses information from the original features. In this paper, we propose a novel two-stage XMTC framework with candidate Retrieving and deep Ranking (XRR) to address these drawbacks. In the retrieving stage, we design two retrieval strategies, an aligning Point Mutual Information (aPMI) method and a Unified Label-Semantic Embedding (ULSE) method, to extract hundreds of candidates from the massive label set. In the ranking stage, we present a deep ranking model that uses a pre-trained transformer to distinguish the true labels from the candidates. Extensive experiments show that XRR outperforms state-of-the-art methods on five widely used multi-label datasets.
computer science, information systems
What problem does this paper attempt to address?