Abstract: In Extreme Multi-Label Completion (XMLCo), the objective is to predict the missing labels of a collection of documents. Together with XML Classification, XMLCo is arguably one of the most challenging document classification tasks, because the number of labels (at least tens of thousands) is generally very large compared to the number of labelled documents available in the training dataset. Such a task is often accompanied by a taxonomy that encodes the organic relationships between labels, and many methods have been proposed to leverage this hierarchy to improve the results of XMLCo algorithms. In this paper, we propose a new approach to this problem, TAMLEC (Taxonomy-Aware Multi-task Learning for Extreme multi-label Completion). TAMLEC divides the problem into several Taxonomy-Aware Tasks, i.e. subsets of labels adapted to the hierarchical paths of the taxonomy, and trains on these tasks using a dynamic Parallel Feature sharing approach, in which some parts of the model are shared between tasks while others are task-specific. At inference time, TAMLEC uses the labels already available in a document to infer the appropriate tasks and to predict the missing labels. To achieve this, TAMLEC uses a modified transformer architecture that predicts ordered sequences of labels on a Weak-Semilattice structure naturally induced by the tasks. This approach has multiple advantages. First, our experiments on real-world datasets show that TAMLEC outperforms state-of-the-art methods on various XMLCo problems. Second, TAMLEC is by construction particularly suited to few-shot XML tasks, where new tasks or labels are introduced with only a few examples, and extensive evaluations highlight its strong performance compared to existing methods.
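
To make the task-based idea more concrete, the following is a minimal sketch of how labels might be grouped into taxonomy-aware tasks and how a document's known labels could select tasks and candidate missing labels. It is not TAMLEC's actual procedure: the toy taxonomy, the rule "one task per first-level subtree", and the helpers `build_tasks`, `infer_tasks` and `candidate_labels` are all hypothetical simplifications, and the learned transformer that ranks candidates is omitted entirely.

```python
from collections import defaultdict

# Hypothetical toy taxonomy, stored as child -> parent ("root" is the top node).
PARENT = {
    "cs": "root", "bio": "root",
    "ml": "cs", "db": "cs",
    "nlp": "ml", "cv": "ml",
    "genomics": "bio",
}


def build_tasks(parent):
    """Group labels into one task per first-level node (simplifying assumption)."""
    tasks = defaultdict(set)
    for label in parent:
        # Walk up the hierarchy until reaching a direct child of the root.
        node = label
        while parent[node] != "root":
            node = parent[node]
        tasks[node].add(label)
    return dict(tasks)


def infer_tasks(known_labels, tasks):
    """Select every task whose label set overlaps the document's known labels."""
    known = set(known_labels)
    return sorted(t for t, labels in tasks.items() if labels & known)


def candidate_labels(known_labels, tasks):
    """Candidate missing labels: labels of the inferred tasks not yet assigned."""
    known = set(known_labels)
    candidates = set()
    for t in infer_tasks(known, tasks):
        candidates |= tasks[t] - known
    return sorted(candidates)


if __name__ == "__main__":
    tasks = build_tasks(PARENT)
    print(infer_tasks(["nlp"], tasks))       # ['cs']
    print(candidate_labels(["nlp"], tasks))  # ['cs', 'cv', 'db', 'ml']
```

Restricting candidates to the inferred tasks is what keeps the search space small in this sketch: a document labelled only `nlp` is never scored against `genomics`. In a real system a trained model would still have to rank or filter these candidates.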

CRAT-XML: Contrastive Representation Adversarial Training for Extremely Multi-Label Text Classification

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

HAXMLNet: Hierarchical Attention Network for Extreme Multi-Label Text Classification

MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification

TLC-XML: Transformer with Label Correlation for Extreme Multi-label Text Classification

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

ADAM: An Attentional Data Augmentation Method for Extreme Multi-label Text Classification

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

Label-aware Document Representation via Hybrid Attention for Extreme Multi-Label Text Classification

A Survey on Extreme Multi-label Learning

Deep Learning for Extreme Multi-label Text Classification

Deep Extreme Multi-label Learning

SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification

Extreme Multi-label Completion for Semantic Document Labelling with Taxonomy-Aware Parallel Learning

Hierarchical Text Classification (HTC) vs. eXtreme Multilabel Classification (XML): Two Sides of the Same Medal

BGNN-XML: Bilateral Graph Neural Networks for Extreme Multi-Label Text Classification

CoMAL: Contrastive Active Learning for Multi-Label Text Classification

Improving Tail Label Prediction for Extreme Multi-label Learning

XRR: Extreme Multi-label Text Classification with Candidate Retrieving and Deep Ranking