Abstract: Extreme Multi-label Classification (XMC) methods predict relevant labels for a given query in an extremely large label space. Recent works in XMC address this problem using deep encoders that project text descriptions to an embedding space suitable for recovering the closest labels. However, learning deep models can be computationally expensive in large output spaces, resulting in a trade-off between high-performing brute-force approaches and efficient solutions. In this paper, we propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches. We frame XMC as a data-to-prototype prediction task where label prototypes aggregate information from related queries. More precisely, we use a shallow transformer encoder that we call the Label Prototype Network, which enriches label representations by aggregating text-based embeddings, label centroids and learnable free vectors. We jointly train a deep encoder and the Label Prototype Network using an adaptive triplet loss objective that better adapts to the high granularity and ambiguity of extreme label spaces. PRIME achieves state-of-the-art results on several public benchmarks of different sizes and domains, while keeping the model efficient.
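The data-to-prototype framing described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation and rests on several assumptions. The label centroid is taken to be the normalized mean of the embeddings of the training queries tagged with that label. The Label Prototype Network, a shallow transformer in PRIME, is replaced by a plain average of the text-based embedding, the centroid and the free vector. All helper names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def label_centroid(query_embs: List[Sequence[float]]) -> Vector:
    """Assumption: the centroid is the normalized mean of the label's training query embeddings."""
    normed = [l2_normalize(q) for q in query_embs]
    dim = len(normed[0])
    mean = [sum(q[i] for q in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def prototype(text_emb: Sequence[float], centroid: Sequence[float], free_vec: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the Label Prototype Network: a plain average, not a transformer."""
    return l2_normalize([(h + c + v) / 3 for h, c, v in zip(text_emb, centroid, free_vec)])


def top_k_labels(query_emb: Sequence[float], prototypes: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Data-to-prototype prediction: rank labels by cosine similarity to the query."""
    scores = {label: cosine(query_emb, z) for label, z in prototypes.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 2-D example with hypothetical embeddings.
train_queries = {"sports": [[1.0, 0.1], [0.9, 0.0]], "finance": [[0.0, 1.0], [0.1, 0.8]]}
text_embs = {"sports": [1.0, 0.0], "finance": [0.0, 1.0]}
free_vecs = {"sports": [0.0, 0.0], "finance": [0.0, 0.0]}  # would be learned during training

protos = {
    label: prototype(text_embs[label], label_centroid(train_queries[label]), free_vecs[label])
    for label in text_embs
}
print(top_k_labels([0.8, 0.2], protos, k=1))  # ['sports']
```

Because prediction reduces to a similarity search against one prototype per label, the same ranking step could be served by an approximate nearest-neighbor index at scale. The sketch simply scores every label.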

What problem does this paper attempt to address?

The problem this paper addresses is the trade-off between performance and efficiency in **Extreme Multi-label Classification (XMC)**. XMC methods must predict the most relevant subset of labels for a given query from an extremely large label space. Recent work tackles this with deep encoders that project text descriptions into an embedding space in which the nearest-neighbor labels can be retrieved. However, training deep models is computationally expensive when the output space is large. This creates a trade-off between high-performing brute-force methods and efficient ones. To resolve this, the authors propose PRIME, which uses a new prototypical contrastive learning technique to reconcile efficiency and performance and to outperform brute-force methods. The main contributions are:

1. **An efficient XMC encoder method**: it is based on label prototypes that aggregate information from multiple queries.
2. **A novel adaptive triplet loss**: it accounts for the high granularity and ambiguity of extremely large label spaces.
3. **Significant performance improvements on multiple benchmarks**: in particular, the model remains efficient in resource consumption.

### Formula Representation

The following key formulas help explain how PRIME works:

- **Traditional Triplet Loss Function**:
  \[ L_T=\max(s_{qn} - s_{qp}+m,\ 0) \]
  where \(s_{qn}\) and \(s_{qp}\) are the cosine similarities between the query and the negative and positive labels, respectively, and \(m\) is a fixed margin.
- **Triplet Loss Function with Dynamic Margin**:
  \[ L_T^{cd}=\begin{cases} 0 & \text{if } C_p \text{ and } \Delta^{q}_{p-n}\geq\gamma_{\min}\\ \gamma_{\min}-\Delta^{q}_{p-n} & \text{if } C_p \text{ and } \Delta^{q}_{p-n} < \gamma_{\min}\\ \Delta^{q}_{n-p}+\operatorname{clip}(\Delta s_{n-p}) & \text{if } C_n \end{cases} \]
  where \(\Delta^{q}_{p-n}=s_{qp}-s_{qn}\) and \(\Delta^{q}_{n-p}=s_{qn}-s_{qp}\) are the similarity gaps for query \(q\).
- **Label Prototype Calculation**:
  \[ z_l = g_\phi(h_l, c_l, v_l) \]
  where \(g_\phi\) is the Label Prototype Network, \(h_l\) is the text-based label embedding, \(c_l\) is the label centroid, and \(v_l\) is a learnable free vector.

These formulas show how PRIME improves the performance and efficiency of XMC through adaptive margins and label prototypes.
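The two hinge losses above can be sketched in pure Python using cosine similarity. This sketch is an interpretation, not the authors' code, and the function names are hypothetical. It reads the \(C_p\) cases of the dynamic-margin loss as a hinge on the gap \(s_{qp}-s_{qn}\) with margin \(\gamma_{\min}\). The \(C_n\) case is omitted because the conditions and the bounds of the clipped term are not specified here.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_loss(q: Sequence[float], pos: Sequence[float], neg: Sequence[float], m: float) -> float:
    """Fixed-margin triplet loss L_T = max(s_qn - s_qp + m, 0)."""
    s_qp = cosine(q, pos)
    s_qn = cosine(q, neg)
    return max(s_qn - s_qp + m, 0.0)


def dynamic_margin_cp(q: Sequence[float], pos: Sequence[float], neg: Sequence[float], gamma_min: float) -> float:
    """C_p cases of L_T^cd, read as a hinge on the gap s_qp - s_qn (an interpretation)."""
    gap = cosine(q, pos) - cosine(q, neg)  # similarity gap between positive and negative
    return 0.0 if gap >= gamma_min else gamma_min - gap


q, pos, neg = [1.0, 0.0], [1.0, 0.0], [1.0, 1.0]
print(round(triplet_loss(q, pos, neg, m=0.5), 4))               # 0.2071
print(round(dynamic_margin_cp(q, pos, neg, gamma_min=0.5), 4))  # 0.2071
```

Under this reading, the \(C_p\) branch coincides with the fixed-margin triplet loss when \(m=\gamma_{\min}\). The dynamic loss differs mainly through the \(C_n\) case and its clipped term, which this sketch does not model.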

Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss