Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss

Kunal Dahiya, Diego Ortego, David Jiménez
2024-10-27
Abstract: Extreme Multi-label Classification (XMC) methods predict relevant labels for a given query in an extremely large label space. Recent works in XMC address this problem using deep encoders that project text descriptions to an embedding space suitable for recovering the closest labels. However, learning deep models can be computationally expensive in large output spaces, resulting in a trade-off between high-performing brute-force approaches and efficient solutions. In this paper, we propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches. We frame XMC as a data-to-prototype prediction task where label prototypes aggregate information from related queries. More precisely, we use a shallow transformer encoder, which we call the Label Prototype Network, that enriches label representations by aggregating text-based embeddings, label centroids and learnable free vectors. We jointly train a deep encoder and the Label Prototype Network using an adaptive triplet loss objective that better adapts to the high granularity and ambiguity of extreme label spaces. PRIME achieves state-of-the-art results in several public benchmarks of different sizes and domains, while keeping the model efficient.
Machine Learning, Information Retrieval
What problem does this paper attempt to address?
The paper addresses the trade-off between performance and efficiency in **Extreme Multi-label Classification (XMC)**. XMC methods must predict the most relevant subset of labels for a given query in an extremely large label space. Recent work tackles this by using deep encoders to project text descriptions into an embedding space in which the relevant labels can be recovered as nearest neighbors. However, learning deep models is computationally expensive in large output spaces, which forces a trade-off between high-performing brute-force methods and efficient solutions. To this end, the authors propose PRIME, which uses a new prototype-based contrastive learning technique to reconcile efficiency and performance and to outperform brute-force methods.

The main contributions include:

1. **An efficient XMC encoder method**: The method is based on label prototypes that aggregate information from multiple queries.
2. **A novel adaptive triplet loss function**: The loss accounts for the high granularity and ambiguity of extremely large label spaces.
3. **Significant performance improvements on multiple benchmarks**: PRIME achieves these gains while remaining efficient in resource consumption.

### Formula Representation

To better understand how PRIME works, here are some key formulas:

- **Traditional Triplet Loss Function**:
  \[ L_T = \max(s_{qn} - s_{qp} + m, 0) \]
  where \(s_{qn}\) and \(s_{qp}\) are the cosine similarities between the query and the negative and positive labels, respectively, and \(m\) is a fixed margin.
- **Triplet Loss Function with Dynamic Margin**:
  \[ L_T^{cd} = \begin{cases} 0 & \text{if } C_p,\ \Delta_q^{p-n} \geq \gamma_{\min} \\ \Delta_q^{p-n} + \gamma_{\min} & \text{if } C_p,\ \Delta_q^{p-n} < \gamma_{\min} \\ \Delta_q^{n-p} + \text{clip}(\Delta s_{n-p}) & \text{if } C_n \end{cases} \]
- **Label Prototype Calculation**:
  \[ z_l = g_\phi(h_l, c_l, v_l) \]
  where \(h_l\) is the text-based label embedding, \(c_l\) is the label centroid, \(v_l\) is a learnable free vector, and \(g_\phi\) denotes the aggregation performed by the Label Prototype Network.

These formulas show how PRIME improves the performance and efficiency of XMC through adaptive margins and label prototypes.
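
To make the fixed-margin baseline concrete, the traditional triplet loss \(L_T\) above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not PRIME's implementation: the function names, the toy 2-D embeddings and the margin value `0.3` are assumptions chosen for the example and do not come from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(query, positive, negative, margin=0.3):
    """Fixed-margin triplet loss: max(s_qn - s_qp + m, 0).

    The default margin of 0.3 is an arbitrary illustrative value.
    """
    s_qp = cosine_similarity(query, positive)  # query vs. positive label
    s_qn = cosine_similarity(query, negative)  # query vs. negative label
    return max(s_qn - s_qp + margin, 0.0)


# Toy embeddings (hypothetical values, for illustration only).
query = [1.0, 0.0]
aligned_label = [1.0, 0.0]     # cosine similarity with the query is 1
orthogonal_label = [0.0, 1.0]  # cosine similarity with the query is 0

# Positive already beats the negative by more than the margin: no loss.
easy = triplet_loss(query, aligned_label, orthogonal_label)

# Roles swapped: the "negative" is closer than the "positive", so the
# loss equals the similarity gap plus the margin.
hard = triplet_loss(query, orthogonal_label, aligned_label)

print(easy, hard)
```

With a fixed \(m\), every triplet is pushed towards the same separation regardless of how closely related the positive and negative labels are, which is the rigidity the dynamic margin is meant to relax.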