Abstract: The number of categories of instances in the real world is normally huge, and each instance may contain multiple labels. To distinguish these massive labels utilizing machine learning, eXtreme Label Classification (XLC) has been established. However, as the number of categories increases, the number of parameters and nonlinear operations in the classifier also rises. This results in a Classifier Computational Overload Problem (CCOP). To address this, we propose a Multi-Head Encoding (MHE) mechanism, which replaces the vanilla classifier with a multi-head classifier. During the training process, MHE decomposes extreme labels into the product of multiple short local labels, with each head trained on these local labels. During testing, the predicted labels can be directly calculated from the local predictions of each head. This reduces the computational load geometrically. Then, according to the characteristics of different XLC tasks, e.g., single-label, multi-label, and model pretraining tasks, three MHE-based implementations, i.e., Multi-Head Product, Multi-Head Cascade, and Multi-Head Sampling, are proposed to more effectively cope with CCOP. Moreover, we theoretically demonstrate that MHE can achieve performance approximately equivalent to that of the vanilla classifier by generalizing the low-rank approximation problem from Frobenius-norm to Cross-Entropy. Experimental results show that the proposed methods achieve state-of-the-art performance while significantly streamlining the training and inference processes of XLC tasks. The source code has been made public at https://github.com/Anoise/MHE.
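
To make the "geometric" reduction concrete, the following back-of-the-envelope count may help. It is a sketch under assumptions the abstract does not spell out: each head is taken to be a single linear layer on a shared $d$-dimensional feature, the total number of classes $C$ factors as $C = \prod_{h=1}^{H} C_h$, and the symbols $d$, $H$, and $C_h$ are introduced here for illustration only.

$$
\underbrace{d \cdot C = d \prod_{h=1}^{H} C_h}_{\text{vanilla classifier weights}}
\quad \text{vs.} \quad
\underbrace{d \sum_{h=1}^{H} C_h}_{\text{multi-head classifier weights}},
\qquad
\text{and for equal heads } C_h = C^{1/H}: \quad d \, H \, C^{1/H}.
$$

For example, with $C = 10^6$ classes split across $H = 3$ heads of $C_h = 100$ local labels each, the output layer shrinks from $10^6$ units to $300$ units under these assumptions.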

What problem does this paper attempt to address?

The problem this paper addresses is the **Classifier Computational Overload Problem (CCOP)** in Extreme Label Classification (XLC) tasks. As the number of classes grows, the number of parameters and nonlinear operations in the classifier grows with it, producing an excessive computational burden. To address this, the authors propose a Multi-Head Encoding (MHE) mechanism.

### Specific Problem Description

In the real world, the number of instance classes is usually huge, and each instance may carry multiple labels. Extreme Label Classification (XLC) was established to distinguish among these large label sets. However, as the number of classes increases, the number of parameters and nonlinear operations in the classifier increases correspondingly, leading to the Classifier Computational Overload Problem (CCOP). This makes it difficult for existing machine-learning methods (such as One-Hot encoding or multi-label learning algorithms) to meet the computational and storage requirements of practical applications.

### Solution

To address CCOP, the authors propose a Multi-Head Encoding (MHE) mechanism that replaces the traditional single-head classifier with a multi-head classifier. Specifically:

1. **During training**: MHE decomposes extreme labels into the product of multiple short local labels, and each head is trained on these local labels.
2. **During testing**: The predicted labels are calculated directly from the local predictions of each head, which reduces the computational load geometrically.

In addition, according to the characteristics of different XLC tasks (single-label, multi-label, and model pre-training tasks), the authors propose three MHE-based implementations:

- **Multi-Head Product (MHP)**: Used for single-label classification tasks.
- **Multi-Head Cascade (MHC)**: Used for multi-label classification tasks.
- **Multi-Head Sampling (MHS)**: Used for model pre-training tasks.

### Theoretical Analysis

The authors also show theoretically that, by generalizing the low-rank approximation problem from the Frobenius norm to Cross-Entropy (CE), MHE can achieve performance approximately equivalent to that of the traditional classifier. The experimental results show that the proposed methods achieve state-of-the-art performance while significantly simplifying the training and inference processes.

### Main Contributions

1. Proposed a Multi-Head Encoding (MHE) mechanism to address CCOP in XLC tasks, significantly reducing computational complexity and simplifying the training and inference processes.
2. Designed three MHE-based algorithms, each suited to a different XLC task, and demonstrated experimentally that these algorithms achieve state-of-the-art performance.
3. Generalized the low-rank approximation problem from the Frobenius norm to Cross-Entropy (CE), theoretically analyzed the representational ability of MHE, and proved that the performance gap between MHE and the traditional classifier is small and that MHE does not require label pre-processing techniques.

With these methods, the paper addresses the computational overload problem in extreme label classification and provides new solutions and theoretical support for related fields.
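
The label decomposition described above can be illustrated with a small sketch. It is not the authors' implementation (their code is in the linked repository); it assumes one simple way to factor a global label index into local labels, namely mixed-radix digits, and assumes each head is a linear layer on a shared feature of dimension `d`. The function names `encode_label`, `decode_label`, `predict_label`, and `output_params` are hypothetical and chosen for this example.

```python
from math import prod


def encode_label(y, head_sizes):
    """Split a global label index into one local label per head.

    Assumption: mixed-radix digits are used as the local labels.
    """
    if not 0 <= y < prod(head_sizes):
        raise ValueError("label out of range for the given head sizes")
    local = []
    for size in reversed(head_sizes):
        y, digit = divmod(y, size)
        local.append(digit)
    return local[::-1]


def decode_label(local, head_sizes):
    """Recombine per-head local labels into the global label index."""
    y = 0
    for digit, size in zip(local, head_sizes):
        y = y * size + digit
    return y


def predict_label(head_logits, head_sizes):
    """Take the argmax of each head and decode the global label."""
    local = [max(range(len(logits)), key=logits.__getitem__)
             for logits in head_logits]
    return decode_label(local, head_sizes)


def output_params(d, head_sizes):
    """Weights in a vanilla vs. a multi-head output layer.

    Assumption: linear heads on a shared d-dimensional feature, no biases.
    """
    return d * prod(head_sizes), d * sum(head_sizes)


print(encode_label(123, [10, 10, 10]))        # [1, 2, 3]
print(decode_label([1, 2, 3], [10, 10, 10]))  # 123
print(output_params(512, [100, 100, 100]))    # (512000000, 153600)
```

In this sketch, three heads of 100 local labels each cover one million global labels, and the predicted label is recovered from the per-head argmax without ever materializing the million-way output.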

Multi-Head Encoding for Extreme Label Classification
