Abstract: Many real-world text classification tasks involve a large number of closely related categories organized in a hierarchical structure or taxonomy. Hierarchical multi-label text classification (HMTC) becomes especially challenging when it must handle such large sets of closely related categories. The structural features of the categories in the hierarchy and the word semantics of their labels are very helpful for improving classification accuracy over such category sets, yet both have been neglected by most existing HMTC approaches. In this paper, we present a hybrid embedding-based text representation for HMTC with high accuracy. First, the hybrid embedding combines a graph embedding of the categories in the hierarchy with a word embedding of their category labels. A graph embedding model based on Structural Deep Network Embedding simultaneously encodes the global and local structural features of each category in the whole hierarchy, making categories structurally discriminable. We further use word embedding to encode the word semantics of each category label, making different categories semantically discriminable. Second, we present a level-by-level HMTC approach that uses a bidirectional Gated Recurrent Unit network together with the hybrid embedding to learn the text representation level by level. Finally, extensive experiments on five large-scale real-world datasets, in comparison with state-of-the-art hierarchical and flat multi-label text classification approaches, show that our approach is very competitive with the state of the art in classification accuracy; in particular, it achieves superior performance while maintaining computational cost.
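
As a rough illustration of the hybrid embedding idea, the sketch below builds a category vector from a structural (graph) embedding and a label word embedding. The abstract does not state how the two parts are combined, so the concatenation, the averaging of label word vectors, the function names, and all numeric values here are assumptions made purely for illustration; in practice the structural vectors would come from a trained SDNE model and the word vectors from a pretrained word-embedding model.

```python
from statistics import fmean

# Hypothetical toy vectors (made-up numbers, not from the paper).
# graph_vectors stands in for SDNE output; word_vectors for pretrained word embeddings.
graph_vectors = {
    "Science": [0.9, 0.1],
    "Physics": [0.8, 0.3],
}
word_vectors = {
    "science": [0.2, 0.4, 0.6],
    "physics": [0.1, 0.5, 0.9],
}


def label_word_embedding(label, vectors):
    """Average the vectors of the known words in a category label (assumed pooling)."""
    words = [w for w in label.lower().split() if w in vectors]
    if not words:
        raise KeyError(f"no word vector available for label {label!r}")
    # zip(*...) groups the values dimension by dimension across the label's words.
    return [fmean(dim) for dim in zip(*(vectors[w] for w in words))]


def hybrid_embedding(label, graph_vecs, word_vecs):
    """Concatenate structural and semantic vectors (assumed combination rule)."""
    return list(graph_vecs[label]) + label_word_embedding(label, word_vecs)


physics_vec = hybrid_embedding("Physics", graph_vectors, word_vectors)
print(len(physics_vec))  # 2 structural dimensions + 3 semantic dimensions
```

Under these assumptions, two categories that sit close together in the hierarchy but carry different label words still receive distinct vectors, and vice versa, which is the intuition behind making categories both structurally and semantically discriminable.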

A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification

HFT-ONLSTM: Hierarchical and Fine-Tuning Multi-label Text Classification

Joint Embedding of Words and Category Labels for Hierarchical Multi-label Text Classification

Hybrid embedding-based text representation for hierarchical multi-label text classification

Hierarchical and Bidirectional Joint Multi-Task Classifiers for Natural Language Understanding

HE-HMTC: A hybrid embedding-based text representation for Hierarchical multi-label text classification

Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach

Hierarchical Multi-label Text Classification with Horizontal and Vertical Category Correlations

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

Hierarchical Multi-label Text Classification: Self-adaption Semantic Awareness Network Integrating Text Topic and Label Level Information

Hierarchical Multilabel Ship Classification in Remote Sensing Images Using Label Relation Graphs

Hierarchical Multilabel Text Classification Via Multitask Learning

LA-HCN: Label-based Attention for Hierarchical Multi-label Text Classification Neural Network

An Interactive Fusion Model for Hierarchical Multi-label Text Classification

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

HiTIN: Hierarchy-aware Tree Isomorphism Network for Hierarchical Text Classification

Label-Correction Capsule Network for Hierarchical Text Classification

Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification

Hierarchy-Aware and Label Balanced Model for Hierarchical Text Classification

HPT: Hierarchy-aware Prompt Tuning for Hierarchical Text Classification

Weakly-Supervised Hierarchical Text Classification