Abstract: Many real-world text classification tasks involve a large number of closely related categories organized in a hierarchical structure or taxonomy. Hierarchical multi-label text classification (HMTC) becomes challenging when it must handle such large sets of closely related categories. The structural features of the categories across the entire hierarchy and the word semantics of their category labels can help improve classification accuracy over large sets of closely related categories, yet most existing HMTC approaches neglect them. In this paper, we present a hybrid embedding-based text representation for HMTC that achieves high accuracy. First, the hybrid embedding combines the graph embedding of categories in the hierarchy with the word embedding of their category labels. A graph embedding model based on Structural Deep Network Embedding simultaneously encodes the global and local structural features of a given category in the whole hierarchy, making categories structurally discriminable. We further use word embedding to encode the word semantics of each category label in the hierarchy, making different categories semantically discriminable. Second, we present a level-by-level HMTC approach based on the bidirectional Gated Recurrent Unit network model, which uses the hybrid embedding to learn the representation of the text level by level. Finally, we conducted extensive experiments on five large-scale real-world datasets, comparing against state-of-the-art hierarchical and flat multi-label text classification approaches. The results show that our approach is highly competitive with the state-of-the-art approaches in classification accuracy, and in particular achieves superior performance while maintaining computational cost.
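
As a rough illustration of the hybrid embedding idea described in the abstract, the sketch below builds one vector per category from two parts: a structural vector and a label-semantics vector. It is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the two parts are combined or how multi-word labels are handled, so concatenation and word-vector averaging are assumptions here. The function names (`mean_vector`, `hybrid_embedding`) and the toy values are hypothetical, and the graph embeddings are assumed to be precomputed by some structural model such as Structural Deep Network Embedding.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hybrid_embedding(
    category: str,
    graph_emb: Dict[str, Vector],
    word_emb: Dict[str, Vector],
) -> Vector:
    """Concatenate a category's graph embedding with its label embedding.

    Assumption: the label embedding is the average of the word vectors of
    the label's words, and the two parts are joined by concatenation.
    """
    words = category.lower().split()
    known = [word_emb[w] for w in words if w in word_emb]
    label_vec = mean_vector(known)
    # List concatenation: structural part first, semantic part second.
    return graph_emb[category] + label_vec


# Toy, hand-picked values for illustration only (not real embeddings).
graph_emb = {"Machine Learning": [0.5, -0.5]}
word_emb = {"machine": [1.0, 0.0, 2.0], "learning": [3.0, 2.0, 0.0]}

vec = hybrid_embedding("Machine Learning", graph_emb, word_emb)
print(vec)  # [0.5, -0.5, 2.0, 1.0, 1.0]
```

In a level-by-level classifier of the kind the abstract describes, vectors like `vec` would be supplied for the categories at each level alongside the text representation; how exactly they enter the bidirectional GRU is not specified in the abstract.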

Deep Classification in Large-Scale Text Hierarchies

Web Page Classification Based on Heterogeneous Features and a Combination of Multiple Classifiers.

Large-Scale Hierarchical Text Classification Based On Path Semantic Information

Research on Deep Web Classification Based on Domain Feature Text

An experimental study on large-scale web categorization.

Support Vector Machines Classification with a Very Large-Scale Taxonomy

Hierarchical Taxonomy Preparation for Text Categorization Using Consistent Bipartite Spectral Graph Copartitioning

Large-Scale Hierarchical Text Classification Based on Path Semantic Vector and Prior Information

Review on Hierarchical Learning Methods for Large-Scale Classification Task

Hybrid embedding-based text representation for hierarchical multi-label text classification

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Joint Hierarchical Category Structure Learning and Large-Scale Image Classification

Develop Multi-hierarchy Classification Model: Rough Set Based Feature Decomposition Method

From Web Directories to Ontologies: Natural Language Processing Challenges

A Machine Learning Approach Classification of Deep Web Sources

Novel Design of Decision-Tree-Based Support Vector Machines Multi-Class Classifier

A FAST ALGORITHM FOR LARGE SCALE WEB PAGE CLASSIFICATION

Exploiting Global and Local Hierarchies for Hierarchical Text Classification

Spare CD14 molecules on human monocytes enhance the sensitivity for low LPS concentrations.

A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification