Heterogeneous Information Integration in Hierarchical Text Classification

Huai-Yuan Yang, Tie-Yan Liu, Li Gao, Wei-Ying Ma
DOI: https://doi.org/10.1007/11731139_29
2006-01-01
Abstract: Previous work has shown that taking category distance in the taxonomy tree into account can improve the performance of text classifiers. In this paper, we propose a new approach that integrates additional categorical information from the text corpus using the principle of multi-objective programming (MOP). Specifically, we consider not only the distance between categories defined by the branching of the taxonomy tree, but also the similarity between categories defined by the document/term distributions in the feature space. Using MOP to combine these two kinds of information yields a refined category distance. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in hierarchical text classification.
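The idea of blending a taxonomy-based distance with a feature-space distance can be illustrated with a minimal sketch. The abstract does not give the paper's MOP formulation, so everything below is an assumption made for illustration: the weighted-sum combination (a common way to scalarize two objectives), the cosine distance between category centroids, the normalization by `max_tree_dist`, and all function names and toy data are hypothetical and are not the authors' method.

```python
import math

def tree_distance(parent, a, b):
    """Number of edges on the path between categories a and b in the taxonomy tree."""
    def ancestors(node):
        # Path from the node up to the root, the node itself included.
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    path_a, path_b = ancestors(a), ancestors(b)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for steps_from_b, node in enumerate(path_b):
        if node in steps_from_a:  # first common ancestor
            return steps_from_a[node] + steps_from_b
    raise ValueError("categories are not in the same tree")

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero term-weight vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def combined_distance(parent, centroids, a, b, max_tree_dist, weight=0.5):
    """Assumed weighted-sum blend of the two distances (illustrative only)."""
    d_tree = tree_distance(parent, a, b) / max_tree_dist  # scale to [0, 1]
    d_feat = cosine_distance(centroids[a], centroids[b])
    return weight * d_tree + (1.0 - weight) * d_feat

# Hypothetical toy taxonomy (child -> parent) and category centroids.
parent = {"sports": "root", "science": "root",
          "soccer": "sports", "tennis": "sports"}
centroids = {"soccer": [1.0, 0.0], "tennis": [1.0, 0.0], "science": [0.0, 1.0]}

print(combined_distance(parent, centroids, "soccer", "tennis", max_tree_dist=4))
print(combined_distance(parent, centroids, "soccer", "science", max_tree_dist=4))
```

In this toy setup, sibling categories with identical centroids end up closer than categories that differ in both tree position and term distribution. The `weight` parameter controls how much the taxonomy is trusted relative to the corpus statistics.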