Large-Scale Hierarchical Text Classification Based On Path Semantic Information

Feng Gao, Chengrong Wu, Naiwang Guo, Danfeng Zhao
DOI: https://doi.org/10.1109/BIFE.2009.60
2009-01-01
Abstract: Although hierarchical structure information can improve hierarchical text classification, existing methods suffer from error propagation, especially in large-scale, deep hierarchies. In this paper, we define the concept of a path-based semantic vector for the representation of categories, through which prior information provided by the training set can be employed in a classifier-independent way to reduce and further eliminate classification errors. In particular, we first propose an occurrence-probability-based strategy for hierarchical text classification, which helps limit the error rate efficiently. Co-occurrence probability is then introduced to correct classification errors that occur at higher levels of the hierarchy. Extensive experiments show that our hierarchical classification strategies perform well on the ODP dataset, even at deep levels of the hierarchy.
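The abstract gives no formulas, so the following is only a rough illustration of the general idea of using training-set path priors in a classifier-independent way; it is not the authors' method. The function names `path_priors` and `rescore`, the category names, and the scores are all hypothetical. The sketch assumes the prior of a category path is simply its relative frequency in the training set and that a candidate path is ranked by classifier score multiplied by that prior.

```python
from collections import Counter

def path_priors(training_paths):
    """Relative frequency of each full category path in the training set (assumed prior)."""
    counts = Counter(tuple(p) for p in training_paths)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}

def rescore(candidates, priors):
    """Return the candidate path with the highest (classifier score * path prior).

    `candidates` maps a path tuple to a score from any classifier.
    Paths never seen in training get prior 0.0 under this assumption.
    """
    return max(candidates, key=lambda p: candidates[p] * priors.get(p, 0.0))

# Toy training labels: each item is a root-to-leaf category path.
training_paths = [
    ("Science", "Physics"),
    ("Science", "Physics"),
    ("Science", "Biology"),
    ("Arts", "Music"),
]
priors = path_priors(training_paths)

# Hypothetical classifier scores; ("Arts", "Physics") is a path that an
# early top-level mistake could produce but that never occurs in training.
candidates = {
    ("Science", "Physics"): 0.40,
    ("Arts", "Music"): 0.45,
    ("Arts", "Physics"): 0.50,
}
best = rescore(candidates, priors)
print(best)
```

In this toy setup the highest raw score belongs to a path that never appears in the training data, and the prior suppresses it. The rescoring only consumes scores, so it does not depend on which classifier produced them.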