Flatten hierarchies for large-scale hierarchical text categorization

Xiaolin Wang, Bao-Liang Lu
DOI: https://doi.org/10.1109/ICDIM.2010.5664247
2010-01-01
Abstract: Hierarchies are widely used to organize documents and web pages, so automated hierarchical classification techniques are in demand. However, the currently dominant hierarchical approach, the top-down method, is less accurate than flat classification approaches because of error propagation and data sparsity at the bottom nodes. In this paper we flatten hierarchies to reduce this accuracy loss in the top-down method, aiming for hierarchies that are both effective enough to make large-scale classification tasks feasible and simple enough to ensure high classification accuracy. We propose two flattening strategies based on these two causes of the accuracy loss. Experimental results show that the strategy designed for error propagation is more effective, which suggests that hierarchies with many branches at the top layers can provide high classification accuracy. We also analyze the computational complexity before and after flattening, and the analysis approximately agrees with the experimental results.
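The abstract does not spell out the two flattening strategies, so the following is only a minimal sketch of the general idea of flattening, under stated assumptions: the hierarchy is stored as a parent-to-children dictionary, and "flattening" means deleting an internal node and attaching its children directly to its parent. The names `remove_internal_node`, `max_depth`, and the toy category labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: each internal label maps to its child labels.
Tree = Dict[str, List[str]]


def remove_internal_node(tree: Tree, node: str) -> Tree:
    """Return a new tree with `node` deleted and its children promoted to its parent."""
    flattened: Tree = {}
    for parent, children in tree.items():
        if parent == node:
            continue  # the removed node no longer owns any children
        new_children: List[str] = []
        for child in children:
            if child == node:
                # Promote the removed node's children one level up.
                new_children.extend(tree.get(node, []))
            else:
                new_children.append(child)
        flattened[parent] = new_children
    return flattened


def max_depth(tree: Tree, root: str) -> int:
    """Number of edges on the longest path from `root` down to a leaf."""
    children = tree.get(root, [])
    if not children:
        return 0
    return 1 + max(max_depth(tree, child) for child in children)


# Toy two-level hierarchy (assumed labels, for illustration only).
toy: Tree = {
    "root": ["science", "arts"],
    "science": ["physics", "biology"],
    "arts": ["music", "painting"],
}

# Flatten the top layer: the root ends up with more branches and less depth.
flat = toy
for internal in ["science", "arts"]:
    flat = remove_internal_node(flat, internal)

print(flat)
print(max_depth(toy, "root"), max_depth(flat, "root"))
```

In this toy case a top-down classifier would make one four-way decision at the root instead of a two-way decision followed by another two-way decision, which removes one opportunity for an early error to propagate, at the cost of a wider decision at the top.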