Research on Large Scale Hierarchical Classification Based on Candidate Search

Li He, Yan Jia, Zhaoyun Ding, Weihong Han
DOI: https://doi.org/10.1109/WISA.2013.73
2013-01-01
Abstract: Large-scale hierarchical classification studies how to assign web documents to categories in a class hierarchy. Because the hierarchy is very large, containing thousands or even tens of thousands of categories, classification performance remains low. Although a reduce-and-conquer strategy has been proposed to make the problem tractable, candidate search is a bottleneck in classification. In this paper, we first analyze the computational complexity of the category candidate search problem and prove that it is NP-hard. We then propose a candidate search algorithm that adopts a greedy strategy, and we prove that the greedy strategy is a locally optimal choice in the heuristic solving process. In the classification stage, we find that ancestor categories may help the classification of candidates. Experiments are conducted on a dataset of web pages from the Chinese Simplified branch of the DMOZ directory. The results show that the proposed algorithm improves candidate search performance compared to existing methods, and further improves the classification accuracy of two-stage approaches.
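The two-stage (reduce-and-conquer) idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's greedy candidate search algorithm, which the abstract does not specify. It assumes a plain top-k cosine-similarity search for stage one and a hypothetical `ancestor_weight` parameter for letting ancestor categories contribute in stage two. The toy hierarchy, the term profiles, and all function names are invented for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy hierarchy: category -> parent (None marks a top-level category).
PARENT = {
    "Sports": None,
    "Sports/Football": "Sports",
    "Sports/Tennis": "Sports",
    "Science": None,
    "Science/Physics": "Science",
}

# Hypothetical term profiles; in practice these would be built from training documents.
PROFILE = {
    "Sports": Counter({"match": 2, "team": 2, "player": 2}),
    "Sports/Football": Counter({"goal": 3, "team": 2, "league": 2}),
    "Sports/Tennis": Counter({"serve": 3, "set": 2, "player": 1}),
    "Science": Counter({"theory": 2, "experiment": 2}),
    "Science/Physics": Counter({"quantum": 3, "particle": 2, "theory": 1}),
}


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leaf_categories():
    """Categories that are not the parent of any other category."""
    parents = set(PARENT.values())
    return [c for c in PARENT if c not in parents]


def ancestors(category):
    """All ancestors of a category, nearest first."""
    chain = []
    parent = PARENT[category]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def search_candidates(doc, k):
    """Stage 1 (assumed): keep the k leaf categories most similar to the document."""
    ranked = sorted(leaf_categories(),
                    key=lambda c: cosine(doc, PROFILE[c]),
                    reverse=True)
    return ranked[:k]


def classify(doc, k=2, ancestor_weight=0.5):
    """Stage 2 (assumed): rescore candidates, adding weighted ancestor similarity."""
    def score(category):
        own = cosine(doc, PROFILE[category])
        inherited = sum(cosine(doc, PROFILE[a]) for a in ancestors(category))
        return own + ancestor_weight * inherited

    return max(search_candidates(doc, k), key=score)


football_doc = Counter("goal team league match".split())
print(search_candidates(football_doc, k=2))
print(classify(football_doc))
```

The point of the split is that stage two only scores the `k` surviving candidates rather than every category in the hierarchy, so the quality of the candidate search bounds the accuracy the second stage can reach.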