Information Bottleneck Based Feature Selection in Web Text Categorization

HE Yifan, JIANG Minghu
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2010.01.027
2010-01-01
Abstract: This paper presents a concept-based feature selection scheme for text categorization. The information bottleneck method is used to cluster keywords according to their distributions over the class labels. Concept extraction then maps the word clusters to DEF items in HowNet, which serve as the classification features. Tests on an online text corpus show that this approach preserves the robustness of concept-based feature selection methods while overcoming their weakness with new words that are not defined in the concept thesaurus, which must otherwise be maintained and updated.
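The clustering step in the abstract can be illustrated with a small sketch. The abstract does not say which information bottleneck variant the authors use, so the code below assumes the greedy agglomerative form: each word starts as its own cluster, and the pair whose merge loses the least information about the class label is merged repeatedly. The function names, the toy counts, and the use of raw word-class co-occurrence counts are all illustrative assumptions, and the later mapping of clusters to HowNet DEF items is not shown.

```python
import math
from itertools import combinations


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, d1, w2, d2):
    """Information about the class label lost by merging two clusters.

    w1, w2 are the cluster prior probabilities; d1, d2 are their
    conditional class distributions p(class | cluster). The cost is
    (w1 + w2) times the weighted Jensen-Shannon divergence of d1 and d2.
    """
    total = w1 + w2
    a, b = w1 / total, w2 / total
    mix = [a * x + b * y for x, y in zip(d1, d2)]
    return total * (a * kl(d1, mix) + b * kl(d2, mix))


def agglomerative_ib(word_class_counts, n_clusters):
    """Greedy agglomerative clustering of words by class distribution.

    word_class_counts maps each word to a list of per-class counts
    (hypothetical input format; every word must occur at least once).
    Returns a list of word clusters, each a sorted list of words.
    """
    grand_total = sum(sum(c) for c in word_class_counts.values())
    # Each cluster is (words, prior probability, p(class | cluster)).
    clusters = []
    for word, counts in word_class_counts.items():
        n = sum(counts)
        clusters.append(([word], n / grand_total, [c / n for c in counts]))

    while len(clusters) > n_clusters:
        # Pick the pair whose merge loses the least label information.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(
                clusters[ij[0]][1], clusters[ij[0]][2],
                clusters[ij[1]][1], clusters[ij[1]][2],
            ),
        )
        words_i, w_i, d_i = clusters[i]
        words_j, w_j, d_j = clusters[j]
        w = w_i + w_j
        merged_dist = [(w_i * x + w_j * y) / w for x, y in zip(d_i, d_j)]
        merged = (words_i + words_j, w, merged_dist)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return [sorted(words) for words, _, _ in clusters]


# Toy example: counts over two classes (say, sports and finance).
toy_counts = {
    "goal": [9, 1],
    "match": [8, 2],
    "stock": [1, 9],
    "bank": [2, 8],
}
print(agglomerative_ib(toy_counts, 2))
```

Because words are grouped only by how they distribute over class labels, a new word that is missing from the thesaurus can still join a cluster, which is the property the abstract highlights. The exhaustive pair search here is quadratic per merge and is meant for illustration, not for a full vocabulary.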