Improved Feature Selection Approach Combined with Semantic

Zhong-yang XIONG, Ling-ling FU, Yu-fang ZHANG, Jian JIANG
DOI: https://doi.org/10.3724/sp.j.1087.2010.02621
2010-01-01
Abstract: Traditional feature selection methods for text categorization rely on word-frequency statistics. They ignore the semantics of words, and redundancy among the selected features crowds out more useful ones. A table named "conception-domain" was built from the semantic dictionary HowNet; each entry holds a word and its domain value. If a word in a text appears in the table, it is replaced by its domain value, which has a more general meaning. In this way, more semantic information is added to the selected features and the redundancy between features is reduced to some extent. Experiments were carried out with improved information gain and χ² respectively, and the results show that the method effectively improves the precision of text categorization.
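The replacement step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the table entries are invented and not taken from HowNet, the names `CONCEPT_DOMAIN`, `generalize`, and `feature_counts` are hypothetical, and the paper's improved information gain and χ² scoring is not reproduced. The sketch only shows how mapping words to a shared domain value merges redundant features before any scoring is applied.

```python
from collections import Counter

# Hypothetical conception-domain table: word -> domain value.
# These entries are invented for illustration; they are not HowNet data.
CONCEPT_DOMAIN = {
    "football": "sport",
    "basketball": "sport",
    "tennis": "sport",
    "stock": "finance",
    "bond": "finance",
}


def generalize(tokens, table=CONCEPT_DOMAIN):
    """Replace each token found in the table with its domain value."""
    return [table.get(tok, tok) for tok in tokens]


def feature_counts(docs, table=None):
    """Count term frequencies over tokenized documents.

    If a table is given, tokens are generalized before counting.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(generalize(tokens, table) if table is not None else tokens)
    return counts


docs = [
    ["football", "match", "tonight"],
    ["basketball", "match", "score"],
    ["stock", "bond", "market"],
]

raw = feature_counts(docs)                     # word-level features
merged = feature_counts(docs, CONCEPT_DOMAIN)  # domain-level features

print(len(raw), len(merged))
print(merged["sport"], merged["finance"])
```

In this toy corpus, "football" and "basketball" collapse into the single feature "sport", so the candidate feature set shrinks while the merged feature accumulates the counts of its members. A feature scorer such as information gain or χ² would then run on the merged counts.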