Research on Algorithm of Text Feature Selection and Weighting Based on Category

LV Zheng-yu, LIN Yong-min, ZHAO Shuang, CHEN Jing-nian, ZHU Wei-dong
DOI: https://doi.org/10.3778/j.issn.1002-8331.2008.20.044
2008-01-01
Computer Engineering and Applications Journal
Abstract: The aim of feature selection and weighting in automatic text categorization is to reduce the dimension of the feature space, remove noise features, and improve classification precision. The features selected by traditional feature selection methods tend to be biased toward the more common categories, and the commonly used weighting method TF*IDF considers only the relationship between features and documents while ignoring the relationship between features and categories. To address these problems, this paper presents a category-based method for text feature selection and weighting. Experiments on corpora with skewed category distributions in two different languages show that the method effectively improves categorization precision and, compared with traditional methods, also reduces the dimension of the feature space to a certain degree.
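The limitation of TF*IDF noted in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so nothing below is the authors' method: `tf_idf` is the common textbook form tf * log(N / df), and `category_factor` is a hypothetical category-aware score invented here purely for illustration, as are the toy documents and labels.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF*IDF: tf * log(N / df). Category labels are never used."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def category_factor(docs, labels):
    """Hypothetical category-aware score (an assumption, not the paper's formula).

    For each term, compute the fraction of documents in each category that
    contain it, then return the largest fraction divided by the sum of all
    fractions. Using per-category fractions instead of raw counts keeps a
    large category from dominating merely because of its size.
    """
    cat_sizes = Counter(labels)
    per_cat_df = {c: Counter() for c in cat_sizes}
    for doc, label in zip(docs, labels):
        per_cat_df[label].update(set(doc))
    factor = {}
    for term in {t for doc in docs for t in doc}:
        rates = [per_cat_df[c][term] / cat_sizes[c] for c in cat_sizes]
        factor[term] = max(rates) / sum(rates)
    return factor


# Toy corpus with a skewed category distribution (three sports, one finance).
docs = [
    ["ball", "goal", "team"],
    ["team", "goal", "win"],
    ["stock", "market", "team"],
    ["ball", "win", "goal"],
]
labels = ["sports", "sports", "finance", "sports"]

weights = tf_idf(docs)
factor = category_factor(docs, labels)

# "goal" and "team" each occur in three documents, so TF*IDF cannot tell
# them apart in the first document, although only "goal" is confined to a
# single category.
print(weights[0]["goal"], weights[0]["team"])
print(factor["goal"], factor["team"])
```

In this toy corpus the two terms receive identical TF*IDF weights, while the illustrative category factor scores "goal" higher than "team" because "team" also appears in the finance document. How the paper actually combines category information with term weights is not stated in the abstract.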