Term Selection and Weighting Approach Based on Key Words in Text Categorization

LIU Li, HE Zhong-shi
DOI: https://doi.org/10.3969/j.issn.1000-7024.2006.06.008
2006-01-01
Abstract: Text representation, for which the vector space model is widely used, is considered the main problem in text categorization. In this model, the weight of each dimension is the term's TFIDF (term frequency, inverse document frequency) value. However, TFIDF cannot emphasize the significance of key terms, which contribute most to the content of a text. A novel term selection and weighting approach based on key words is presented. Structural information and mutual information are employed to extract key words, and word location, word dependence, word frequency, and document frequency are integrated when weighting a term. In SVM classification experiments, the approach outperforms the traditional TFIDF approach, improving average precision by about 5%.
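For reference, the baseline TFIDF weighting that the abstract compares against can be sketched as below. This is a minimal Python sketch assuming the common variant w(t, d) = tf(t, d) · log(N / df(t)); the abstract does not say which TFIDF variant the authors used, nor how word location, word dependence, and mutual information are combined. The `boost_key_words` function and its `factor` parameter are therefore hypothetical, illustrating only the general idea of raising the weight of extracted key words, not the paper's actual formula.

```python
import math
from collections import Counter


def tfidf(docs):
    """Classic TF-IDF for tokenized documents: w(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def boost_key_words(doc_weights, key_words, factor=2.0):
    """HYPOTHETICAL illustration (not the paper's method): multiply the weight
    of every term judged to be a key word by a constant factor."""
    return {t: w * factor if t in key_words else w for t, w in doc_weights.items()}


if __name__ == "__main__":
    docs = [["svm", "text", "text"], ["svm", "kernel"], ["text", "category"]]
    w = tfidf(docs)
    print(w[0])
    print(boost_key_words(w[0], {"text"}))
```

Plain TFIDF gives "text" and "svm" in the first document the same IDF, because each appears in two of the three documents, so only raw frequency separates them. This is the limitation the abstract points to: nothing in the formula marks a term as central to a document's content, which is why a separate key-word signal is needed.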