Context Based Feature Description Model in Chinese Text Categorization

HE Zhong-Shi, LIU Li
DOI: https://doi.org/10.3969/j.issn.1002-137X.2007.05.049
2007-01-01
Computer Science
Abstract: Text feature description is a basic problem in text classification; its aim is to model documents with computable features. The most widely used method treats a text as a set of words, the so-called “bag of words” model. Under this model, feature selection and weighting consider only the frequency of individual words and ignore the relations between words in context. In general, however, words within a given context can convey related meanings about the same topic. The “bag of words” model therefore loses context information that is an important factor in improving classification precision. This paper presents a new feature description method based on text context. First, a commonly used feature selection method is applied to obtain an initial set of feature words. Second, Mutual Information (MI) is used to compute the dependence between words in a concrete context, and feature words are then selected according to this dependence. Meanwhile, the weight of each feature is adjusted. Experimental results indicate the effectiveness of the new approach.
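The abstract does not give the exact formulas, so the following is only a minimal sketch of how the described pipeline might look, not the authors' implementation. It assumes pointwise mutual information estimated over sliding token windows as the dependence measure, takes a feature's dependence score to be its highest PMI with any other feature, and rescales weights by a hypothetical factor `alpha`. The names `context_pmi`, `dependence_scores`, `adjust_weights`, `window`, and `alpha` are all illustrative and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def context_pmi(docs, features, window=3):
    """Estimate PMI for pairs of feature words sharing a sliding window.

    Assumption: each start position in a document opens one window of up to
    `window` tokens (windows near the end of a document are shorter).
    """
    word_count = Counter()
    pair_count = Counter()
    total_windows = 0
    for tokens in docs:
        for i in range(len(tokens)):
            span = set(tokens[i:i + window]) & features
            total_windows += 1
            word_count.update(span)
            # Sorted pairs give one canonical key per unordered pair.
            pair_count.update(combinations(sorted(span), 2))
    pmi = {}
    for (x, y), n_xy in pair_count.items():
        p_xy = n_xy / total_windows
        p_x = word_count[x] / total_windows
        p_y = word_count[y] / total_windows
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi


def dependence_scores(pmi, features):
    """Score each feature by its highest PMI with another feature.

    Assumption: features with no co-occurring partner get a score of 0.0.
    """
    scores = {w: 0.0 for w in features}
    for (x, y), value in pmi.items():
        scores[x] = max(scores[x], value)
        scores[y] = max(scores[y], value)
    return scores


def adjust_weights(base_weights, scores, alpha=0.5):
    """Hypothetical adjustment: boost weights of context-dependent features."""
    return {w: base_weights[w] * (1.0 + alpha * max(scores.get(w, 0.0), 0.0))
            for w in base_weights}


# Toy tokenized corpus; a real Chinese corpus would first need word segmentation.
docs = [
    ["stock", "market", "rises", "today"],
    ["stock", "market", "falls"],
    ["football", "match", "today"],
]
# Stand-in for the initial feature set produced by a standard selection method.
features = {"stock", "market", "football", "today"}

pmi = context_pmi(docs, features, window=2)
scores = dependence_scores(pmi, features)
selected = {w for w, s in scores.items() if s > 0.0}
weights = adjust_weights({w: 1.0 for w in features}, scores)
```

In this toy corpus only "stock" and "market" ever share a window, so they are the only features with a positive dependence score and the only ones whose weights are raised. The threshold of 0.0 and the choice of maximum over average PMI are design choices made for the sketch; the paper may use different ones.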