Improved VSM Based on Chinese Text Categorization

ZHANG Zhang, FAN Xiao-zhong
DOI: https://doi.org/10.3969/j.issn.1000-7024.2006.21.038
2006-01-01
Abstract: The vector space model (VSM) is widely used to represent text in automatic text classification. However, VSM treats a text as a bag of words and ignores its structural information. This paper improves the basic VSM method by using different algorithms to compute the influence of different parts of a text on classification: the influence of the title and of the first and last sentences of each paragraph is computed with a core-word co-occurrence algorithm, while the influence of the remaining parts is computed with the basic VSM method. The class is then decided by a weighted sum of the two influences. Experimental results show that precision, recall, and the F1 value are improved.
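The two-part scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the co-occurrence score is a simple stand-in (the fraction of a class's core words found in the title and first/last sentences), the VSM score is cosine similarity against a class centroid of raw term frequencies, and the weight `alpha`, the function names, and the toy data are hypothetical. Chinese text is assumed to have been segmented into tokens already.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cooccurrence_score(key_tokens, core_words):
    """Hypothetical stand-in for the core-word co-occurrence algorithm:
    the fraction of a class's core words that appear in the key positions
    (title, first and last sentence of each paragraph)."""
    if not core_words:
        return 0.0
    return len(set(key_tokens) & core_words) / len(core_words)


def classify(key_tokens, body_tokens, classes, alpha=0.5):
    """Pick the class with the highest weighted sum of the two scores.

    classes maps a label to (core_words, centroid); alpha is an assumed
    weight on the key-position score, with 1 - alpha on the VSM score.
    """
    body_vec = Counter(body_tokens)
    scores = {}
    for label, (core_words, centroid) in classes.items():
        key_part = cooccurrence_score(key_tokens, core_words)
        body_part = cosine(body_vec, centroid)
        scores[label] = alpha * key_part + (1 - alpha) * body_part
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration only.
classes = {
    "sports": ({"match", "team"}, Counter({"match": 3, "team": 2, "score": 1})),
    "finance": ({"stock", "market"}, Counter({"stock": 3, "market": 2, "price": 1})),
}
key_tokens = ["team", "wins", "match"]    # title and first/last sentences
body_tokens = ["score", "team", "price"]  # remaining text

label, scores = classify(key_tokens, body_tokens, classes)
print(label, scores)
```

In this sketch, raising `alpha` gives the title and boundary sentences more say in the decision, while `alpha = 0` reduces to plain VSM classification on the body text. In practice the weight would be tuned on held-out data.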