A New Approach to Email Classification Using Concept Vector Space Model

Chao Zeng, Zhao Lu, Junzhong Gu
DOI: https://doi.org/10.1109/fgcns.2008.7
2008-01-01
Abstract: Content-based Email classification methods generally use the vector space model (VSM), which is constructed from the frequency of each independent word appearing in the Email content. A frequency-based VSM does not take the context of a word into account, so the feature vectors cannot accurately represent the Email content, which leads to inaccurate classification. This paper presents a new approach to Email classification based on a concept vector space model using WordNet. In our approach, we use WordNet to extract high-level information about categories during training by replacing terms in the feature vector with synonym sets (synsets) and by considering the hypernymy-hyponymy relations between synsets. We design an Email classification system based on the concept VSM and carry out a series of experiments. The results show that our approach could improve the accuracy of Email classification, especially when the training set is small.
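The abstract contrasts a frequency-based VSM with a concept VSM in which words are replaced by WordNet synsets and hypernym relations are taken into account. The following is a minimal Python sketch of that contrast, not the authors' implementation. It assumes a tiny hand-made table (`WORD_TO_SYNSET`, `HYPERNYM`) standing in for WordNet; the synset IDs only imitate WordNet naming and are hypothetical, as are the hypernym weight of 0.5, the one-level hypernym depth, the category labels, and the centroid-plus-cosine classifier.

```python
from collections import Counter
import math

# Hypothetical stand-in for WordNet: word -> synset ID (illustrative entries only).
WORD_TO_SYNSET = {
    "buy": "purchase.v.01", "purchase": "purchase.v.01",
    "cheap": "cheap.a.01", "inexpensive": "cheap.a.01",
    "meeting": "meeting.n.01", "conference": "meeting.n.01",
    "laptop": "laptop.n.01", "notebook": "laptop.n.01",
}

# Hypothetical hypernym links: child synset -> parent synset.
HYPERNYM = {
    "laptop.n.01": "computer.n.01",
    "meeting.n.01": "gathering.n.01",
}

def term_vector(tokens):
    """Frequency-based VSM: one dimension per distinct surface word."""
    return Counter(tokens)

def concept_vector(tokens, hypernym_weight=0.5, depth=1):
    """Concept VSM sketch: count synsets and add down-weighted hypernyms.

    hypernym_weight and depth are assumed values, not taken from the paper.
    """
    vec = Counter()
    for tok in tokens:
        synset = WORD_TO_SYNSET.get(tok, tok)  # unknown words keep their own form
        vec[synset] += 1.0
        parent, weight = synset, 1.0
        for _ in range(depth):  # walk up the hypernym chain
            parent = HYPERNYM.get(parent)
            if parent is None:
                break
            weight *= hypernym_weight
            vec[parent] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled_emails, vectorize):
    """Sum the vectors of each category's training emails into one prototype."""
    centroids = {}
    for tokens, label in labeled_emails:
        centroids.setdefault(label, Counter()).update(vectorize(tokens))
    return centroids

def classify(tokens, centroids, vectorize):
    """Assign the category whose prototype is most cosine-similar."""
    vec = vectorize(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

if __name__ == "__main__":
    a = ["buy", "cheap", "laptop"]
    b = ["purchase", "inexpensive", "notebook"]
    print(cosine(term_vector(a), term_vector(b)))                  # 0.0: no shared words
    print(round(cosine(concept_vector(a), concept_vector(b)), 3))  # 1.0: same concepts
```

The two emails share no words, so the frequency-based vectors are orthogonal, while their concept vectors coincide. This illustrates why a concept representation can help when a small training set covers few surface words.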
What problem does this paper attempt to address?