A VECTOR SPACE MODEL BASED DOCUMENT CLASSIFICATION SYSTEM [J]

Huang Xuanjing, Wu Lide
1998-01-01
Pattern Recognition and Artificial Intelligence
Abstract: This paper introduces a document categorization system based on the vector space model. It focuses on vector dimension compression and Chinese proper name recognition. Rather than using the whole vocabulary as the terms of the vector space, we use statistical information to select key words as the property terms. This reduces the vector dimension to 14%. In addition, a statistical proper name recognition mechanism is used to increase word segmentation precision. By integrating these methods, the categorization precision is improved from 59% to 74%.
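The idea of keeping only statistically chosen key words as vector terms can be illustrated with a minimal sketch. The abstract does not say which statistic the authors use, so the class-skew score below, the centroid classifier, and all function names (`select_terms`, `vectorize`, `train_centroids`, `classify`) are assumptions made for illustration, not the paper's method. Documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def select_terms(docs, labels, keep_ratio=0.14):
    """Keep a fraction of the vocabulary, ranked by a simple class-skew score.

    The score is an illustrative assumption: the largest share of a term's
    document frequency that falls in one class, weighted by log(1 + df).
    """
    df = Counter()          # document frequency over all documents
    df_by_class = {}        # document frequency per class label
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class.setdefault(label, Counter())[term] += 1

    def score(term):
        best_share = max(c[term] for c in df_by_class.values()) / df[term]
        return best_share * math.log(1 + df[term])

    k = max(1, math.ceil(keep_ratio * len(df)))
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(df, key=lambda t: (-score(t), t))
    return ranked[:k]


def vectorize(tokens, terms):
    """Map a segmented document to term-frequency counts over the kept terms."""
    counts = Counter(tokens)
    return [counts[t] for t in terms]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, labels, terms):
    """Sum the document vectors of each class into one centroid per class."""
    centroids = {}
    for tokens, label in zip(docs, labels):
        vec = vectorize(tokens, terms)
        acc = centroids.setdefault(label, [0] * len(terms))
        for i, v in enumerate(vec):
            acc[i] += v
    return centroids


def classify(tokens, terms, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = vectorize(tokens, terms)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy, made-up corpus of pre-segmented documents.
docs = [
    ["stock", "market", "price", "the"],
    ["market", "trade", "price", "the"],
    ["match", "goal", "team", "the"],
    ["team", "goal", "score", "the"],
]
labels = ["finance", "finance", "sports", "sports"]

terms = select_terms(docs, labels, keep_ratio=0.4)
centroids = train_centroids(docs, labels, terms)
print(classify(["price", "market", "news"], terms, centroids))
```

In this toy corpus the word "the" occurs in every class and is dropped by the selection step, which is the kind of dimension reduction the abstract describes. The 0.14 default for `keep_ratio` simply mirrors the 14% figure quoted above.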