Improving Domain Dictionary-based Text Categorization Using Self-partition Model.

Wenliang Chen, Jingbo Zhu, Muhua Zhu, Li Zhang, Tianshun Yao
DOI: https://doi.org/10.1142/s0219427905001304
2005-01-01
Abstract: In this paper, we present a novel model for improving the performance of Domain Dictionary-based text categorization. The proposed model is called the Self-Partition Model (SPM). SPM groups candidate words into predefined clusters, which are generated according to the structure of the Domain Dictionary. Using these learned clusters as features, we propose a novel text representation. The experimental results show that a text categorization system based on the proposed representation performs better than the Domain Dictionary-based text categorization system. It also performs better than a Bag-of-Words system when the number of features is small and the training corpus is small.
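The idea of using clusters rather than individual words as features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how SPM assigns candidate words to clusters, so the word-to-cluster table below (WORD_TO_CLUSTER), the cluster names, and the function name cluster_features are all hypothetical, and tokenization is reduced to whitespace splitting. The sketch only shows how a document might be mapped to a short vector of per-cluster counts once such assignments exist.

```python
from collections import Counter

# Hypothetical word-to-cluster assignments. In the paper these would come
# from the Domain Dictionary and from the clusters SPM learns for candidate
# words; here they are hard-coded purely for illustration.
WORD_TO_CLUSTER = {
    "goalkeeper": "sports",
    "striker": "sports",
    "inflation": "finance",
    "dividend": "finance",
}

# Fixed feature order: one dimension per cluster, sorted by name.
CLUSTERS = sorted(set(WORD_TO_CLUSTER.values()))


def cluster_features(text):
    """Return per-cluster counts for a whitespace-tokenized document."""
    counts = Counter(
        WORD_TO_CLUSTER[token]
        for token in text.lower().split()
        if token in WORD_TO_CLUSTER
    )
    # Counter returns 0 for clusters that never occurred in the document.
    return [counts[cluster] for cluster in CLUSTERS]


if __name__ == "__main__":
    doc = "The striker beat the goalkeeper despite inflation"
    print(CLUSTERS)
    print(cluster_features(doc))
```

Because the vector has one dimension per cluster rather than one per word, it stays small even for a large vocabulary, which is consistent with the abstract's claim that the representation helps when the number of features and the training corpus are small.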