Application Of The Character-Level Statistical Method In Text Categorization

Zhen Yang, XiangFei Nie, Weiran Xu, Jun Guo
DOI: https://doi.org/10.1109/ICCIAS.2006.295293
2006-01-01
Abstract: It is generally thought that semantic and grammatical information is essential to understanding and processing text. In simple text categorization tasks, however, the absence of this information does not always degrade classifier performance. In this paper, we discuss the application of a character-level statistical method to text categorization, which extracts character-level frequent patterns rather than considering semantic and grammatical information. Compared with the traditional n-gram model, the presented method is simple and convenient. By casting the character-level statistical method in a Bayesian framework, we then apply it to spam detection. Finally, we discuss the multiclass problem in short message categorization based on combination strategies. The effectiveness of the models and the feasibility of the presented method are verified.
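The abstract does not spell out the exact statistic or decision rule, so the following is only a minimal sketch of the general idea: a classifier that looks at nothing but character frequencies, placed in a naive Bayes framework. It assumes single-character counts with add-one smoothing, which may differ from the frequent patterns the authors extract. The function names `train` and `classify` and the toy messages are hypothetical and serve illustration only.

```python
import math
from collections import Counter


def train(samples):
    """Count characters per class from (text, label) pairs.

    Assumption: single characters are the features; the paper's
    "frequent patterns" may be defined differently.
    """
    char_counts = {}        # label -> Counter of characters
    doc_counts = Counter()  # label -> number of training messages
    alphabet = set()        # every character seen in training
    for text, label in samples:
        doc_counts[label] += 1
        char_counts.setdefault(label, Counter()).update(text)
        alphabet.update(text)
    return char_counts, doc_counts, alphabet


def classify(text, model):
    """Return the label with the highest naive Bayes log-posterior."""
    char_counts, doc_counts, alphabet = model
    total_docs = sum(doc_counts.values())
    vocab = len(alphabet) + 1  # one extra slot for unseen characters
    best_label, best_score = None, -math.inf
    for label, counts in char_counts.items():
        total_chars = sum(counts.values())
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one (Laplace) smoothed log-likelihood of each character.
        for ch in text:
            score += math.log((counts[ch] + 1) / (total_chars + vocab))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training messages (not data from the paper).
model = train([
    ("$$$ win $$$ cash!!!", "spam"),
    ("free $$$ now!!!", "spam"),
    ("see you at lunch", "ham"),
    ("meeting moved to noon", "ham"),
])
print(classify("$$$!!!", model))
print(classify("lunch at noon", model))
```

No tokenization, stemming, or grammar is involved here: the only evidence is which characters occur and how often, which is the property the abstract emphasizes. Extending the same scoring loop to more than two labels gives one simple route to the multiclass case, although the combination strategies the paper mentions are not described in the abstract.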