Research and Implementation of Real-time Text Categorization System

HUANG Xu, ZHU Yan-qin, LUO Xi-zhao
DOI: https://doi.org/10.3969/j.issn.1000-3428.2008.18.031
2008-01-01
Abstract: This paper analyzes the factors that limit the real-time performance of text categorization: the high time cost of word segmentation and the excessively high dimension of the feature space. Targeting the real-time application of Web filtering, it proposes a real-time text categorization approach. The approach speeds up categorization by reducing the word segmentation workload and the dimension of the feature space. It preserves categorization quality by optimizing the selection of feature items, and it implements a real-time text classifier based on Bayesian theory. Experimental results show that the approach effectively increases categorization speed while keeping precision and recall at 85% and 94%, respectively.
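The abstract combines two ideas: shrinking the feature space by selecting a limited set of feature items, and classifying with a Bayesian model. It does not say which selection criterion or which Bayesian variant the authors use. The sketch below is therefore only an illustration under stated assumptions, not the paper's method: documents arrive already tokenized, features are chosen by document frequency (top `k` terms), and classification uses multinomial naive Bayes with Laplace smoothing. The names `select_features` and `NaiveBayes`, the toy corpus, and the Web-filter labels `"block"` and `"allow"` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, k):
    """Keep the k terms with the highest document frequency (assumed criterion)."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))  # count each term at most once per document
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing over a fixed vocabulary."""

    def fit(self, docs, vocab):
        self.vocab = vocab
        self.n_docs = len(docs)
        self.class_counts = Counter(label for _tokens, label in docs)
        self.term_counts = defaultdict(Counter)
        for tokens, label in docs:
            # Terms outside the reduced feature space are dropped.
            self.term_counts[label].update(t for t in tokens if t in vocab)
        self.total_terms = {c: sum(self.term_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior P(c)
            for t in tokens:
                if t in self.vocab:  # out-of-vocabulary terms are ignored
                    score += math.log((self.term_counts[c][t] + 1) / (self.total_terms[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy corpus for a Web filter.
train = [
    (["cheap", "pills", "buy"], "block"),
    (["buy", "cheap", "casino"], "block"),
    (["meeting", "schedule", "project"], "allow"),
    (["project", "report", "schedule"], "allow"),
]
vocab = select_features(train, k=6)
model = NaiveBayes().fit(train, vocab)
print(model.predict(["cheap", "casino", "offer"]))
```

Here `k` trades accuracy for speed. A smaller vocabulary means fewer terms to count and score per document, which is the kind of dimension reduction the abstract credits for the speedup. A different criterion, such as chi-square or information gain, could replace document frequency without changing the classifier.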