Chinese Web-Page Categorization Approach to the Bad Text Information

黄旭,朱艳琴,罗喜召
DOI: https://doi.org/10.19304/j.cnki.issn1000-7180.2008.06.057
2008-01-01
Abstract: The characteristics of bad text information are discussed. Based on Bayesian theory, a new text categorization approach to bad text information is proposed. The approach improves the speed of text categorization by reducing word-segmentation processing and the dimension of the feature space. Furthermore, it maintains the effectiveness of text categorization by optimizing the selection of feature items and the way their weights are calculated. Experimental results show that this approach maintains categorization effectiveness while effectively improving categorization speed.
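The abstract does not give the feature selection or weighting details, so the following is only a minimal illustrative sketch of a Bayesian text classifier that skips word segmentation. It assumes single characters as features and Laplace smoothing; the function names `train` and `classify`, the labels, and the toy samples are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Fit a multinomial naive Bayes model on (text, label) pairs.

    Assumption: each non-whitespace character is one feature, so no
    word segmentation is needed. alpha is the Laplace smoothing constant.
    """
    char_counts = defaultdict(Counter)  # label -> character frequencies
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        for ch in text:
            if not ch.isspace():
                char_counts[label][ch] += 1
                vocab.add(ch)
    return {"char_counts": char_counts, "doc_counts": doc_counts,
            "vocab": vocab, "alpha": alpha}


def classify(model, text):
    """Return the label with the highest posterior log-probability."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["char_counts"][label]
        total_chars = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for ch in text:
            if ch in model["vocab"]:  # ignore characters never seen in training
                score += math.log((counts[ch] + alpha)
                                  / (total_chars + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data for illustration only (hypothetical, not from the paper).
samples = [
    ("赌博 博彩 下注", "bad"),
    ("天气 新闻 体育", "normal"),
]
model = train(samples)
print(classify(model, "博彩下注"))
print(classify(model, "体育新闻"))
```

Restricting `vocab` to a small set of high-scoring characters would be one way to reduce the feature dimension, but which selection criterion and weighting the authors use is not stated in the abstract.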