An Improved Bayesian Text Categorization System

LIU Hua
DOI: https://doi.org/10.3969/j.issn.1000-9965.2007.01.012
2007-01-01
Abstract: The weighting factor applied to the conditional probability in Naive Bayes is improved. The new factor is the product of a word's category difference and its frequency. It emphasizes words whose distribution differs strongly across categories, retains the positive contribution of frequency, and reduces the influence of common words. On a corpus of 30,000 documents in 15 categories and 244 subcategories, experiments confirm the method: MicroF1 increases by 18.9 percent at the parent-category level and by 7.6 percent at the subcategory level.
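The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for a word's kinds-difference (how differently the word is distributed across categories), so the `category_difference` function below is an assumption made purely for illustration: it uses the relative spread of the smoothed conditional probabilities across categories. The function names, the Laplace smoothing, and the toy corpus are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect per-category word counts from (tokens, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return dict(word_counts), doc_counts, vocab


def cond_prob(word, label, word_counts, vocab):
    """Laplace-smoothed P(word | label)."""
    counts = word_counts[label]
    return (counts[word] + 1) / (sum(counts.values()) + len(vocab))


def category_difference(word, word_counts, vocab):
    """ASSUMED measure, not the paper's formula: relative spread of
    P(word | category) across categories. It is 0 for a word that is
    equally likely in every category and approaches 1 for a word that
    is concentrated in one category."""
    probs = [cond_prob(word, c, word_counts, vocab) for c in word_counts]
    return (max(probs) - min(probs)) / max(probs)


def classify(tokens, word_counts, doc_counts, vocab):
    """Weighted Naive Bayes: each word's log-likelihood is scaled by
    (category difference * frequency in the document)."""
    total_docs = sum(doc_counts.values())
    freqs = Counter(t for t in tokens if t in vocab)  # skip unseen words
    scores = {}
    for label in word_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word, freq in freqs.items():
            weight = category_difference(word, word_counts, vocab) * freq
            score += weight * math.log(cond_prob(word, label, word_counts, vocab))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy corpus (hypothetical): "the" appears equally in both categories.
docs = [
    (["ball", "game", "team", "the"], "sports"),
    (["game", "score", "team", "the"], "sports"),
    (["bank", "stock", "market", "the"], "finance"),
    (["stock", "price", "market", "the"], "finance"),
]
word_counts, doc_counts, vocab = train(docs)
print(classify(["the", "game", "team"], word_counts, doc_counts, vocab))
```

In this sketch a common word such as "the" receives a weight of zero because it is equally likely in both categories, while category-specific words dominate the score in proportion to how often they occur in the document. A different difference measure would change the numbers but not the overall structure.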