Improved text classification methods based on weighted adjustments

鲁明羽,李凡,庞淑英,陆玉昌,周立柱
DOI: https://doi.org/10.3321/j.issn:1000-0054.2003.04.021
2003-01-01
Abstract: Text classification is a key part of text mining and is used extensively in traditional information retrieval, web information queries, and web mining. A text classification method was developed that uses a weighted adjustment measure to improve the vector space model (VSM) and the Naive Bayesian classifier (NBC). The EM algorithm was then used for unsupervised Bayesian learning, and a Chinese/English text classification system was developed. Three sets of test results show that the weighted adjustment measure, which uses scoring functions, can improve the precision of text classification models such as VSM and NBC, and that the improvement grows as the training text set gets larger. The maximum NBC precision is 86%.
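
The abstract does not give the scoring functions or the exact form of the weighted adjustment, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a multinomial Naive Bayes classifier in which each term's log-likelihood contribution is scaled by a per-term weight. The IDF-style `term_weights` function, the toy documents, and all function names are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab


def term_weights(docs, vocab):
    """Hypothetical scoring function: a smoothed IDF-style weight per term.

    This is an assumption for illustration; the paper's scoring functions
    are not described in the abstract.
    """
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens))
    return {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in vocab}


def classify(tokens, class_docs, term_counts, vocab, weights):
    """Return the class with the highest weighted log-posterior score."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(term_counts[label].values()) + len(vocab)
        for t in tokens:
            if t not in vocab:
                continue  # ignore terms never seen in training
            p = (term_counts[label][t] + 1) / denom  # Laplace smoothing
            score += weights[t] * math.log(p)  # weighted adjustment
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (made up for this sketch).
train_docs = [
    ["stock", "market", "price"],
    ["market", "trade", "price"],
    ["match", "team", "goal"],
    ["team", "player", "goal"],
]
train_labels = ["finance", "finance", "sports", "sports"]

class_docs, term_counts, vocab = train_nb(train_docs, train_labels)
weights = term_weights(train_docs, vocab)
sports_pred = classify(["team", "goal", "match"], class_docs, term_counts, vocab, weights)
finance_pred = classify(["stock", "price"], class_docs, term_counts, vocab, weights)
```

Setting every weight to 1.0 recovers the standard multinomial Naive Bayes score, which makes it easy to compare the weighted and unweighted variants on the same data.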