Highly Accurate Distributed Classification Of Web Documents

Jingkuan Song, Hui Gao, Lianli Gao, Yan Fu
2009-01-01
Abstract: With the rapid growth of the Internet, finding an efficient and accurate text classifier that can handle the huge volume of online documents is both a scientific challenge and a pressing economic need. This paper presents a distributed model for efficient web document classification. In the model, the distributed text classifiers are trained serially, with weights on the training instances that are set adaptively according to previous classification performance. Based on this distributed model, we also propose Unequal Bagging (UBagging), an improved bagging technique for text classifiers. Experimental results show that our approach achieves higher classification accuracy than traditional centralized text classifiers while requiring less memory and computation time.
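The serial training with adaptively weighted instances described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight-update rule, the base classifier, or the details of UBagging, so everything below is an assumption for illustration only: an AdaBoost-style update, a small weighted Naive Bayes base learner, binary labels, and toy data. The function names are hypothetical, and the distribution of classifiers across nodes is omitted.

```python
import math
from collections import Counter, defaultdict


def train_weighted_nb(docs, labels, weights):
    """Train multinomial Naive Bayes where each document counts with its weight."""
    class_weight = defaultdict(float)
    word_weight = defaultdict(Counter)
    vocab = set()
    for tokens, label, w in zip(docs, labels, weights):
        class_weight[label] += w
        for tok in tokens:
            word_weight[label][tok] += w
            vocab.add(tok)
    return class_weight, word_weight, vocab


def predict_nb(model, tokens):
    """Return the label with the highest smoothed log-probability."""
    class_weight, word_weight, vocab = model
    total = sum(class_weight.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_weight):
        denom = sum(word_weight[label].values()) + len(vocab)
        score = math.log(class_weight[label] / total)
        for tok in tokens:
            if tok in vocab:  # unseen words are ignored
                score += math.log((word_weight[label][tok] + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def train_serial(docs, labels, rounds=3):
    """Train classifiers one after another, reweighting instances each round.

    Assumed AdaBoost-style rule (not taken from the paper): instances the
    previous classifier got wrong gain weight, the others lose weight.
    """
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_weighted_nb(docs, labels, weights)
        wrong = [predict_nb(model, d) != y for d, y in zip(docs, labels)]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        err = min(max(err, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, model))
        weights = [w * math.exp(alpha if bad else -alpha)
                   for w, bad in zip(weights, wrong)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return ensemble


def predict_ensemble(ensemble, tokens):
    """Combine the classifiers by a vote weighted with each classifier's alpha."""
    votes = defaultdict(float)
    for alpha, model in ensemble:
        votes[predict_nb(model, tokens)] += alpha
    return max(sorted(votes), key=votes.get)


# Toy data (invented for illustration, not from the paper's experiments).
docs = [
    ["cheap", "pills", "offer"],
    ["cheap", "offer", "now"],
    ["meeting", "agenda", "notes"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
ensemble = train_serial(docs, labels, rounds=3)
```

Calling `predict_ensemble(ensemble, ["cheap", "pills"])` then classifies a new token list with the weighted vote. In a distributed setting each round's classifier could be trained on a different node or data partition, but how the paper partitions data and combines results is not stated in the abstract.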