A Combination Weighting Algorithm using Relative Entropy for Document Clustering

Bo Ji, Yangdong Ye, Yu Xiao
DOI: https://doi.org/10.1142/S0218001414530024
IF: 1.261
2014-06-11
International Journal of Pattern Recognition and Artificial Intelligence
Abstract: This paper proposes a combination weighting algorithm using relative entropy for document clustering. Combination weighting is widely used in multiple attribute decision making (MADM) problems. However, two difficulties hinder its application to document clustering. First, combination weighting integrates subjective weighting with objective weighting, but documents have so many attributes that subjective weights, which rely on manual annotation by experts, are impracticable. Second, a document data object might contain hundreds or even thousands of features, so calculating the combination weights is extremely time-consuming. To address these issues, we propose simplifying combination weighting by not distinguishing between subjective and objective weights, and we choose the relative entropy method to reduce running time. Our algorithm obtains a combination weight set with 14 combination forms. Experiments on real document data show that, on both the AC/PR/RE measures and the mutual information (MI) measure, the proposed CWRE-sIB algorithm is superior to the original sequential information bottleneck (sIB) algorithm and to a series of weighting-sIB algorithms, each built by applying a single weighting scheme to the original sIB algorithm.
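As a rough illustration of how relative entropy can merge several weighting schemes into one, the sketch below combines weight vectors by minimizing the summed relative entropy between the combined vector and each input vector. With equal importance per scheme, that minimizer has a closed form: the normalized geometric mean of the inputs. This is a generic formulation, not necessarily the one used in the paper; the function names and the two example weight vectors are hypothetical, and the abstract does not specify how the 14 combination forms are built.

```python
import math


def relative_entropy(p, q):
    """Relative entropy D(p || q) between two strictly positive weight vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def combine_weights(schemes):
    """Combine weight vectors by minimizing sum_k D(w || u_k) with sum(w) = 1.

    Assumption: every scheme counts equally. Under that assumption the
    minimizer is the normalized geometric mean of the input vectors.
    """
    k = len(schemes)
    n = len(schemes[0])
    # Geometric mean of the k weights assigned to each feature j.
    geo = [math.exp(sum(math.log(u[j]) for u in schemes) / k) for j in range(n)]
    total = sum(geo)
    return [g / total for g in geo]


def total_divergence(w, schemes):
    """Objective value: summed relative entropy from w to every scheme."""
    return sum(relative_entropy(w, u) for u in schemes)


# Hypothetical normalized weights for three features from two schemes.
scheme_a = [0.5, 0.3, 0.2]
scheme_b = [0.2, 0.3, 0.5]
combined = combine_weights([scheme_a, scheme_b])
```

The closed form avoids any iterative optimization, which is one plausible reason a relative entropy formulation can keep running time low when documents have thousands of features.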