Hybrid Data Clustering Based on Dependency Structure and Gibbs Sampling

Shuang-Cheng Wang, Xiao-Lin Li, Hai-Yan Tang
DOI: https://doi.org/10.1007/11941439_138
2006-01-01
Abstract: This paper presents a new method for data clustering that can effectively cluster data sets containing both continuous and discrete data. The method treats the values of the cluster variable as missing data. The missing data are first initialized randomly and then revised iteratively by combining Gibbs sampling with a dependency structure, which is either built from prior knowledge or taken to be star-shaped. A penalty coefficient is introduced to extend the MDL scoring function, and the optimal number of clusters is determined using the extended MDL scoring function together with statistical methods.
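The iteration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the star-shaped structure (every attribute depends only on the cluster variable), models continuous attributes as independent Gaussians and discrete attributes as smoothed frequency tables, and resamples all labels in one sweep from parameters estimated on the previous sweep, which simplifies record-by-record Gibbs updates. The function name `gibbs_style_cluster`, its parameters, and the smoothing constants are hypothetical, and the extended MDL score for choosing the number of clusters is not covered.

```python
import numpy as np

def gibbs_style_cluster(X_cont, X_disc, k, n_iter=50, seed=0):
    """Treat cluster labels as missing data and resample them iteratively.

    X_cont: (n, p) float array of continuous attributes.
    X_disc: (n, q) int array of discrete attributes coded 0..levels-1.
    Assumes a star-shaped structure: attributes are independent given the cluster.
    """
    rng = np.random.default_rng(seed)
    n = X_cont.shape[0]
    n_levels = [int(v) + 1 for v in X_disc.max(axis=0)]
    z = rng.integers(k, size=n)  # random initialization of the missing labels

    for _ in range(n_iter):
        log_p = np.zeros((n, k))
        for c in range(k):
            members = z == c
            m = int(members.sum())
            # Cluster prior with add-one smoothing (an assumption of this sketch).
            log_p[:, c] += np.log((m + 1.0) / (n + k))
            # Continuous attributes: per-cluster Gaussian; fall back to the
            # global estimate when the cluster has fewer than two members.
            src = X_cont[members] if m > 1 else X_cont
            mu = src.mean(axis=0)
            var = src.var(axis=0) + 1e-3  # small floor keeps the variance positive
            log_p[:, c] += (-0.5 * (np.log(2 * np.pi * var)
                                    + (X_cont - mu) ** 2 / var)).sum(axis=1)
            # Discrete attributes: smoothed relative frequencies within the cluster.
            for j, levels in enumerate(n_levels):
                counts = np.bincount(X_disc[members, j], minlength=levels)
                probs = (counts + 1.0) / (m + levels)
                log_p[:, c] += np.log(probs[X_disc[:, j]])
        # Normalize to conditional probabilities and sample a new label per record.
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(n)
        z = np.minimum((p.cumsum(axis=1) < u[:, None]).sum(axis=1), k - 1)
    return z

if __name__ == "__main__":
    # Synthetic mixed data: two groups that differ in both attribute types.
    rng = np.random.default_rng(1)
    X_cont = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])
    X_disc = np.vstack([rng.integers(0, 2, (30, 1)), rng.integers(2, 4, (30, 1))])
    print(gibbs_style_cluster(X_cont, X_disc, k=2))
```

Replacing the star-shaped assumption with a structure built from prior knowledge would change only how the per-cluster likelihood of each record is factorized; the sampling step would stay the same.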