A Probabilistic Model Based on Uncertainty for Data Clustering.

Yaxin Yu, Xinhua Zhu, Miao Li, Guoren Wang, Dan Luo
DOI: https://doi.org/10.1007/978-3-642-36288-0_12
2013-01-01
Abstract: Recently, real-life data of all kinds has grown explosively. To manage this data, the dataspace has become a universal platform that holds unstructured, semi-structured, and structured data. However, clustering the data in a dataspace efficiently and accurately, so that users can manage and explore it, remains an intractable problem. Previous work does not sufficiently consider the uncertain relationship between term and topic. Among the many techniques for handling this problem, probability theory provides an effective way to deal with the uncertainty of clustering. We therefore propose a novel probability model based on topic terms, the Probabilistic Term Similarity Model (PTSM), to tackle the uncertainty between term and topic. The model considers not only terms from the various kinds of data but also the structure information of semi-structured and structured data. Each term is assigned a probability indicating how relevant it is to the topic. Then, according to the probability of each term, a probabilistic matrix is established for clustering the various data. Finally, extensive experimental results show that the clustering method based on this probabilistic model performs very well and outperforms some other classical algorithms. © 2013 Springer-Verlag.
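The pipeline outlined in the abstract (a probability per term, then a probabilistic matrix, then clustering) can be illustrated with a minimal sketch. This is not the PTSM algorithm itself, whose details the abstract does not give. It assumes that term-topic probabilities are estimated by relative frequency in hypothetical seed term lists, that a document's row is the average of its terms' probabilities, and that each document joins its highest-probability topic. Structure information, which the paper also uses, is ignored here. All function names and data are illustrative.

```python
from collections import Counter


def term_topic_probabilities(topic_terms):
    """Estimate P(topic | term) by relative frequency (an assumption).

    topic_terms maps a topic name to a list of seed terms (repeats allowed).
    Returns a dict: term -> {topic: probability}.
    """
    counts = {topic: Counter(terms) for topic, terms in topic_terms.items()}
    vocab = set().union(*counts.values())
    probs = {}
    for term in vocab:
        total = sum(c[term] for c in counts.values())
        probs[term] = {topic: c[term] / total for topic, c in counts.items()}
    return probs


def probabilistic_matrix(docs, probs, topics):
    """Build one row per document and one column per topic.

    Each cell is the mean term-topic probability over the document's
    known terms; documents with no known terms get a row of zeros.
    """
    matrix = []
    for terms in docs:
        known = [t for t in terms if t in probs]
        row = [
            sum(probs[t][topic] for t in known) / len(known) if known else 0.0
            for topic in topics
        ]
        matrix.append(row)
    return matrix


def assign_clusters(matrix):
    """Assign each document to the column (topic) with the largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Hypothetical seed terms and documents, for illustration only.
topics = ["db", "ml"]
seed_terms = {
    "db": ["query", "index", "table", "query"],
    "ml": ["model", "training", "index"],
}
docs = [["query", "table", "index"], ["model", "training"], ["index", "model"]]

probs = term_topic_probabilities(seed_terms)
matrix = probabilistic_matrix(docs, probs, topics)
clusters = assign_clusters(matrix)
```

In this toy setup the term "index" appears under both topics, so it receives probability 0.5 for each, which is the kind of term-topic uncertainty the abstract refers to. A hard argmax assignment is only one possible choice; the matrix rows could equally feed a standard clustering algorithm.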