Clustering by Heterogeneous Data Fusion: Framework and Applications

Shi Yu, Bart De Moor, Yves Moreau
2008-01-01
Abstract: Clustering is an important problem with many applications, and a number of different algorithms and methods have emerged over the years. The goal of clustering is to group data points into homogeneous groups, where homogeneity is usually measured by distances or similarities among data points. Recently, many applications have faced the requirement of clustering by data fusion. This is because the information contained in a single data source is limited by its specific observation; combining multiple observations may therefore facilitate a more comprehensive understanding of the problem. For instance, to investigate memory persistence (long-term or short-term memory) in bacteria, a bacterium is observed under different experimental conditions and at different evolutionary times [11]. The multiple observations are then categorized by clustering algorithms. In scientometrics, a strategy has been proposed to combine text mining data and bibliometric data (hybrid clustering) to explore the structure mapping of journal sets [8]. In bioinformatics, high-throughput techniques produce large amounts of genomic data. The challenge of endowing clustering algorithms with the ability to retrieve correlated or complementary information about the underlying functional partitions of genes and proteins has attracted much interest [2,14]. Unfortunately, although the machine learning community has already focused on data fusion for classification [7] and novelty detection [5], the extension to unsupervised learning, such as clustering, is still an unresolved and ongoing problem.