Abstract: The adoption of microarray techniques in biological and medical research provides a new way to diagnose and treat cancer. Successful diagnosis and treatment depend on discovering and classifying cancer types correctly. Class discovery is one of the most important tasks in cancer classification using biomolecular data. Most existing works adopt single clustering algorithms to perform class discovery from biomolecular data. However, single clustering algorithms have limitations, which include a lack of robustness, stability, and accuracy. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE), which incorporates prior knowledge of the data sets into the cluster ensemble framework. Specifically, KCE represents the prior knowledge of a data set in the form of pairwise constraints. Then, the spectral clustering algorithm (SC) is adopted to generate a set of clustering solutions. Next, KCE transforms the pairwise constraints into confidence factors for these clustering solutions. After that, a consensus matrix is constructed by considering all the clustering solutions and their corresponding confidence factors. The final clustering result is obtained by partitioning the consensus matrix. Compared with single clustering algorithms and conventional cluster ensemble approaches, knowledge based cluster ensemble approaches are more robust, stable, and accurate. The experiments on cancer data sets show that: 1) KCE works well on these data sets; 2) KCE outperforms most of the state-of-the-art single clustering algorithms as well as most of the state-of-the-art cluster ensemble approaches.
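The weighting and consensus steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the method as defined in the paper: the confidence factor is taken to be the fraction of pairwise constraints a solution satisfies, the candidate solutions are hard-coded rather than generated by spectral clustering, and the final partition uses simple thresholded connected components as a stand-in for whatever partitioning the paper applies. The function names `confidence_factor`, `consensus_matrix`, and `partition_by_threshold` are hypothetical.

```python
def confidence_factor(labels, must_link, cannot_link):
    """Assumed definition: fraction of pairwise constraints the solution satisfies."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no prior knowledge: every solution gets the same weight
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def consensus_matrix(solutions, weights):
    """Weighted co-association matrix: entry (i, j) is the weighted share of
    solutions that place samples i and j in the same cluster."""
    n = len(solutions[0])
    total_weight = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(solutions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w
    return [[value / total_weight for value in row] for row in matrix]


def partition_by_threshold(matrix, threshold=0.5):
    """Stand-in partitioning step: connected components of the graph that links
    i and j whenever their consensus value exceeds the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical candidate solutions for six samples (in KCE these would come
# from repeated runs of spectral clustering).
solutions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with labels swapped
    [0, 1, 0, 1, 0, 1],  # a poor solution that violates the must-link pairs
]
# Hypothetical prior knowledge expressed as pairwise constraints.
must_link = [(0, 1), (3, 4)]
cannot_link = [(0, 5), (2, 3)]

weights = [confidence_factor(s, must_link, cannot_link) for s in solutions]
matrix = consensus_matrix(solutions, weights)
final_labels = partition_by_threshold(matrix)
```

Because the consensus matrix compares co-membership rather than raw label values, solutions whose cluster labels are permuted still reinforce each other, and a solution that violates more constraints contributes less to every entry.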

Detection of Entity Mixture in Knowledge Bases Using Hierarchical Clustering.

Clustering Analysis-Based Approach to Detecting Entity Mixture in Knowledge Bases.

Research of Entity Matching Based on Multiple Heterogenous Data

Detect Missing Attributes for Entities in Knowledge Bases via Hierarchical Clustering

AHAB: Aligning Heterogeneous Knowledge Bases Via Iterative Blocking

Hierarchical Complex Activity Representation and Recognition Using Topic Model and Classifier Level Fusion.

A Knowledge-Based Semisupervised Hierarchical Online Topic Detection Framework.

Chinese Named Entity Recognition and Disambiguation Based on Multi-stage Clustering

Knowledge Based Cluster Ensemble for Cancer Discovery from Biomolecular Data

Web Person Disambiguation Using Hierarchical Co-Reference Model

Detect Incorrect Triples in Knowledge Base Based on Triple Confidence Evaluation.

Discovery and Recognition of Emerging Human Activities Using a Hierarchical Mixture of Directional Statistical Models.

Using Visualization to Improve Clustering Analysis on Heterogeneous Information Network.

Entity Matching Across Heterogeneous Sources

Inferring Hierarchical Mixture Structures: A Bayesian Nonparametric Approach

A Hierarchical Co-Clustering Approach for Entity Exploration over Linked Data

Incorporating World Knowledge to Document Clustering Via Heterogeneous Information Networks

Clustering Technique in Multi-Document Personal Name Disambiguation.

Unsupervised Author Disambiguation Using Dempster–Shafer Theory

A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data

Clustering Change Sign Detection by Fusing Mixture Complexity