Abstract

Background: A critical step in analyzing gene expression data is dividing genes into co-expression modules using module detection methods. Clustering algorithms are the most commonly employed technique for gene module detection. To obtain gene modules with strong biological significance, the choice of an appropriate similarity measure is vital. However, commonly used similarity measures may not fully capture the complexity of biological systems. Hence, exploring more informative similarity measures before partitioning genes into co-expression modules remains important.

Results: In this paper, we propose a Dual-Index Nearest Neighbor Similarity Measure (DINNSM) algorithm to address this issue. The algorithm first calculates the similarity matrix between genes using Pearson or Spearman correlation. Nearest neighbor measurements are then constructed from this similarity matrix. Finally, the similarity matrix is reconstructed. We tested six similarity measures (Pearson correlation, Spearman correlation, Euclidean distance, maximum information coefficient, distance correlation, and DINNSM) with four clustering algorithms (K-means, hierarchical clustering, FCM, and WGCNA) on three independent gene expression datasets. Cluster quality was evaluated with four indices: the Silhouette index, the Calinski-Harabasz index, the Adjust-Biological homogeneity index, and the Davies-Bouldin index. The results showed that DINNSM is accurate and yields biologically meaningful gene co-expression modules.

Conclusions: DINNSM is better at revealing the complex biological relationships between genes and helps to obtain more accurate and biologically meaningful gene co-expression modules.
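The three steps named in the Results (base similarity matrix, nearest neighbor measurements, reconstructed similarity matrix) can be illustrated with a minimal sketch. The abstract does not define the dual-index measure itself, so the reconstruction below substitutes a plain shared-nearest-neighbor Jaccard score as a stand-in; it is not the DINNSM formula. The function names, the parameter `k`, and the toy expression values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / (sx * sy)

def similarity_matrix(expr):
    """Step 1: gene-by-gene similarity matrix from Pearson correlation."""
    n = len(expr)
    return [[pearson(expr[i], expr[j]) for j in range(n)] for i in range(n)]

def nearest_neighbors(sim, k):
    """Step 2: for each gene, the set of its k most similar other genes."""
    neighbors = []
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        neighbors.append(set(others[:k]))
    return neighbors

def reconstruct(sim, k):
    """Step 3 (stand-in, NOT the paper's measure): Jaccard overlap of the
    neighbor sets of two genes, with each gene counted in its own set."""
    nbrs = nearest_neighbors(sim, k)
    n = len(sim)
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a, b = nbrs[i] | {i}, nbrs[j] | {j}
                out[i][j] = len(a & b) / len(a | b)
    return out

# Toy data (hypothetical): rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.5],
]
sim = similarity_matrix(expr)
new_sim = reconstruct(sim, k=1)
for row in new_sim:
    print(row)
```

The reconstructed matrix can then be passed to any clustering algorithm that accepts a precomputed similarity or, after conversion such as `1 - similarity`, a distance. In this toy case the first two genes and the last two genes each share a neighborhood, while genes from different pairs share none.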

Studies on the Clustering Algorithm for Analyzing Gene Expression Data with a Bidirectional Penalty.

A Penalized Regression-Based Biclustering Approach in Gene Expression Data

Penalty term based suitable fuzzy intuitionistic possibilistic clustering: analyzing high dimensional gene expression cancer database

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Study on Dynamic Clustering Analysis Method for Gene Expression Data Based on Multidimension Pseudo F-statistics

Clustering gene expression data based on predicted differential effects of GV interaction.

A Kernel-Based Clustering Method for Gene Selection with Gene Expression Data.

Gen-Cluster: an Efficient Gene Expression Data High Dimensional Clustering Algorithm

Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data

An Analysis of Gene Expression Data using Penalized Fuzzy C-Means Approach

An Improved Biclustering Method For Analyzing Gene Expression Profiles

Clustering Algorithm Based on Dual-Index Nearest Neighbor Similarity Measure and Its Application in Gene Expression Data Analysis

Application of New Algorithm in Gene Expression Profile Clustering

A close-to optimum bi-clustering algorithm for microarray gene expression data

A Parallel Algorithm for Gene Expressing Data Biclustering

A Double K-Mean Clustering Algorithm for Sequential Gene Data Based on the Hidden Markov Model

Gamma-based clustering via ordered means with application to gene-expression analysis

A novel biclustering algorithm and its application in gene expression profiles

Subspace Weighting Co-Clustering of Gene Expression Data

A novel clustering method for analysis of gene microarray expression data

Effective Clustering Algorithms for Gene Expression Data