Abstract:

Background: Unsupervised analyses such as clustering are essential tools for interpreting time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and measurement uncertainty, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analysis. In addition, although biological processes are generally continuous, time-series experiments sample them at only a limited number of time points. The resulting datasets therefore often have too few data points, so compensating for missing (unobserved) values can also be an issue.

Results: An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationships between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Several real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm.

Conclusion: The proposed algorithm benefits from cubic B-spline interpolation, a sliding window, affinity propagation, a gene relativity graph, and a consensus process. As a result, it provides both appropriate and effective clustering of time-series gene expression data. The method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset. The results illustrated the relationships between the expressed genes, which may give some insight into the biological processes involved.
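The resampling step can be illustrated with a minimal sketch. The abstract names cubic B-spline interpolation but does not give its end conditions or target grid. This example therefore substitutes a natural cubic spline, with the second derivative fixed to zero at both ends, written directly in NumPy. A B-spline interpolant with other end conditions would differ slightly near the first and last time points. The names `natural_cubic_spline`, `resample_profile`, and `num_points` are hypothetical.

```python
import numpy as np


def natural_cubic_spline(x, y, t):
    """Evaluate the natural cubic spline through (x, y) at points t.

    x must be strictly increasing (uneven spacing is fine) with len(x) >= 2.
    """
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    n = len(x) - 1  # number of intervals
    h = np.diff(x)

    # Tridiagonal system for the second derivatives M_0..M_n,
    # with natural end conditions M_0 = M_n = 0.
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)

    # Locate the interval [x_k, x_{k+1}] that contains each query point.
    k = np.clip(np.searchsorted(x, t, side="right") - 1, 0, n - 1)
    hk = h[k]
    left, right = t - x[k], x[k + 1] - t
    return (M[k] * right**3 / (6 * hk) + M[k + 1] * left**3 / (6 * hk)
            + (y[k] / hk - M[k] * hk / 6) * right
            + (y[k + 1] / hk - M[k + 1] * hk / 6) * left)


def resample_profile(times, values, num_points):
    """Resample one gene's profile onto an evenly spaced grid (hypothetical helper)."""
    grid = np.linspace(times[0], times[-1], num_points)
    return grid, natural_cubic_spline(times, values, grid)


# Unevenly sampled profile (made-up values) resampled to 8 even time points.
grid, values = resample_profile([0, 1, 3, 7], [0.2, 1.1, 0.4, -0.3], 8)
```

The spline passes exactly through every observed point. Between them it fills in smooth estimates of the unobserved values, which lets genes measured at uneven times be compared on a common grid.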
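The abstract does not say which features the sliding window produces. One plausible reading is sketched here purely as an assumption. A window of `width` time points slides along the resampled profiles, and the Pearson correlation between every pair of genes is computed inside each window. A gene pair that co-varies during only part of the time course still scores highly for those windows. The names `windowed_correlations`, `width`, and `step` are hypothetical.

```python
import numpy as np


def windowed_correlations(profiles, width, step=1):
    """Pearson correlation of every gene pair within each sliding window.

    profiles: array of shape (genes, time_points), e.g. spline-resampled data.
    Returns an array of shape (windows, genes, genes).
    """
    profiles = np.asarray(profiles, dtype=float)
    n_points = profiles.shape[1]
    mats = []
    for start in range(0, n_points - width + 1, step):
        seg = profiles[:, start:start + width]
        seg = seg - seg.mean(axis=1, keepdims=True)  # center each gene's segment
        norm = np.linalg.norm(seg, axis=1, keepdims=True)
        norm[norm == 0] = 1.0  # a flat segment gets 0 everywhere, diagonal included
        z = seg / norm
        mats.append(z @ z.T)  # dot products of centered unit vectors = Pearson r
    return np.stack(mats)


profiles = [[1, 2, 3, 4, 5],
            [2, 4, 6, 8, 10],
            [5, 4, 3, 2, 1]]
features = windowed_correlations(profiles, width=3)
print(features.shape)  # (3, 3, 3): three windows, three genes
```

The per-window matrices can be stacked as features or averaged into one gene-by-gene similarity for a downstream clustering step. Which aggregation the paper uses is not stated.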
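Affinity propagation clusters items by passing "responsibility" and "availability" messages over a similarity matrix until a stable set of exemplars emerges. The number of clusters is not fixed in advance. Instead, it is steered by the diagonal "preference" values. The NumPy sketch below follows the standard damped update rules. The function name, the median-preference default, and the convergence test are assumptions, not the paper's settings. In practice a maintained implementation such as scikit-learn's `AffinityPropagation` would normally be used.

```python
import numpy as np


def affinity_propagation(S, preference=None, damping=0.5,
                         max_iter=200, convergence_iter=15):
    """Cluster items from a similarity matrix S (higher = more similar).

    Returns (exemplar_indices, labels); labels[i] indexes into exemplar_indices.
    """
    S = np.array(S, dtype=float)
    n = S.shape[0]
    if preference is None:
        # Median off-diagonal similarity is a common default preference.
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)

    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        best_val = AS[rows, best]
        AS[rows, best] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - best_val[:, None]
        R_new[rows, best] = S[rows, best] - second_val
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new

        # Stop once the exemplar set has been unchanged for convergence_iter rounds.
        is_exemplar = (A.diagonal() + R.diagonal()) > 0
        same = last is not None and np.array_equal(is_exemplar, last)
        stable = stable + 1 if same else 0
        last = is_exemplar
        if stable >= convergence_iter and is_exemplar.any():
            break

    exemplars = np.flatnonzero(last)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)  # nearest exemplar for each item
    labels[exemplars] = np.arange(exemplars.size)  # exemplars label themselves
    return exemplars, labels


# Toy data: two well-separated groups on a line; similarity = -squared distance.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
exemplars, labels = affinity_propagation(-(x[:, None] - x[None, :]) ** 2)
```

Raising the preference tends to yield more, smaller clusters, and lowering it yields fewer. This is how affinity propagation avoids fixing k up front, unlike k-means.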
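The abstract says only that a consensus process improves robustness, and it also mentions a gene relativity graph. One common way to build such a consensus is evidence accumulation, shown here as an assumption rather than the paper's procedure. The clustering is run several times, for instance with different window widths or preferences. For each gene pair, the fraction of runs in which the two genes share a cluster is recorded. Genes whose co-association exceeds a threshold are then linked, and the connected components of that graph are the consensus clusters. The names `co_association`, `consensus_clusters`, and `threshold` are hypothetical.

```python
import numpy as np


def co_association(labelings):
    """Fraction of runs in which each pair of genes shares a cluster.

    labelings: array of shape (runs, genes); label ids need not match across runs.
    """
    L = np.asarray(labelings)
    runs, n = L.shape
    C = np.zeros((n, n))
    for labels in L:
        C += labels[:, None] == labels[None, :]  # 1 where the pair is co-clustered
    return C / runs


def consensus_clusters(labelings, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    C = co_association(labelings)
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]  # depth-first search over strongly co-associated genes
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


labelings = [[0, 0, 1, 1, 2],
             [1, 1, 0, 0, 0],
             [0, 0, 1, 1, 1]]
print(consensus_clusters(labelings))  # [0 0 1 1 1]
```

Because co-association compares labels only within a run, it sidesteps the label-matching problem between runs. Gene 4, which one run split off on its own, still joins genes 2 and 3 because it agrees with them in two of the three runs.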

Clustering of Unevenly Sampled Gene Expression Time-Series Data

A clustering algorithm for distributed time-series data

Interpolation based consensus clustering for gene expression time series

An Analysis of Gene Expression Data using Penalized Fuzzy C-Means Approach

An Effective Biclustering Algorithm for Time-Series Gene Expression Data

Spectral Preprocessing for Clustering Time-Series Gene Expressions

A New Biclustering Algorithm for Time-Series Gene Expression Data Analysis

Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference

Modeling and Analysis of Gene Expression Time-Series Based on Co-Expression

Rough-fuzzy clustering for grouping functionally similar genes from microarray data

Extracting biologically significant patterns from short time series gene expression data

Fuzzy c-Shape: A new algorithm for clustering finite time series waveforms

Effective Clustering Algorithms for Gene Expression Data

Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

Efficiently Mining Time-Delayed Gene Expression Patterns

A Dissimilarity Measure Powered Feature Weighted Fuzzy C-Means Algorithm for Gene Expression Data

Study on Dynamic Clustering Analysis Method for Gene Expression Data Based on Multidimension Pseudo F-statistics

Dynamic Time Alignment Kernel-Based Fuzzy Clustering of Non-Equal Length Vector Time Series

A novel feature measure for fuzzy clustering algorithm on microarray data

Penalty term based suitable fuzzy intuitionistic possibilistic clustering: analyzing high dimensional gene expression cancer database

Clustering of time-course gene expression profiles using normal mixture models with AR(1) random effects