Abstract:

Background: Gene microarray technology is an effective tool for investigating the simultaneous activity of multiple cellular pathways across hundreds to thousands of genes. However, the large volumes of data generated by DNA microarrays are usually complex, noisy, high-dimensional, and often hindered by low statistical power, which makes them difficult to exploit. To overcome these problems, two unsupervised analysis methods have been applied to microarray data: principal component analysis (PCA) and independent component analysis (ICA). PCA projects the data into a new space spanned by mutually orthonormal principal components. However, the mutual-orthogonality constraint and the reliance on second-order statistics in PCA algorithms may not be appropriate for the biological systems studied. Extracting and characterizing the most informative features of biological signals requires higher-order statistics.

Results: ICA is an unsupervised algorithm that can extract higher-order statistical structure from data and has been applied to the analysis of DNA microarray gene expression data. We applied the FastICA method to DNA microarray gene expression data from Alzheimer's disease (AD) hippocampal tissue samples and then clustered the resulting genes. Experimental results showed that the ICA method can improve the clustering of AD samples and identify significant genes. More than 50 significant genes with high expression levels in severe AD were extracted, representing immunity-related proteins, metal-related proteins, membrane proteins, lipoproteins, neuropeptides, cytoskeleton proteins, cellular binding proteins, and ribosomal proteins. Within the same categories, our method also found 37 significant genes with low expression levels. Moreover, some oncogenes and phosphorylation-related proteins are expressed at low levels. Compared with PCA and support vector machine recursive feature elimination (SVM-RFE), two methods widely used in microarray data analysis, ICA can identify more AD-related genes. Furthermore, we have validated and identified many genes that are associated with AD pathogenesis.

Conclusion: We demonstrated that ICA exploits higher-order statistics to identify gene expression profiles as linear combinations of elementary expression patterns that lead to the construction of potential AD-related pathogenic pathways. Our computational results also confirmed that the ICA model outperformed the PCA and SVM-RFE methods. This report shows that ICA, as a microarray data analysis tool, can help elucidate the molecular taxonomy of AD and other multifactorial, polygenic complex diseases.
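
The idea in the abstract above, that observed expression profiles are linear combinations of elementary expression patterns recoverable through higher-order statistics, can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic (three heavy-tailed "patterns" over 2000 hypothetical genes, mixed into three samples by an arbitrary matrix), the function names `whiten`, `sym_decorrelate`, and `fast_ica` are invented for illustration, and the symmetric FastICA update with a tanh nonlinearity is one common variant, assumed here rather than taken from the paper.

```python
import numpy as np

def whiten(X):
    """Center each row of X (signals x observations) and whiten it."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Second-order step (the part PCA also performs): identity covariance.
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ Xc

def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(s ** -0.5) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity; returns (S_est, W)."""
    Z = whiten(X)
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(W @ Z)
        g_prime = 1.0 - g ** 2
        # Fixed-point update; this is where higher-order statistics enter.
        W_new = g @ Z.T / m - np.diag(g_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z, W

# Synthetic stand-in for expression data (assumption, not real AD data).
rng = np.random.default_rng(42)
n_genes = 2000
S_true = rng.laplace(size=(3, n_genes))   # heavy-tailed elementary patterns
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.6, 1.0]])           # arbitrary mixing matrix
X = A @ S_true                            # 3 samples x 2000 genes

S_est, W = fast_ica(X)

# Genes with the largest absolute loading in each component are the
# candidates one would inspect as "significant" for that component.
top_genes = np.argsort(-np.abs(S_est), axis=1)[:, :10]
print(top_genes)
```

The recovered components match the true patterns only up to order and sign, which is why gene ranking here uses absolute loadings. With more samples than components, a dimension-reduction step before whitening would be needed; it is omitted to keep the sketch short.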

Assessing the Applicability of PCA in Clustering Analysis of Gene Expression Data

Clustering gene expression data based on predicted differential effects of GV interaction.

Application of New Clustering Algorithms in Gene Expression Data

Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data

Power Analysis of Principal Components Regression in Genetic Association Studies.

Using Matrix of Thresholding Partial Correlation Coefficients to Infer Regulatory Network

Limitations of Clustering with PCA and Correlated Noise

K-means clustering via principal component analysis

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Gene Selection Algorithm Based on Correlation Analysis

Probabilistic PCA of Censored Data: Accounting for Uncertainties in the Visualization of High-Throughput Single-Cell qPCR Data

Independent Component Analysis of Alzheimer's DNA Microarray Gene Expression Data

An Analysis of Gene Expression Data using Penalized Fuzzy C-Means Approach

Quantitative trait associated microarray gene expression data analysis

Distortion-free PCA on sample space for highly variable gene detection from single-cell RNA-seq data

Interpolation based consensus clustering for gene expression time series

Robust principal component analysis for accurate outlier sample detection in RNA-Seq data

A novel clustering analysis based on PCA and SOMs for gene expression patterns

Effective Clustering Algorithms for Gene Expression Data

Analyze Alzheimer's DNA Microarray Gene Expression Data by Using Pattern Recognition Methods

Discriminant analysis to evaluate clustering of gene expression data