Analyze Alzheimer's DNA Microarray Gene Expression Data by Using Pattern Recognition Methods
Wei Kong, Xiaoyang Mou, Bin Yang, Xudong Huang
DOI: https://doi.org/10.1016/j.jalz.2009.04.370
2009-01-01
Abstract: With advances in DNA microarray technology, it is now possible to quantify the expression levels of thousands of genes in parallel. However, exploiting the colossal amount of data generated by microarray technology is difficult, because the data are usually complex, noisy and high-dimensional, and their analysis is often hindered by low statistical power. Principal component analysis (PCA) has been used to address these problems. PCA projects the data into a new space spanned by principal components that are mutually orthonormal. However, the constraint of mutual orthogonality and the reliance on second-order statistics may not be appropriate for the biological systems studied, because extracting and characterizing the most informative features of the signals requires higher-order statistics. Independent component analysis (ICA) methods have received growing attention as effective data-mining tools for microarray gene expression data. As a higher-order statistical technique, ICA is capable of extracting biologically relevant gene expression features from DNA microarray data. By combining ICA with other pattern recognition methods such as hierarchical clustering, nonnegative matrix factorization and support vector machines, we present efficient sample classification and gene clustering for Alzheimer's disease (AD). We apply ICA to hippocampal microarray gene expression data of AD. Experimental results show that ICA can improve the classification of AD samples and identify more significant genes. The genes identified as highly expressed in AD fall into the categories of immunoreaction, metal protein, membrane protein, lipoprotein, neuropeptide, cytoskeletal protein, binding protein and ribosomal protein. Many significant low-expression genes were also found in these categories; moreover, some oncogenes and phosphoproteins are expressed at low levels.
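The PCA projection described above can be illustrated with a minimal NumPy sketch. It assumes a samples × genes matrix (rows are arrays, columns are genes); the function name `pca` and the synthetic data are hypothetical and are not the authors' code or data. The point is that PCA uses only the covariance (second-order statistics) and returns mutually orthonormal axes.

```python
import numpy as np

def pca(X, n_components):
    """Project samples onto the top principal components.

    X: array of shape (n_samples, n_genes) (assumed layout).
    Returns (scores, components, explained_variance).
    """
    # Center each gene; PCA then depends only on second-order statistics.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are orthonormal principal axes.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each sample in the new principal-component space.
    scores = Xc @ components.T
    # Variance captured by each axis (eigenvalues of the sample covariance).
    explained_variance = (s[:n_components] ** 2) / (X.shape[0] - 1)
    return scores, components, explained_variance

rng = np.random.default_rng(0)
# Synthetic stand-in for a samples x genes expression matrix (not real data).
X = rng.normal(size=(20, 100))
scores, components, var = pca(X, n_components=3)
print(scores.shape)  # (20, 3)
# The principal axes are mutually orthonormal: components @ components.T == I.
print(np.allclose(components @ components.T, np.eye(3)))  # True
```

Using the SVD of the centered matrix avoids forming the genes × genes covariance explicitly, which matters when there are thousands of genes but only a few dozen samples.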
In particular, ICA identifies more AD-related genes. We demonstrate that ICA exploits higher-order statistics to represent gene expression profiles as linear combinations of elementary expression patterns, which may be interpreted as potential regulatory pathways. Experimental results also show that the ICA model outperforms traditional pattern recognition methods. This report shows that ICA, as a microarray data analysis tool, can help elucidate the molecular taxonomy of AD and other multifactorial, polygenic complex diseases.
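The linear ICA model mentioned above can be written as X ≈ A S, where each row of S is an elementary expression pattern over genes and A holds each sample's weights on those patterns. The sketch below is a generic deflation FastICA with a tanh (log-cosh) contrast, not necessarily the ICA variant the authors used; the function name `fastica`, the samples × genes layout and the synthetic sources are assumptions for illustration. Whitening removes second-order correlations, and the tanh fixed-point update then exploits higher-order (non-Gaussian) structure.

```python
import numpy as np

def fastica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Minimal deflation FastICA (hypothetical helper, tanh contrast).

    X: shape (n_samples, n_genes). Model: centered X ~= A @ S, where each
    row of S is an independent expression pattern over genes.
    """
    rng = np.random.default_rng(seed)
    # Center each sample (row) across genes.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whitening: decorrelate and scale to unit variance (second-order step).
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    idx = np.argsort(d)[::-1][:n_components]
    K = (E[:, idx] / np.sqrt(d[idx])).T   # whitening matrix, (k, n_samples)
    Z = K @ Xc                            # whitened data, identity covariance
    W = np.zeros((n_components, n_components))
    for i in range(n_components):
        w = rng.normal(size=n_components)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ Z
            g = np.tanh(wz)
            g_prime = 1.0 - g ** 2
            # Fixed-point update driven by higher-order (non-Gaussian) structure.
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: stay orthogonal to components already extracted.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    S = W @ Z                    # independent components (rows)
    A = np.linalg.pinv(W @ K)    # mixing matrix so that Xc ~= A @ S
    return S, A

rng = np.random.default_rng(1)
n_genes = 5000
# Two synthetic non-Gaussian "expression patterns" over genes (not real data).
S_true = np.vstack([rng.uniform(-1, 1, n_genes), rng.laplace(size=n_genes)])
A_true = rng.normal(size=(4, 2))   # 4 hypothetical samples
X = A_true @ S_true
S, A = fastica(X, n_components=2)
# Each true pattern should match one recovered component up to sign and order.
corr = np.abs(np.corrcoef(np.vstack([S_true, S]))[:2, 2:])
print(corr.round(2))
```

ICA recovers components only up to permutation, sign and scale, which is why the check compares absolute correlations rather than raw values.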