Global Gene Expression Analysis Using Machine Learning Methods
Min Xu
DOI: https://doi.org/10.48550/arXiv.1506.02087
2015-06-06
Abstract:Microarray is a technology to quantitatively monitor the expression of large number of genes in parallel. It has become one of the main tools for global gene expression analysis in molecular biology research in recent years. The large amount of expression data generated by this technology makes the study of certain complex biological problems possible and machine learning methods are playing a crucial role in the analysis process. At present, many machine learning methods have been or have the potential to be applied to major areas of gene expression analysis. These areas include clustering, classification, dynamic modeling and reverse engineering.
In this thesis, we focus our work on using machine learning methods to solve the classification problems arising from microarray data. We first identify the major types of the classification problems; then apply several machine learning methods to solve the problems and perform systematic tests on real and artificial datasets. We propose improvement to existing methods. Specifically, we develop a multivariate and a hybrid feature selection method to obtain high classification performance for high dimension classification problems. Using the hybrid feature selection method, we are able to identify small sets of features that give predictive accuracy that is as good as that from other methods which require many more features.
Quantitative Methods,Computational Engineering, Finance, and Science,Machine Learning