Machine Learning Techniques To Identify Marker Genes For Diagnostic Classification Of Microarrays

E. W. Lang, R. Schachtner, D. Herold, D. Lutter, Ph. Knollmueller, F. Theis, A. M. Tome, P. Gomez Vilda, C. G. Puntonet, J. M. Gorriz-Saez, G. Schmitz, M. Stetter
2010-01-01
Abstract: Intelligent and efficient mathematical and computational tools are needed to analyze and interpret the information content buried in the large-scale gene expression patterns made available by recent developments in microarray technology [28, 27]. Modern machine learning techniques such as Support Vector Machines (SVM), and matrix decomposition techniques such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Independent Component Analysis (ICA) and Nonnegative Matrix Factorization (NMF), provide new and efficient analysis tools that are currently being explored in this area [76]. In this study we focus on classification tasks and apply both knowledge-based and data-driven approaches to various microarray data sets. The data sets considered comprise the gene expression levels of human breast cancer (HBC) cell lines, the well-known leukemia data set, and human peripheral blood cells differentiating from monocytes to macrophages under various environmental conditions. We study gene selection procedures in either gene space or feature space and show that these tools can extract marker genes from the gene expression profiles without extensive database searches for appropriate functional annotations. Using these marker genes, the corresponding test data sets can then be readily classified into related diagnostic categories.
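As a rough illustration of gene selection in feature space followed by classification, the sketch below ranks genes by their absolute loading on one principal component (computed via an SVD of the column-centered expression matrix) and then classifies test samples with a nearest-centroid rule restricted to the selected genes. This is a generic PCA-based example under stated assumptions, not the authors' procedure: the helper names `select_marker_genes`, `nearest_centroid_classify` and `make_synthetic_data`, the synthetic data, and the choice of a nearest-centroid classifier are all hypothetical.

```python
import numpy as np


def select_marker_genes(X, component=0, n_genes=5):
    """Hypothetical helper: rank genes by |loading| on one principal component.

    X has shape (n_samples, n_genes). Returns the indices of the n_genes
    genes with the largest absolute loading on the chosen component.
    """
    # Center each gene (column) so the SVD yields principal components.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions expressed in gene space.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[component]
    # Absolute value removes the arbitrary sign of singular vectors.
    order = np.argsort(-np.abs(loadings))
    return order[:n_genes]


def nearest_centroid_classify(X_train, y_train, X_test, genes):
    """Hypothetical classifier: assign each test sample to the class whose
    mean profile over the selected marker genes is closest (Euclidean)."""
    classes = np.unique(y_train)
    centroids = np.array(
        [X_train[y_train == c][:, genes].mean(axis=0) for c in classes]
    )
    # Distances of shape (n_test, n_classes).
    d = np.linalg.norm(X_test[:, genes][:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]


def make_synthetic_data(n_per_class, n_genes=50, n_markers=5, shift=6.0, seed=0):
    """Hypothetical toy data: class 1 over-expresses the first n_markers genes."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    X[y == 1, :n_markers] += shift
    return X, y


# Usage on toy data: select genes from the training set without labels,
# then classify an independently generated test set.
X_train, y_train = make_synthetic_data(10, seed=0)
X_test, y_test = make_synthetic_data(10, seed=1)
genes = select_marker_genes(X_train, component=0, n_genes=5)
pred = nearest_centroid_classify(X_train, y_train, X_test, genes)
print(sorted(genes.tolist()), (pred == y_test).mean())
```

Because the toy classes differ only in the first five genes, the leading component is dominated by those genes. On real microarray data the component that separates diagnostic categories need not be the first one, and components from ICA or NMF could be ranked in the same way.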