Algorithm for Finding Optimal Gene Sets in Microarray Prediction

J.M. Deutsch
DOI: https://doi.org/10.48550/arXiv.physics/0108011
2001-08-08
Abstract:
Motivation: Microarray data have recently been shown to be effective in distinguishing closely related cell types that often arise in cancer diagnosis. It is useful to determine the minimum number of genes needed for such a diagnosis, both for clinical use and to determine the importance of specific genes in cancer. Here a replication algorithm is used for this purpose. It evolves an ensemble of predictors, each using a different combination of genes, to generate a set of optimal predictors.
Results: We apply this method to the leukemia data of the Whitehead/MIT group, used to differentially diagnose two kinds of leukemia, and to the data of Khan et al., used to distinguish four different kinds of childhood cancer. In the latter case we were able to reduce the number of genes needed from 96 to 15 while still classifying all of their test data correctly.
Availability: http://stravinsky.ucsc.edu/josh/gesses/
Contact: josh@physics.ucsc.edu
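The abstract describes the method only at a high level. The following minimal Python sketch illustrates the general idea: an ensemble of gene subsets is mutated, and the better-scoring subsets are replicated more often. It is not the author's algorithm. The nearest-centroid classifier, the per-gene penalty in the score, the single-gene toggle mutation, the Boltzmann-style resampling weights, the toy data, and every function name (`nearest_centroid_errors`, `score`, `mutate`, `replicate_gene_sets`, `make_toy_data`) are hypothetical choices made for illustration.

```python
import math
import random


def nearest_centroid_errors(X, y, genes):
    """Count training samples misclassified by a nearest-centroid rule using only `genes`.

    Hypothetical stand-in classifier; the paper's own predictor may differ.
    """
    if not genes:
        return len(y)  # an empty gene set cannot classify anything
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = {g: sum(r[g] for r in rows) / len(rows) for g in genes}
    errors = 0
    for x, label in zip(X, y):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(classes, key=lambda c: sum((x[g] - centroids[c][g]) ** 2 for g in genes))
        if pred != label:
            errors += 1
    return errors


def score(X, y, genes, gene_penalty=0.1):
    # Lower is better: misclassifications plus an assumed small cost per gene,
    # so smaller gene sets win when accuracy ties.
    return nearest_centroid_errors(X, y, genes) + gene_penalty * len(genes)


def mutate(genes, n_genes, rng):
    # Assumed mutation move: toggle one randomly chosen gene in or out of the set.
    g = rng.randrange(n_genes)
    new = set(genes)
    if g in new:
        new.remove(g)
    else:
        new.add(g)
    return frozenset(new)


def replicate_gene_sets(X, y, pop_size=20, steps=50, beta=2.0, seed=0):
    """Hypothetical replication-style search over gene subsets; returns the best set seen."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    population = [frozenset([rng.randrange(n_genes)]) for _ in range(pop_size)]
    best = min(population, key=lambda s: score(X, y, s))
    best_score = score(X, y, best)
    for _ in range(steps):
        proposals = [mutate(s, n_genes, rng) for s in population]
        scores = [score(X, y, s) for s in proposals]
        for s, sc in zip(proposals, scores):
            if sc < best_score:
                best, best_score = s, sc
        # Replication step: resample the ensemble with weights exp(-beta * score),
        # shifted by the lowest score so the largest weight is exactly 1.
        lowest = min(scores)
        weights = [math.exp(-beta * (sc - lowest)) for sc in scores]
        population = rng.choices(proposals, weights=weights, k=pop_size)
    return best


def make_toy_data(n_samples=20, n_genes=10, seed=1):
    # Hypothetical toy data: gene 0 separates the two classes; the other genes are noise.
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[10.0 * label] + [rng.random() for _ in range(n_genes - 1)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best = replicate_gene_sets(X, y)
    print(sorted(best), nearest_centroid_errors(X, y, best))
```

The per-gene penalty is what pushes the ensemble toward small gene sets, mirroring the abstract's goal of finding the minimum number of genes. The inverse temperature `beta` controls how strongly low-scoring sets are favored during resampling. On real microarray data, one would score predictors on held-out samples rather than on training error alone.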
Biological Physics, Computational Physics, Medical Physics, Quantitative Biology
What problem does this paper attempt to address?