Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data

Sandrine Dudoit, Jane Fridlyand, Terence P Speed
DOI: https://doi.org/10.1198/016214502753479248
IF: 4.369
2002-03-01
Journal of the American Statistical Association
Abstract: A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousands of genes simultaneously, microarray experiments may lead to a more complete understanding of the molecular variations among tumors and hence to a finer and more informative classification. The ability to successfully distinguish between tumor classes (already known or yet to be discovered) using gene expression data is an important aspect of this novel approach to cancer classification. This article compares the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods include nearest-neighbor classifiers, linear discriminant analysis, and classification trees. Recent machine learning approaches, such as bagging and boosting, are also considered. The discrimination methods are applied to datasets from three recently published cancer gene expression studies.
statistics & probability