A Tree-Based Approach to the Discovery of Diagnostic Biomarkers for Ovarian Cancer

Jinyan Li,Kotagiri Ramamohanarao
DOI: https://doi.org/10.1007/978-3-540-24775-3_80
2004-01-01
Abstract:Computational diagnosis of cancer is a classification problem, and it has two special requirements on a learning algorithm: perfect accuracy and small number of features used in the classifier. This paper presents our results on an ovarian cancer data set. This data set is described by 15154 features, and consists of 253 samples. Each sample is referred to a woman who suffers from ovarian cancer or who does not have. In fact, the raw data is generated by the so-called mass spectrosmetry technology measuring the intensities of 15154 protein or peptide-features in a blood sample for every woman. The purpose is to identify a small subset of the features that can be used as biomarkers to separate the two classes of samples with high accuracy. Therefore, the identified features can be potentially used in routine clinical diagnosis for replacing labour-intensive and expensive conventional diagnosis methods. Our new tree-based method can achieve the perfect 100% accuracy in 10-fold cross validation on this data set. Meanwhile, this method also directly outputs a small set of biomarkers. Then we explain why support vector machines, naive bayes, and k-nearest neighbour cannot fulfill the purpose. This study is also aimed to elucidate the communication between contemporary cancer research and data mining techniques.
What problem does this paper attempt to address?