Tree‐based Disease Classification Using Protein Data

HT Zhu,CY Yu,HP Zhang
DOI: https://doi.org/10.1002/pmic.200300520
2003-01-01
PROTEOMICS
Abstract:A reliable and precise classification of diseases is essential for successful diagnosis and treatment. Using mass spectrometry from clinical specimens, scientists may find the protein variations among disease and use this information to improve diagnosis. In this paper, we propose a novel procedure to classify disease status based on the protein data from mass spectrometry. Our new tree-based algorithm consists of three steps: projection, selection and classification tree. The projection step aims to project all observations from specimens into the same bases so that the projected data have fixed coordinates. Thus, for each specimen, we obtain a large vector of coefficients' on the same basis. The purpose of the selection step is data reduction by condensing the large vector from the projection step into a much lower order of informative vector. Finally, using these reduced vectors, we apply recursive partitioning to construct an informative classification tree. This method has been successfully applied to protein data, provided by the Department of Radiology and Chemistry at Duke University.
What problem does this paper attempt to address?