STRING KERNELS WITH FEATURE SELECTION FOR SVM PROTEIN CLASSIFICATION

Wen-Yun Yang, Bao-Liang Lu
DOI: https://doi.org/10.1142/9781848161092_0004
2008-01-01
Abstract: We introduce a general framework for string kernels. This framework can produce various types of kernels, including a number of existing kernels, for use with support vector machines (SVMs). Within this framework, we can select informative subsequences to reduce the dimensionality of the feature space, and we can model mutations in biological sequences. The target kernel is then obtained by combining the contributions of subsequences in a weighted fashion. To speed up practical computation, we develop a novel tree structure coupled with a traversal algorithm. Experimental results on a benchmark SCOP data set show that the kernels produced by our framework outperform the existing spectrum kernels in both efficiency and ROC50 scores.
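The abstract describes the framework only at a high level. As a rough illustration of two of its ingredients (restricting the feature space to selected subsequences and weighting their contributions), the following is a minimal sketch of a weighted k-mer spectrum kernel. The function names, the default weight of 1.0, and the way the selected set and weights are supplied are assumptions for illustration, not the authors' framework; the sketch does not model mutations and does not use the paper's tree structure or traversal algorithm.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def weighted_spectrum_kernel(
    x: str,
    y: str,
    k: int,
    selected: Optional[Set[str]] = None,     # hypothetical: k-mers kept after feature selection
    weights: Optional[Dict[str, float]] = None,  # hypothetical: per-k-mer weights (assumed >= 0)
) -> float:
    """Sum w(s) * count_x(s) * count_y(s) over k-mers s shared by x and y.

    If `selected` is given, only k-mers in that set contribute (a reduced
    feature space). Missing weights default to 1.0 (an assumption). With no
    selection and no weights this is the plain k-spectrum kernel.
    """
    cx = kmer_counts(x, k)
    cy = kmer_counts(y, k)
    total = 0.0
    for kmer, n_x in cx.items():
        n_y = cy.get(kmer, 0)
        if n_y == 0:
            continue  # k-mer absent from y contributes nothing
        if selected is not None and kmer not in selected:
            continue  # k-mer was filtered out by feature selection
        w = 1.0 if weights is None else weights.get(kmer, 1.0)
        total += w * n_x * n_y
    return total
```

Keeping all weights non-negative keeps the kernel positive semidefinite, since it is an inner product of feature vectors scaled by the square roots of the weights. A Gram matrix built from pairwise calls could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.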