Length-weighted string kernels for sequence data classification
Shengfeng Tian, Shaomin Mu, Chuanhuan Yin
DOI: https://doi.org/10.1016/j.patrec.2007.04.008
IF: 4.757
2007-01-01
Pattern Recognition Letters
Abstract: Various sequence-similarity kernels, known as string kernels, have been introduced for use with support vector machines (SVMs) in a discriminative approach to sequence data classification problems. In these applications, string kernels are required to act as similarity measures between strings. In this paper, we present a new string kernel and its variants suitable for sequence data classification, which are determined by the (possibly non-contiguous) matching subsequences of all possible lengths shared by two strings. In these kernels, gaps in subsequences are allowed, and longer subsequences contribute more to the kernel value. Efficient algorithms for computing the kernels are derived using dynamic programming and bit-parallelism. In some cases, the computation of the kernel is linear in the length of the strings.
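The abstract does not state the kernel's formula, so the following is only an illustrative sketch of the general idea, not the authors' definition or algorithm. It assumes a simple all-subsequences kernel in which every pair of matching (possibly gapped) subsequence occurrences of length k contributes weight**k, so that longer matches count more when weight > 1. The function name `weighted_subsequence_kernel` and the parameter `weight` are hypothetical. The dynamic program runs in time quadratic in the string lengths; the bit-parallel and linear-time variants mentioned in the abstract are not reproduced here.

```python
def weighted_subsequence_kernel(s, t, weight=2.0):
    """Sum of weight**k over all pairs of matching subsequence occurrences.

    A pair consists of k increasing positions in s and k increasing
    positions in t that spell the same length-k string (gaps allowed).
    This weighting is an assumption for illustration only.
    """
    n, m = len(s), len(t)
    # prefix[i][j]: total weight of all matching pairs that lie entirely
    # within s[:i] and t[:j] (the empty match is not counted).
    prefix = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ending_here = 0.0
            if s[i - 1] == t[j - 1]:
                # Either start a new match at (i, j) or extend any match
                # that lies strictly before both positions.
                ending_here = weight * (1.0 + prefix[i - 1][j - 1])
            # Two-dimensional prefix sum via inclusion-exclusion.
            prefix[i][j] = (prefix[i - 1][j] + prefix[i][j - 1]
                            - prefix[i - 1][j - 1] + ending_here)
    return prefix[n][m]


if __name__ == "__main__":
    # "a", "b" each contribute 2 and "ab" contributes 4 when weight = 2.
    print(weighted_subsequence_kernel("ab", "ab", weight=2.0))
```

For use with an SVM, such a value would typically be normalized, for example by dividing by the square root of the product of the two self-kernel values, so that long strings do not dominate purely because of their length.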