Spoken Language Recognition Based on Gap-Weighted Subsequence Kernels

Wei-Qiang Zhang, Wei-Wei Liu, Zhi-Yi Li, Yong-Zhe Shi, Jia Liu
DOI: https://doi.org/10.1016/j.specom.2014.01.005
IF: 2.723
2014-01-01
Speech Communication
Abstract: Phone recognizer followed by vector space models (PR-VSM) is a state-of-the-art phonotactic method for spoken language recognition. This method relies on a bag-of-n-grams representation, in which each dimension of the supervector is based on the count of an n-gram token. The n-gram cannot capture long-context co-occurrence relations because its order is restricted. Moreover, it is vulnerable to errors introduced by the front-end phone recognizer. In this paper, we introduce a gap-weighted subsequence kernel (GWSK) method to overcome these drawbacks of the n-gram. GWSK counts the co-occurrence of tokens in a non-contiguous way, and is therefore both error-tolerant and capable of revealing long-context relations. Beyond this, we propose a truncated GWSK that constrains the context length in order to remove interference from remote tokens and lower the computational cost, and we extend the idea to lattices to take advantage of multiple hypotheses from the phone recognizer. In addition, we investigate the optimal parameter settings and the computational complexity of the proposed methods. Experiments on the NIST 2009 LRE evaluation corpus with several configurations show that the proposed GWSK is consistently more effective than the PR-VSM approach. © 2014 Elsevier B.V. All rights reserved.
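The non-contiguous counting described in the abstract can be illustrated with a minimal sketch. It assumes the textbook definition of the gap-weighted subsequence kernel, in which every occurrence of a length-p subsequence contributes lambda raised to the length of the span it covers; the paper's exact formulation, normalization, and lattice extension are not reproduced here. The function names gwsk_features and gwsk_kernel, the parameters p, lam, and max_span, and the default values are all hypothetical, and the max_span cutoff is only one plausible reading of the "constraints on context length" in the truncated variant.

```python
from collections import defaultdict
from itertools import combinations


def gwsk_features(tokens, p=2, lam=0.5, max_span=None):
    """Explicit gap-weighted subsequence features (brute force, for illustration).

    Each ordered choice of p positions contributes lam ** span to the feature
    indexed by the tokens at those positions, where span is the number of
    positions covered from the first to the last chosen token.
    max_span (an assumed parameter) discards occurrences that stretch too far.
    """
    feats = defaultdict(float)
    for idx in combinations(range(len(tokens)), p):
        span = idx[-1] - idx[0] + 1
        if max_span is not None and span > max_span:
            continue  # truncated variant: ignore remote co-occurrences
        feats[tuple(tokens[i] for i in idx)] += lam ** span
    return dict(feats)


def gwsk_kernel(a, b, p=2, lam=0.5, max_span=None):
    """Kernel value as the inner product of the two explicit feature maps."""
    fa = gwsk_features(a, p, lam, max_span)
    fb = gwsk_features(b, p, lam, max_span)
    return sum(w * fb[u] for u, w in fa.items() if u in fb)


# A phone string with one inserted token still shares the (a, b) feature,
# whereas a contiguous bigram count would find no match for "a b".
clean = ["a", "b"]
noisy = ["a", "x", "b"]
print(gwsk_kernel(clean, noisy))              # 0.03125
print(gwsk_kernel(clean, noisy, max_span=2))  # 0 (the gapped match is cut off)
```

The brute-force enumeration grows combinatorially with sequence length and is only meant to make the definition concrete; practical implementations use dynamic programming, and limiting the span is one way the cost can be reduced.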