Feature Selection Based On Mutual Information For Language Recognition

Yan Deng, Jia Liu
DOI: https://doi.org/10.1109/CISP.2009.5303829
2009-01-01
Abstract: The prevailing approach to language recognition is parallel phoneme recognition followed by vector space modeling (PPRVSM), which uses a vector space model to describe the co-occurrence information of phones. Because the super-vectors are composed of phonetic N-grams, the number of N-grams, and hence the vector dimension, grows exponentially with the order N, which results in data sparseness. In this paper, we propose a feature selection algorithm to address this problem: it uses a maximum-relevance criterion based on mutual information to select the N-grams that are most discriminative for identifying languages. The effectiveness of the technique is demonstrated on the NIST 2005 language recognition 30-second task, where we achieve an equal error rate (EER) of 4.81%.
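The maximum-relevance idea can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption made for illustration: each N-gram is reduced to a binary presence flag per utterance, relevance is the mutual information I(X; Y) between that flag and the language label, and the top k N-grams by relevance are kept. The function names, the toy N-grams, and the toy labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Return I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_max_relevance(ngram_presence, labels, k):
    """Rank N-grams by mutual information with the label and keep the top k.

    ngram_presence maps each N-gram to one 0/1 flag per utterance
    (an assumed feature representation, not taken from the paper).
    """
    scores = {
        ngram: mutual_information(values, labels)
        for ngram, values in ngram_presence.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda ngram: (-scores[ngram], ngram))
    return ranked[:k], scores


# Hypothetical toy data: four utterances, two languages, three bigrams.
labels = ["en", "en", "zh", "zh"]
ngram_presence = {
    "a_b": [1, 1, 0, 0],  # perfectly separates the two languages
    "b_c": [1, 0, 1, 0],  # independent of the language label
    "c_d": [1, 1, 1, 0],  # weakly informative
}

selected, scores = select_max_relevance(ngram_presence, labels, k=2)
print(selected)
```

On this toy data the bigram that perfectly separates the two languages scores 1 bit, the one independent of the label scores 0 bits, and the selection keeps the two most relevant bigrams. A real system would estimate these probabilities from N-gram statistics over a training corpus rather than from four utterances.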