A Fusion Approach to Spoken Language Identification Based on Combining Multiple Phone Recognizers and Speech Attribute Detectors

Yannan Wang, Jun Du, Lirong Dai, Chin-Hui Lee
DOI: https://doi.org/10.1109/iscslp.2014.6936714
2014-01-01
Abstract: We propose a fusion approach to spoken language recognition that combines multiple tokenizers built on phone and speech attribute models, trained on a collection of multilingual corpora with different front-end features. The speech attribute models are trained on bottleneck features extracted from deep neural networks, while the phone models are trained on temporal patterns neural network features. By exploring different combinations of front-end features, fundamental speech units, and tokenization models, we demonstrate that speech attribute units are complementary to phone units and improve performance when combined with conventional phone-based tokenizers. On the National Institute of Standards and Technology (NIST) 2009 language recognition evaluation task, exploiting diversity in system combination, we find that speech attribute recognition followed by language modeling yields an additional average relative equal error rate reduction of more than 20% when fused with state-of-the-art systems based on phone recognition followed by language modeling.
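To make the reported metric concrete, the following is a minimal sketch of how an equal error rate (EER) and a relative EER reduction can be computed, together with a simple score-level fusion. The abstract does not specify the fusion method, so the weighted average in `fuse` is purely an illustrative assumption, and the function names `equal_error_rate`, `relative_reduction`, and `fuse` are hypothetical, not taken from the paper.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns the average of the miss and false-alarm rates at the threshold
    where the two rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a target-language trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


def relative_reduction(baseline_eer, fused_eer):
    """Relative EER reduction of the fused system over the baseline."""
    return (baseline_eer - fused_eer) / baseline_eer


def fuse(scores_a, scores_b, weight=0.5):
    """Assumed fusion rule: per-trial weighted average of two systems' scores."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]
```

Under this definition, a baseline EER of 5% that drops to 4% after fusion corresponds to a relative reduction of 20%, which is the scale of improvement the abstract reports.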