High-Performance Swahili Keyword Search with Very Limited Language Pack: the THUEE System for the OpenKWS15 Evaluation

Meng Cai, Zhiqiang Lv, Cheng Lu, Jian Kang, Like Hui, Zhuo Zhang, Jia Liu
DOI: https://doi.org/10.1109/asru.2015.7404797
2015-01-01
Abstract: This paper presents the Swahili keyword search system developed by the THUEE team for the OpenKWS15 evaluation, which is conducted by NIST under the IARPA Babel program. Highlights of the system's development include automatic generation of the pronunciation lexicon, aggressive data augmentation, a multilingual bottleneck feature extractor trained on 6 languages, text selection from web data for language model training, semi-supervised training of acoustic and language models, out-of-vocabulary keyword detection using morphemes, and a diverse set of systems for combination. A wide variety of acoustic modeling techniques are explored and compared. Up to 12 individual systems are used for combination. The system achieves state-of-the-art performance in the required condition of the evaluation.