Query-by-example Spoken Term Detection Based on Phonetic Posteriorgram

Beili Song, Wei-Qiang Zhang, Meng Cai, Jia Liu, Michael T. Johnson
DOI: https://doi.org/10.2991/icemct-15.2015.256
2015-01-01
Abstract: Spoken term detection in low-resource situations is a challenging problem, because traditional large vocabulary continuous speech recognition (LVCSR) approaches are often unusable. This paper introduces a method that uses deep neural network (DNN) softmax outputs as input features in a query-by-example (QBE) spoken term detection (STD) system. Matches between queries and test utterances are located using a modified dynamic time warping (DTW) search approach. Subsystems are built with unsupervised Gaussian mixture model (GMM) and DNN monophone models trained on Chinese and English and evaluated on the SWS 2013 multilingual database of low-resource languages. The score-level fusion of these different subsystems is shown to improve performance significantly over the baseline results.
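The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the DTW search is modified, so the code below is plain subsequence DTW, not the authors' method. The negative-log inner-product frame distance is a common choice for posteriorgrams and is assumed here; the function names `frame_distance` and `subsequence_dtw` are hypothetical.

```python
import numpy as np


def frame_distance(query, utterance, eps=1e-10):
    """Pairwise -log(inner product) between posterior vectors.

    query: (n, d) posteriorgram, utterance: (m, d) posteriorgram.
    The distance choice is an assumption, not taken from the paper.
    """
    sim = query @ utterance.T  # shape (n, m)
    return -np.log(np.maximum(sim, eps))


def subsequence_dtw(query, utterance):
    """Find the best match of the whole query anywhere in the utterance.

    Returns (length-normalized cost, end frame index in the utterance).
    """
    dist = frame_distance(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the match may start at any utterance frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1]))  # the match may end at any utterance frame
    return acc[-1, end] / n, end
```

A lower returned cost indicates a better match. In a full system, per-subsystem costs of this kind would be converted to detection scores and combined by score-level fusion.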