Towards High Performance LVCSR in Speech-to-Speech Translation System on Smart Phones.

Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou
DOI: https://doi.org/10.21437/interspeech.2011-716
2011-01-01
Abstract: This paper presents efforts to improve the performance of large vocabulary continuous speech recognition (LVCSR) in a speech-to-speech translation system on smart phones. A variety of techniques are investigated to achieve high accuracy and low latency under constrained resources. These include one-pass streaming-mode decoding for minimum latency; full-covariance acoustic modeling based on bootstrap and model restructuring to improve recognition accuracy with limited training data; and quantized discriminative feature-space transforms and quantized Gaussian mixture models to reduce memory usage with negligible degradation in recognition accuracy. Speed optimization methods that increase recognition speed are also discussed. The proposed techniques, evaluated on the DARPA Transtac datasets, are shown to give good overall performance under the CPU and memory constraints of smart phones.
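To illustrate why quantizing model parameters reduces memory, the following is a minimal sketch of generic uniform 8-bit scalar quantization applied to a handful of made-up Gaussian mean values. It is not the quantization scheme used in the paper, which the abstract does not describe; the function names, the sample values, and the choice of 8 bits are all assumptions made for illustration.

```python
from array import array

def quantize_8bit(values):
    """Uniformly map floats onto integer codes 0..255 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    # Step size between adjacent codes; fall back to 1.0 if all values are equal.
    step = (hi - lo) / 255 if hi > lo else 1.0
    codes = array("B", (round((v - lo) / step) for v in values))
    return codes, lo, step

def dequantize_8bit(codes, lo, step):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + c * step for c in codes]

# Made-up parameter values standing in for Gaussian means (assumption).
means = [-1.25, -0.4, 0.0, 0.35, 0.9, 2.1]

codes, lo, step = quantize_8bit(means)
restored = dequantize_8bit(codes, lo, step)

# Worst-case reconstruction error is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(means, restored))

# Storage for 32-bit floats versus 8-bit codes (ignoring the two scalars lo, step).
float_bytes = array("f", means).itemsize * len(means)
code_bytes = codes.itemsize * len(codes)
```

In this sketch each parameter shrinks from four bytes to one, at the cost of a reconstruction error of at most half a quantization step. Whether such an error is negligible for recognition accuracy is an empirical question of the kind the paper evaluates.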