Speech Recognition System Combination for Machine Translation

M. J. F. Gales, X. Liu, R. Sinha, P. C. Woodland, K. Yu, S. Matsoukas, T. Ng, K. Nguyen, L. Nguyen, J-L Gauvain, L. Lamel, A. Messaoudi
DOI: https://doi.org/10.1109/icassp.2007.367310
2007-01-01
Abstract: The majority of state-of-the-art speech recognition systems make use of system combination. The combination approaches adopted have traditionally been tuned to minimise word error rates (WERs). In recent years there has been growing interest in taking the output from speech recognition systems in one language and translating it into another. This paper investigates the use of cross-site combination approaches in terms of both WER and impact on translation performance. In addition, the stages involved in modifying the output from a speech-to-text (STT) system to be suitable for translation are described. Two source languages, Mandarin and Arabic, are recognised and then translated into English using a phrase-based statistical machine translation system. The performance of individual systems and of cross-site combination using cross-adaptation and ROVER is reported. Results show that the best STT combination scheme in terms of WER is not necessarily the most appropriate when translating speech.