Pretreatment for Speech Machine Translation

Xiaofei Zhang, Chong Feng, Heyan Huang
DOI: https://doi.org/10.1007/978-3-642-16732-4_13
2010-01-01
Abstract: Natural spoken language contains many meaningless modal particles and dittographs (repeated words). In addition, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is poor when ASR results are passed directly to MT (machine translation). It is therefore necessary to transform irregular ASR results into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes meaningless modal particles and dittographs, corrects recognition errors, and punctuates the ASR results before machine translation. Experiments show that the approach obtains an MT BLEU score of 0.2497, an improvement of 18.4% over the MT baseline without pretreatment.
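To make the intended input and output of such a pretreatment step concrete, here is a minimal rule-based sketch in Python. It is not the paper's method: the paper uses a conditional random field model, whereas this stand-in uses a fixed filler list and simple adjacent-duplicate removal. The `FILLERS` set, the function names, and the English example are all assumptions made for illustration. Recognition-error correction is omitted, and punctuation is reduced to capitalizing the first letter and appending a final period.

```python
# Hypothetical filler list (an assumption for illustration only);
# the paper learns these deletion decisions with a CRF model instead.
FILLERS = {"uh", "um", "er", "hmm"}


def drop_fillers(tokens):
    """Remove tokens that appear in the fixed filler list."""
    return [t for t in tokens if t.lower() not in FILLERS]


def collapse_repeats(tokens):
    """Collapse immediately repeated tokens, e.g. 'to to' -> 'to'."""
    out = []
    for t in tokens:
        if not out or out[-1].lower() != t.lower():
            out.append(t)
    return out


def pretreat(asr_text):
    """Toy pretreatment: delete fillers, collapse repeats, add a period."""
    tokens = asr_text.split()
    # Fillers are removed first so that 'i uh i' also collapses to 'i'.
    tokens = collapse_repeats(drop_fillers(tokens))
    if not tokens:
        return ""
    sentence = " ".join(tokens)
    return sentence[0].upper() + sentence[1:] + "."


print(pretreat("uh i i want um to to book a room"))
```

A rule like `collapse_repeats` would also delete legitimate repetitions (for example "that that"), which is one reason a learned sequence labeler that considers context may be preferable to fixed rules.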