Statistical Natural Language Generation for Speech-to-speech Machine Translation Systems.
Bowen Zhou, Yuqing Gao, Jeffrey S. Sorensen, Zijian Diao, Michael Picheny
DOI: https://doi.org/10.21437/icslp.2002-510
2002-01-01
Abstract: This paper presents a statistical natural language generation (NLG) scheme for trainable speech-to-speech machine translation (MT) systems for limited domain applications using a cascaded approach. The natural language generation scheme in the translation systems is based on a maximum entropy (ME) statistical model fully trained from a corpus, allowing flexible translation outputs. In this paper, the system architecture and some of its components, including parsing, information extraction, translation, etc., are briefly overviewed, followed by descriptions of the training and search algorithms for ME-based sentence-level NLG within the MT context. Details of NLG, including feature selection and robustness, are also addressed. We have implemented the described system for translating between Chinese speech and English speech in an air travel application domain. Encouraging experimental results have been observed and are presented.

1. INTRODUCTION

Commerce and travel have created an ever-increasing need for translation between languages. Recently, progress in the fields of speech and language processing has begun to allow the creation of automated systems to accomplish this task. However, the technical challenges of creating a useful speech-to-speech translation device push against the limitations of current technologies such as speech recognition, natural language understanding, machine translation, natural language generation, and text-to-speech synthesis. There have been numerous efforts to create such a device in recent years.

Many technological frameworks have been proposed for the task of speech translation, ranging from a cascaded approach [1] to finite state transducers [2]. Recently, we presented a speech translation system [3] employing a statistical framework appropriate for use in language-restricted domains.
In a cascaded approach, the recognition results obtained in the speaker’s language are analyzed and then, through a series of distinct abstract representations, corresponding sentences are generated in the language of the listener. Other cascaded speech translation systems have been proposed in the past few years. However, most of the generation components in such systems are based on fixed templates, which