13 Chinese-English translation database: extracting units of translation from parallel texts

Chang Baobao, Pernilla Danielsson, Wolfgang Teubert
2005-01-01
Abstract: Machine translation has proved to be a very challenging task, much harder than originally imagined in the 1950s. More than 50 years of hard work has failed to change the field significantly, and many of the problems that initially puzzled researchers are still present today. Most of the Machine Translation (henceforth MT) systems that have become commercially available have adopted transfer-based strategies, which are widely acknowledged to be the most practical approach. In the transfer-based paradigm, translation is performed in three stages: 1. the source language is analysed into an intermediate source representation, such as a syntactically parsed source sentence; 2. the source representation is then converted into a target-language-dependent representation; 3. finally, the target translation is generated from the target representation. However, a major problem with the transfer approach is its view of the translation unit: it assumes that the single word is the unit of translation. Normally, an MT system begins by segmenting the source-language sentence into words and looking up those words in the MT source and transfer lexica. It then converts every source word into a target word. Finally, the MT system stitches all the target words together into sentences, according to the rules stated in the syntactic component of the target language. Using single source words as translation units causes several problems. First, single source words make an unsuitable basis for selecting the proper target words, since they are usually polysemous. This is a problem that MT shares with all other computational linguistics applications, as there currently exists no …
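To make the polysemy problem concrete, here is a toy sketch of the word-by-word pipeline described above: segment, transfer each word independently, then stitch the results together. It is an illustration under stated assumptions, not the authors' system: the four-entry lexicon and the function names `segment` and `translate` are hypothetical, segmentation uses forward maximum matching (a common simple baseline for Chinese, swapped in here for illustration), and "generation" is plain concatenation rather than real target-language syntax rules.

```python
# Toy word-by-word transfer MT, illustrating the single-word translation unit.
# Lexicon contents and function names are hypothetical, for illustration only.

WORD_LEXICON: dict[str, list[str]] = {
    "我": ["I"],
    "打": ["hit", "play", "make"],  # polysemous: the right sense depends on context
    "电话": ["telephone"],
    "篮球": ["basketball"],
}


def segment(sentence: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Forward maximum matching: take the longest lexicon entry at each position."""
    max_len = max(len(entry) for entry in lexicon)
    words: list[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # Fall back to a single character when no longer entry matches.
            if candidate in lexicon or size == 1:
                words.append(candidate)
                i += size
                break
    return words


def translate(sentence: str, lexicon: dict[str, list[str]]) -> str:
    words = segment(sentence, lexicon)
    # Transfer: each unit independently gets its first listed target word,
    # with no access to neighbouring words; unknown units pass through unchanged.
    targets = [lexicon.get(word, [word])[0] for word in words]
    # Generation: simple concatenation (a real system would apply target syntax rules).
    return " ".join(targets)


if __name__ == "__main__":
    print(translate("我打电话", WORD_LEXICON))  # I hit telephone
    print(translate("我打篮球", WORD_LEXICON))  # I hit basketball

    # Hypothetical multi-word unit: treating 打电话 as one unit avoids the wrong sense.
    UNIT_LEXICON = {**WORD_LEXICON, "打电话": ["make a phone call"]}
    print(translate("我打电话", UNIT_LEXICON))  # I make a phone call
```

Because 打 is transferred on its own, the sketch picks the same sense ("hit") whether the object is a telephone or a basketball. Adding a larger unit such as 打电话 to the lexicon sidesteps the ambiguity, which is the intuition behind looking for translation units beyond the single word.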
What problem does this paper attempt to address?