Semi-automatic Acquisition Translation Knowledge from Parallel Corpora
Ren, F., Shigang Li, Hongchi Shi, Kuroiwa, S.
DOI: https://doi.org/10.1109/icsmc.2002.1176052
2002-01-01
Abstract: Translation in most current machine translation systems is based on rules. A crucial problem in such rule-based machine translation is how to acquire the translation knowledge, also called translation rules. Many studies have addressed automatic acquisition, but they require a large number of annotated examples, which are difficult to obtain. In fact, most bilingual texts are not aligned at the sentence level, even though they correspond as whole texts. We describe a semi-automatic process for acquiring machine translation knowledge from Japanese-Chinese parallel corpora. The process consists of three parts: parallel text alignment (PTA), example analysis and annotation (EAA), and construction of the translation rules (CTR). The process interacts with a linguist to resolve problems that the system cannot solve on its own. This approach is more efficient than a completely automatic approach in the sense that fewer examples are required. We describe the basic idea and the methods of PTA, EAA, and CTR. A prototype system based on the proposed method has been built, and experiments on Japanese-Chinese have been carried out. The results show that once the acquired knowledge is integrated with an MT system, translation quality improves significantly.