Exploring Hybrid Character-Words Representational Unit in Classical-to-Modern Chinese Machine Translation

Hongyang Zhang, Muyun Yang, Tiejun Zhao
DOI: https://doi.org/10.1109/ialp.2015.7451525
2015-01-01
Abstract: This paper investigates hybrid representational units in statistical machine translation (SMT) from Classical to Modern Chinese, where the basic unit of Modern Chinese is a mixture of Chinese characters and words, while Classical Chinese is represented in character units. We explore several approaches to combining characters and words in SMT. The best method achieves a gain of 0.33 BLEU points, or 1.2% relative, over the best of the SMT baseline systems modeled with different representational granularities. Furthermore, we find that changing the distortion limit in SMT has a relatively small effect on the quality of our hybrid character-word unit system.
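To make the idea of a hybrid character-word unit concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not say which hybridization approaches were explored, so the rule used here (keep a word whole if it appears in a given set, otherwise split it into characters) is an assumption, and the names `hybridize`, `keep_as_word`, and the sample sentence are hypothetical.

```python
def hybridize(words, keep_as_word):
    """Build a hybrid unit sequence from a word-segmented sentence.

    Assumed rule (illustrative only): a word found in `keep_as_word`
    stays a single unit; any other word is split into characters.
    """
    units = []
    for word in words:
        if word in keep_as_word:
            units.append(word)
        else:
            units.extend(word)  # iterating a str yields its characters
    return units


# Hypothetical word-segmented Modern Chinese sentence.
segmented = ["我们", "学习", "古文"]
keep = {"我们"}

# Hybrid granularity: one word unit plus character units.
hybrid = hybridize(segmented, keep)

# Character-only granularity, for comparison.
char_only = [ch for word in segmented for ch in word]

print(hybrid)
print(char_only)
```

With an empty `keep_as_word` set the function reduces to the character-only representation, and with every word in the set it reduces to the word-only representation, so the two pure granularities are the endpoints of this assumed scheme.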