Joint tokenization, parsing, and translation

Yang Liu
DOI: https://doi.org/10.1109/IUCS.2010.5666651
2010-01-01
Abstract: Summary form only given. Natural language processing is all about ambiguities. In machine translation, tokenization and parsing mistakes caused by segmentation and structural ambiguities can introduce translation errors. A well-known solution is to offer more alternatives through compact representations such as lattices and forests. In this talk, I will introduce a technique that goes beyond lattices and forests by integrating tokenization, parsing, and translation in one system, so that the three tasks can interact with and benefit each other in a discriminative framework. Experimental results show that this integration significantly improves tokenization and translation performance.
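As a rough illustration of the "compact representation" idea mentioned in the abstract, the sketch below encodes several alternative tokenizations of one string as a lattice and picks the highest-scoring path. It is not the system described in the talk: the input string, the edge scores, and the names `build_lattice`, `all_tokenizations`, and `best_tokenization` are all hypothetical, and a plain additive edge score stands in for whatever discriminative model the author actually uses.

```python
from typing import Dict, List, Tuple

# An edge leaving a character position: (end_position, token, score).
# All scores here are made-up illustration values, not learned weights.
Edge = Tuple[int, str, float]
Lattice = Dict[int, List[Edge]]


def build_lattice() -> Lattice:
    """Hypothetical tokenization lattice over the 4-character string 'abcd'."""
    return {
        0: [(1, "a", -1.0), (2, "ab", -0.5)],
        1: [(2, "b", -1.0)],
        2: [(3, "c", -1.0), (4, "cd", -0.7)],
        3: [(4, "d", -1.0)],
    }


def all_tokenizations(lattice: Lattice, start: int, end: int) -> List[List[str]]:
    """Enumerate every token sequence the lattice encodes between start and end."""
    if start == end:
        return [[]]
    paths: List[List[str]] = []
    for nxt, token, _ in lattice.get(start, []):
        for rest in all_tokenizations(lattice, nxt, end):
            paths.append([token] + rest)
    return paths


def best_tokenization(lattice: Lattice, end: int) -> Tuple[float, List[str]]:
    """Dynamic program over positions: keep the best-scoring path to each position."""
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for pos in range(end):
        if pos not in best:
            continue  # position not reachable from the start
        score, tokens = best[pos]
        for nxt, token, edge_score in lattice.get(pos, []):
            candidate = (score + edge_score, tokens + [token])
            if nxt not in best or candidate[0] > best[nxt][0]:
                best[nxt] = candidate
    return best[end]


if __name__ == "__main__":
    lattice = build_lattice()
    for path in all_tokenizations(lattice, 0, 4):
        print(path)
    print(best_tokenization(lattice, 4))
```

A pipeline that commits to one tokenization up front would pass only a single path downstream. Keeping the lattice lets a later stage (parsing or translation) choose among the alternatives, which is the kind of interaction the abstract argues for.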