Controlling Text Complexity in Neural Machine Translation

Sweta Agrawal, Marine Carpuat
DOI: https://doi.org/10.48550/arXiv.1911.00835
2019-11-03
Abstract: This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. We collect a high-quality dataset of news articles available in English and Spanish, written for diverse grade levels, and propose a method to align segments across comparable bilingual articles. The resulting dataset makes it possible to train multi-task sequence-to-sequence models that translate Spanish into English targeted at an easier reading grade level than the original Spanish. We show that these multi-task models outperform pipeline approaches that translate and simplify text independently.
Computation and Language