Combining pre-editing and post-editing to improve SMT of user-generated content

Victoria Porro, P. Bouillon, Johanna Gerlach, Sabine Lehmann
Abstract: The poor quality of user-generated content (UGC) found in forums hinders both readability and machine-translatability. To improve these two aspects, we have developed human- and machine-oriented pre-editing rules, which correct or reformulate this content. In this paper we present the results of a study which investigates whether pre-editing rules that improve the quality of statistical machine translation (SMT) output also have a positive impact on post-editing productivity. For this study, pre-editing rules were applied to a set of French sentences extracted from a technical forum. After SMT, the post-editing temporal effort and final quality were compared for translations of the raw source and of its pre-edited version. The results suggest that pre-editing speeds up post-editing and that the combination of the two processes is worthy of further investigation.
Engineering, Computer Science