TeaForN: Teacher-Forcing with N-grams

Sebastian Goodman, Nan Ding, Radu Soricut
DOI: https://doi.org/10.48550/arXiv.2010.03494
2020-10-10
Abstract: Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
Computation and Language