Beyond Word for Word: Fact Guided Training for Neural Data-to-Document Generation

Feng Nie, Hailin Chen, Jinpeng Wang, Rong Pan, Chin-Yew Lin
DOI: https://doi.org/10.1007/978-3-030-32233-5_41
2019-01-01
Abstract: Recent end-to-end encoder-decoder neural models for data-to-text generation can produce fluent and seemingly informative texts even though they disregard the traditional content selection and surface realization architecture. However, texts generated by such neural models often miss important facts and contradict the input data, particularly when generating long texts. To address these issues, we propose a Fact Guided Training (FGT) model to improve both content selection and surface realization by leveraging an information extraction (IE) system. The IE system extracts the facts mentioned in reference data and in generated texts, and these facts provide fact-guided signals. First, a content selection loss is designed to penalize content deviation between generated texts and their references. Second, once proper content has been selected for generation, a consistency verification mechanism inspects fact discrepancies between generated texts and their corresponding input data. The consistency signal is non-differentiable and is optimized via reinforcement learning. Experimental results on ROTOWIRE, a recent challenging dataset, show that our proposed model outperforms neural encoder-decoder models in both automatic and human evaluations.
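As an illustration of how a non-differentiable consistency signal can still drive training, the following is a minimal sketch and not the paper's implementation. It assumes facts are (entity, record type, value) tuples, that the reward is the fraction of extracted facts supported by the input records, and that the policy is updated with a plain REINFORCE-style surrogate loss. The names `Fact`, `consistency_reward`, and `reinforce_loss` are hypothetical, and the abstract does not specify the actual reward or estimator.

```python
from typing import List, Set, Tuple

# Hypothetical fact layout: (entity, record type, value). An assumption for
# illustration only; the abstract does not define the fact format.
Fact = Tuple[str, str, str]


def consistency_reward(extracted: List[Fact], records: Set[Fact]) -> float:
    """Fraction of facts extracted from a generated text that appear in the input records."""
    if not extracted:
        return 0.0
    supported = sum(1 for fact in extracted if fact in records)
    return supported / len(extracted)


def reinforce_loss(token_log_probs: List[float], reward: float, baseline: float = 0.0) -> float:
    """REINFORCE surrogate loss: -(reward - baseline) * sum of sampled-token log-probabilities."""
    return -(reward - baseline) * sum(token_log_probs)


# Made-up example: one of the two extracted facts contradicts the input records.
records = {("Player A", "PTS", "25"), ("Player A", "AST", "7")}
extracted = [("Player A", "PTS", "25"), ("Player A", "AST", "9")]
reward = consistency_reward(extracted, records)
loss = reinforce_loss([-0.5, -1.5], reward, baseline=0.25)
```

Because the reward enters only as a scalar weight on the log-probabilities, it never needs to be differentiated, which is why a signal computed by an external IE system can be used this way.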