GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Zhijing Jin, Qipeng Guo, Xipeng Qiu, Zheng Zhang
DOI: https://doi.org/10.18653/v1/2020.coling-main.217
2020-01-01
Abstract: Data collection for knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has recently emerged as an active field. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrains their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text and graph examples, respectively. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.