CMMQC: Cascaded Multi-Model Quality Control for Unsupervised Data-to-Text Generation

Weitian Zhang, Xu Sun, Yangxing Luo, Wei Gao, Yanchun Zhu
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650876
2024-01-01
Abstract: Data-to-text (D2T) generation, the task of converting structured data into natural language, has extensive real-world applications. While supervised models have achieved promising results, they rely heavily on costly labeled training data. This paper investigates unsupervised D2T generation by leveraging the impressive general abilities of large language models (LLMs). We propose a framework for LLMs to collaboratively learn from unlabeled data through cascaded multi-model quality control. Specifically, one LLM, acting as a writer, generates candidate texts from input data. Additional LLMs, serving as checkers, validate output quality to filter high-quality samples for training the writer LLM. By cascading generation, checking, and meta-checking, the models extract linguistic knowledge and grounding ability from abundant unlabeled data. Experiments on established benchmarks demonstrate enhanced fluency, accuracy, and coherence compared to supervised baselines. This unsupervised approach circumvents labeled data dependence, unlocking readily available LLMs for on-demand D2T generation across diverse applications.
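The writer, checker, and meta-checker cascade described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify interfaces, so `cascade_filter`, the callable signatures, and the toy stand-ins below are all hypothetical, and plain Python functions take the place of real LLM calls.

```python
from typing import Callable

Record = dict[str, str]
Writer = Callable[[Record], str]
Checker = Callable[[Record, str], bool]
# Hypothetical: the meta-checker reviews the checkers' verdicts as a final gate.
MetaChecker = Callable[[Record, str, list[bool]], bool]


def cascade_filter(
    records: list[Record],
    writer: Writer,
    checkers: list[Checker],
    meta_checker: MetaChecker,
) -> list[tuple[Record, str]]:
    """Keep only (record, text) pairs that pass every stage of the cascade."""
    accepted = []
    for record in records:
        text = writer(record)                                    # generation
        verdicts = [check(record, text) for check in checkers]   # checking
        if all(verdicts) and meta_checker(record, text, verdicts):  # meta-checking
            accepted.append((record, text))
    return accepted


# Toy stand-ins for the LLMs (assumptions for illustration only).
def toy_writer(record: Record) -> str:
    # Deliberately lossy: verbalizes only the first two fields.
    pairs = list(record.items())[:2]
    return ", ".join(f"{k}: {v}" for k, v in pairs) + "."


def toy_checker(record: Record, text: str) -> bool:
    # Accept only if every input value is mentioned in the text.
    return all(value in text for value in record.values())


def toy_meta_checker(record: Record, text: str, verdicts: list[bool]) -> bool:
    # Simple surface check standing in for a second-level review.
    return text.endswith(".")


records = [
    {"name": "Alice", "city": "Paris"},
    {"name": "Bob", "city": "Rome", "job": "chef"},
]
accepted = cascade_filter(records, toy_writer, [toy_checker], toy_meta_checker)
```

In this sketch the second record is rejected because the toy writer omits one of its values. In the framework the abstract describes, the accepted pairs would serve as training samples for the writer LLM; that training step is not shown here.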