CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation

Yinpei Dai, Wanwei He, Bowen Li, Yuchuan Wu, Zheng Cao, Zhongqi An, Jian Sun, Yongbin Li
DOI: https://doi.org/10.48550/arXiv.2211.11617
2022-11-22
Abstract: Practical dialog systems need to deal with various knowledge sources, noisy user expressions, and the shortage of annotated data. To better solve the above problems, we propose CGoDial, a new, challenging, and comprehensive Chinese benchmark for multi-domain Goal-oriented Dialog evaluation. It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources: 1) a slot-based dialog (SBD) dataset with table-formed knowledge, 2) a flow-based dialog (FBD) dataset with tree-formed knowledge, and 3) a retrieval-based dialog (RBD) dataset with candidate-formed knowledge. To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing. The proposed experimental settings combine training on either the entire training set or a few-shot training set with testing on either the standard test set or a hard test subset, which can assess model capabilities in terms of general prediction, fast adaptability, and reliable robustness.
Computation and Language
What problem does this paper attempt to address?
The paper addresses three challenges that Chinese goal-oriented dialogue systems face in practical applications:

1. **Diverse knowledge sources**: Goal-oriented dialogue systems must handle several forms of knowledge, each tied to a dialogue type: Slot-Based Dialogue (SBD) uses tabular knowledge, Flow-Based Dialogue (FBD) uses tree-structured knowledge, and Retrieval-Based Dialogue (RBD) uses a candidate set.
2. **Noisy user expressions**: In real conversations, user utterances often contain noise, such as colloquial expressions and Automatic Speech Recognition (ASR) errors, which degrades the performance of the dialogue system.
3. **Insufficient labeled data**: High-quality labeled data is crucial for training and evaluating dialogue systems, but existing Chinese dialogue datasets are deficient in this regard.

To address these problems, the paper proposes a new benchmark named CGoDial, which contains 96,763 dialogue sessions and 574,949 dialogue turns and covers three datasets with different knowledge sources:

- **Slot-Based Dialogue (SBD)**: Uses tabular knowledge to search for and provide entities that meet the user's needs.
- **Flow-Based Dialogue (FBD)**: Uses tree-structured knowledge to guide users through specific tasks.
- **Retrieval-Based Dialogue (RBD)**: Uses candidate-set knowledge to select the correct response from a set of candidates.

To bridge the gap between academic benchmarks and spoken dialogue scenarios, the paper either collects data from real conversations or adds spoken-language features to existing datasets through crowdsourcing.

The paper also designs several experimental settings, combining training on the full training set or a few-shot training set with testing on the standard test set or a hard test subset, to evaluate a model's general prediction, fast adaptability, and robustness. In conclusion, the paper aims to advance research on Chinese dialogue systems by constructing a comprehensive and challenging Chinese goal-oriented dialogue benchmark.
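
The experimental settings described above form a two-by-two grid of training and testing conditions. The following is a minimal sketch of that grid in Python. The names `TRAIN_SETS`, `TEST_SETS`, and `evaluation_settings` are hypothetical and chosen only for illustration; they are not taken from the CGoDial release, and the sketch does not reproduce the authors' evaluation code.

```python
from itertools import product

# Hypothetical labels for the two training and two testing conditions
# named in the abstract; the identifiers are assumptions, not CGoDial's.
TRAIN_SETS = ("full", "few_shot")
TEST_SETS = ("standard", "hard")


def evaluation_settings():
    """Return every (train, test) combination as a list of dicts."""
    return [
        {"train": train, "test": test}
        for train, test in product(TRAIN_SETS, TEST_SETS)
    ]


settings = evaluation_settings()
for setting in settings:
    print(f"train on {setting['train']} set, test on {setting['test']} set")
```

One plausible reading, not stated explicitly in the source, is that the full/standard cell measures general prediction, the few-shot rows measure fast adaptability, and the hard-test column measures robustness.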