Building Context-Related Dialogue Systems Based on Chinese-Script-Dialogue Corpus

Pingshuo Ma, Chunwen Li
DOI: https://doi.org/10.23919/chicc.2019.8865554
2019-01-01
Abstract: High-quality corpora are of great importance for training dialogue generation models. We have developed a Chinese open-domain dialogue corpus named Chinese-Script-Dialogue, collected from movie and television scripts. The corpus contains 888,967 dialogues in total, and each dialogue consists of 4.58 turns on average. We asked human annotators to evaluate the quality of the corpus, and the annotation results show that most of the dialogues are qualified. Based on this corpus, we designed a series of multi-turn dialogue systems, named Context-Related Dialogue Systems based on Transformer (CDST). Experimental results show that the CDSTs tend to generate more semantically related replies than the plain Transformer baseline, as reflected in their higher BLEU scores.
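The abstract uses BLEU as evidence that CDST replies are more closely related to the reference replies. The abstract does not state which BLEU variant, n-gram order, smoothing, or Chinese tokenization was used. The following is therefore only a minimal sketch of standard single-reference BLEU, not the authors' evaluation code. It assumes uniform weights up to 4-grams, add-one ("BLEU+1") smoothing for n > 1, and character-level tokenization of Chinese. The function names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights.

    Assumption: add-one smoothing on precisions for n > 1 (BLEU+1);
    the paper's actual BLEU configuration is not given in the abstract.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = max(len(candidate) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0  # no overlapping characters at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)  # BLEU+1 smoothing
        log_precision_sum += math.log(precision) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Character-level tokenization of Chinese (an assumption).
reference = list("我今天很开心")
related = sentence_bleu(list("我今天很高兴"), reference)
unrelated = sentence_bleu(list("天气不错啊"), reference)
print(f"related reply: {related:.3f}, unrelated reply: {unrelated:.3f}")
```

In this toy example, the reply that shares more character n-grams with the reference gets the higher score. This is the sense in which a higher BLEU score is read as "more semantically related," although BLEU measures surface n-gram overlap rather than meaning directly.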