JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset

Atsumoto Ohashi, Ryu Hirai, Shinya Iizuka, Ryuichiro Higashinaka
2024-03-26
Abstract: Dialogue datasets are crucial for deep learning-based task-oriented dialogue system research. While numerous English language multi-domain task-oriented dialogue datasets have been developed and contributed to significant advancements in task-oriented dialogue systems, such a dataset does not exist in Japanese, and research in this area is limited compared to that in English. In this study, towards the advancement of research and development of task-oriented dialogue systems in Japanese, we constructed JMultiWOZ, the first Japanese language large-scale multi-domain task-oriented dialogue dataset. Using JMultiWOZ, we evaluated the dialogue state tracking and response generation capabilities of the state-of-the-art methods on the existing major English benchmark dataset MultiWOZ2.2 and the latest large language model (LLM)-based methods. Our evaluation results demonstrated that JMultiWOZ provides a benchmark that is on par with MultiWOZ2.2. In addition, through evaluation experiments of interactive dialogues with the models and human participants, we identified limitations in the task completion capabilities of LLMs in Japanese.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of datasets for Japanese multi-domain task-oriented dialogue systems. Specifically:

- **Background and Motivation**: Several large-scale multi-domain task-oriented dialogue datasets have been developed in English, and they have played an important role in advancing research and development of task-oriented dialogue systems. No such dataset exists in Japanese, which limits research and development of Japanese task-oriented dialogue systems.
- **Objective**: To promote research and development of Japanese task-oriented dialogue systems, the authors constructed JMultiWOZ, the first large-scale Japanese multi-domain task-oriented dialogue dataset. JMultiWOZ contains 4,246 dialogues covering six travel-related domains (tourist attractions, accommodation, restaurants, shopping facilities, taxis, and weather).
- **Method**: The dialogues were collected with the Wizard of Oz method: each dialogue is carried out by two human participants, one playing the traveler (user) and the other playing the information provider (wizard). The authors also defined the ontology for each domain, constructed the back-end database, and designed user goals to keep the dialogues realistic and diverse.
- **Evaluation**: The authors evaluated existing state-of-the-art (SOTA) methods and the latest large language model (LLM)-based methods on JMultiWOZ for the dialogue state tracking (DST) and response generation (RG) tasks. The results show that JMultiWOZ provides a benchmark comparable in complexity to the major English benchmark dataset MultiWOZ2.2.
- **Contributions**:
  - Constructed JMultiWOZ, the first large-scale Japanese multi-domain task-oriented dialogue dataset.
  - Evaluated existing state-of-the-art models and the latest LLM-based methods on the DST and RG tasks, showing that JMultiWOZ provides a benchmark comparable in complexity to the major English datasets.
  - Showed, through interactive dialogue experiments with human participants, that even the latest LLMs remain limited in their task completion capabilities in Japanese.

In conclusion, by constructing the JMultiWOZ dataset, this paper fills an important gap in research on Japanese multi-domain task-oriented dialogue systems and provides a valuable resource for future work.
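
To make the DST evaluation mentioned above more concrete, the following is a minimal sketch of joint goal accuracy, a metric commonly used for dialogue state tracking on MultiWOZ-style data. The summary does not state which metric the authors report, so the choice of metric is an assumption here, and the slot names, values, and the `joint_goal_accuracy` helper are all hypothetical illustrations rather than part of the JMultiWOZ release.

```python
from typing import Dict, List

# A dialogue state maps "domain-slot" names to values.
# The slot names and values below are hypothetical, not taken from JMultiWOZ.
DialogueState = Dict[str, str]


def joint_goal_accuracy(
    predicted: List[DialogueState], gold: List[DialogueState]
) -> float:
    """Return the fraction of turns whose predicted state equals the gold state.

    A turn counts as correct only if every slot-value pair matches exactly,
    with no missing and no extra slots.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Gold states accumulate as the (hypothetical) dialogue progresses.
gold_states = [
    {"restaurant-area": "Sapporo"},
    {"restaurant-area": "Sapporo", "restaurant-price": "cheap"},
    {
        "restaurant-area": "Sapporo",
        "restaurant-price": "cheap",
        "taxi-destination": "Sapporo Station",
    },
]

# The tracker gets the second turn wrong (wrong price value).
predicted_states = [
    {"restaurant-area": "Sapporo"},
    {"restaurant-area": "Sapporo", "restaurant-price": "moderate"},
    {
        "restaurant-area": "Sapporo",
        "restaurant-price": "cheap",
        "taxi-destination": "Sapporo Station",
    },
]

jga = joint_goal_accuracy(predicted_states, gold_states)
print(f"Joint goal accuracy: {jga:.3f}")
```

Because a single wrong slot makes the whole turn count as incorrect, this metric is strict, which is one reason it is often used to compare datasets of similar complexity.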