Abstract: Dialogue datasets are crucial for deep learning-based task-oriented dialogue system research. While numerous English language multi-domain task-oriented dialogue datasets have been developed and contributed to significant advancements in task-oriented dialogue systems, such a dataset does not exist in Japanese, and research in this area is limited compared to that in English. In this study, towards the advancement of research and development of task-oriented dialogue systems in Japanese, we constructed JMultiWOZ, the first Japanese language large-scale multi-domain task-oriented dialogue dataset. Using JMultiWOZ, we evaluated the dialogue state tracking and response generation capabilities of the state-of-the-art methods on the existing major English benchmark dataset MultiWOZ2.2 and the latest large language model (LLM)-based methods. Our evaluation results demonstrated that JMultiWOZ provides a benchmark that is on par with MultiWOZ2.2. In addition, through evaluation experiments of interactive dialogues with the models and human participants, we identified limitations in the task completion capabilities of LLMs in Japanese.

What problem does this paper attempt to address?

The main problem this paper addresses is the lack of datasets for Japanese multi-domain task-oriented dialogue systems. Specifically:

- **Background and Motivation**: Several large-scale multi-domain task-oriented dialogue datasets have been developed in English, and they have played an important role in advancing research on task-oriented dialogue systems. No such dataset exists yet in Japanese, which limits research and development of Japanese task-oriented dialogue systems.
- **Objective**: To promote research and development of Japanese task-oriented dialogue systems, the authors constructed JMultiWOZ, the first large-scale Japanese multi-domain task-oriented dialogue dataset. JMultiWOZ contains 4,246 dialogues covering six travel-related domains (tourist attractions, accommodation, restaurants, shopping facilities, taxis, and weather).
- **Method**: The dialogues were collected with the Wizard of Oz method: each dialogue is carried out by two human participants, one playing the traveler (user) and the other playing the information provider (wizard). The authors also defined the ontology of each domain, constructed the back-end database, and designed user goals to ensure the authenticity and diversity of the dialogues.
- **Evaluation**: Using existing state-of-the-art (SOTA) methods and the latest large language model (LLM) methods, the authors evaluated JMultiWOZ on the dialogue state tracking (DST) and response generation (RG) tasks. The results show that JMultiWOZ provides a benchmark whose complexity and performance levels are comparable to those of the main English benchmark dataset, MultiWOZ2.2.
- **Contributions**:
  - Constructed JMultiWOZ, the first large-scale Japanese multi-domain task-oriented dialogue dataset.
  - Evaluated the dataset on DST and RG tasks using existing state-of-the-art models and the latest LLM methods, showing that JMultiWOZ provides a benchmark with a complexity comparable to that of the main English datasets.
  - Through interaction experiments with human participants, revealed that even the latest LLMs still struggle with task-oriented dialogue in Japanese.

In conclusion, this paper fills an important gap in research on Japanese multi-domain task-oriented dialogue systems by constructing the JMultiWOZ dataset, providing a valuable resource for future research.
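The summary says models were evaluated on dialogue state tracking but does not say how DST output is scored. As a rough illustration only, the sketch below assumes a MultiWOZ-style state (a mapping from domain and slot to a value) and scores it with joint goal accuracy, a metric commonly used on MultiWOZ-style benchmarks. The metric choice, the slot names, and the example values are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A dialogue state maps (domain, slot) pairs to values.
# The slot names used below are hypothetical, not the JMultiWOZ ontology.
State = Dict[Tuple[str, str], str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Two turns of a made-up restaurant dialogue.
gold_turns: List[State] = [
    {("restaurant", "area"): "Kyoto"},
    {("restaurant", "area"): "Kyoto", ("restaurant", "price"): "cheap"},
]
# The tracker gets the first turn right and one slot wrong in the second.
predicted_turns: List[State] = [
    {("restaurant", "area"): "Kyoto"},
    {("restaurant", "area"): "Kyoto", ("restaurant", "price"): "moderate"},
]

print(joint_goal_accuracy(predicted_turns, gold_turns))  # 0.5
```

Because a turn counts as correct only when every slot matches, a single wrong slot value makes the whole turn incorrect, which is why this kind of metric is sensitive to the number of domains and slots in a dataset.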

JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset

MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

MultiWOZ 2.3: A Multi-Domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation

CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent

GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems

AllWOZ: Towards Multilingual Task-Oriented Dialog Systems for All

Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

JDDC 2.1: A Multimodal Chinese Dialogue Dataset with Joint Tasks of Query Rewriting, Response Generation, Discourse Parsing, and Summarization

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service

llm-japanese-dataset v0: Construction of Japanese Chat Dataset for Large Language Models and its Methodology

Topic-switch adapted Japanese Dialogue System based on PLATO-2

CMCC: A Comprehensive and Large-Scale Human-Human Dataset for Dialogue Systems

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset