GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems

Bosheng Ding, Junjie Hu, Lidong Bing, Sharifah Mahani Aljunied, Shafiq Joty, Luo Si, Chunyan Miao

DOI: https://doi.org/10.48550/arXiv.2110.07679

2022-04-01

Abstract: Much recent progress in task-oriented dialogue (ToD) systems has been driven by available annotation data across multiple domains for training. Over the last few years, there has been a move towards data curation for multilingual ToD systems that are applicable to serve people speaking different languages. However, existing multilingual ToD datasets either have a limited coverage of languages due to the high cost of data curation, or ignore the fact that dialogue entities barely exist in countries speaking these languages. To tackle these limitations, we introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases. Our method is based on translating dialogue templates and filling them with local entities in the target-language countries. We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses two main limitations in how datasets for multilingual task-oriented dialogue (ToD) systems are constructed:

1. **Limited language coverage**: Existing multilingual ToD datasets usually cover only a few languages because data collection is expensive.
2. **Ignoring the existence of local entities**: When translating English ToD datasets, existing methods simply translate English named entities (such as place names and restaurant names) into the target language, ignoring the fact that these entities hardly exist in the target-language countries.

To address these problems, the paper proposes a new dataset construction method and uses it to generate a large-scale multilingual ToD dataset named GlobalWoZ. The method translates dialogue templates and fills them with local entities from the target-language countries, which supports three previously unexplored multilingual ToD use cases:

- **F&F**: Foreign-language speakers use the ToD system in a foreign-language-speaking country.
- **F&E**: Foreign-language speakers use the ToD system in an English-speaking country.
- **E&F**: English speakers use the ToD system in a foreign-language-speaking country.

In addition, the paper examines how prevalent code-switching is in cross-lingual and cross-country task-oriented dialogues, and shows experimentally that current multilingual models fall short in zero-shot cross-lingual transfer. To improve model performance, the paper proposes a series of data augmentation methods for training stronger baseline models.
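The template-and-entity idea described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' released pipeline: the `[domain-slot]` placeholder syntax, the function names `delexicalize` and `fill_template`, the hand-written Spanish template (standing in for machine-translation output), and all entity values are hypothetical.

```python
import random
import re

# Assumed placeholder syntax for this sketch: [domain-slot], e.g. [restaurant-name].
PLACEHOLDER = re.compile(r"\[([a-z]+-[a-z]+)\]")


def delexicalize(utterance, slot_values):
    """Turn an annotated utterance into a template by replacing each
    entity value with its [domain-slot] placeholder."""
    template = utterance
    for slot, value in slot_values.items():
        template = template.replace(value, f"[{slot}]")
    return template


def fill_template(template, entities, rng):
    """Fill every placeholder with an entity sampled from `entities`,
    a dict mapping slot names to lists of candidate values."""
    def pick(match):
        return rng.choice(entities[match.group(1)])
    return PLACEHOLDER.sub(pick, template)


# Illustrative data only; none of these values come from the dataset.
english_utterance = "I need a table at Curry Garden in Cambridge."
english_slots = {"restaurant-name": "Curry Garden", "restaurant-area": "Cambridge"}
english_template = delexicalize(english_utterance, english_slots)

# Stand-in for a machine-translated template (written by hand here).
spanish_template = "Necesito una mesa en [restaurant-name] en [restaurant-area]."

english_entities = {"restaurant-name": ["Curry Garden"], "restaurant-area": ["Cambridge"]}
local_entities = {"restaurant-name": ["Restaurante Ejemplo"], "restaurant-area": ["Madrid"]}

rng = random.Random(0)
# F&F: translated template + local entities
print(fill_template(spanish_template, local_entities, rng))
# F&E: translated template + English entities (a code-switched utterance)
print(fill_template(spanish_template, english_entities, rng))
# E&F: English template + local entities (also code-switched)
print(fill_template(english_template, local_entities, rng))
```

Keeping the template language and the entity source as two independent choices is what yields the three use cases: each combination other than English template plus English entities corresponds to one of F&F, F&E, and E&F.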

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

AllWOZ: Towards Multilingual Task-Oriented Dialog Systems for All

MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

MultiWOZ 2.3: A Multi-Domain Task-Oriented Dialogue Dataset Enhanced with Annotation Corrections and Co-Reference Annotation

JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset

Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

Zero-shot language extension for dialogue state tracking via pre-trained models and multi-auxiliary-tasks fine-tuning

Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent

X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems

OSTOD: One-Step Task-Oriented Dialogue with activated state and retelling response

XDailyDialog: A Multilingual Parallel Dialogue Corpus

XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking