Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri, Laurent Besacier
2024-08-08
Abstract: We present Speech-MASSIVE, a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSIVE textual corpus. Speech-MASSIVE covers 12 languages from different families and inherits from MASSIVE the annotations for the intent prediction and slot-filling tasks. Our extension is prompted by the scarcity of massively multilingual SLU datasets and the growing need for versatile speech datasets to assess foundation models (LLMs, speech encoders) across languages and tasks. We provide a multimodal, multitask, multilingual dataset and report SLU baselines using both cascaded and end-to-end architectures in various training scenarios (zero-shot, few-shot, and full fine-tune). Furthermore, we demonstrate the suitability of Speech-MASSIVE for benchmarking other tasks such as speech transcription, language identification, and speech translation. The dataset, models, and code are publicly available at: https://github.com/hlt-mt/Speech-MASSIVE
Computation and Language, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses the scarcity of multilingual Spoken Language Understanding (SLU) datasets by introducing a new dataset, Speech-MASSIVE. Its goals are:

1. **Filling the gap in multilingual SLU datasets**: Current SLU datasets focus primarily on English, with relatively few available for other languages. Speech-MASSIVE covers 12 languages from different families to help close this gap.
2. **Supporting diverse speech tasks**: Beyond SLU, the dataset can be used to evaluate other speech tasks such as Automatic Speech Recognition (ASR), Speech Translation (ST), and Language Identification (LID).
3. **Supporting benchmarking under different training scenarios**: The paper reports SLU baselines for cascaded and end-to-end architectures under three training conditions (zero-shot, few-shot, and full fine-tuning), giving future work a point of comparison.

Together, these contributions aim to advance multilingual SLU research and provide a benchmark that also serves related speech tasks.
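
To make the intent prediction and slot-filling annotations mentioned in the abstract concrete, here is a minimal sketch of how one annotated utterance might be unpacked. It assumes the MASSIVE-style bracket notation `[slot : value]` and the field names `intent` and `annot_utt`. The example record and the helper `parse_annotated_utterance` are illustrative, not taken from the paper or its code, so check the actual field names and format against the released dataset.

```python
import re

# Assumed MASSIVE-style slot annotation: "[slot_name : slot value]".
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_annotated_utterance(annotated):
    """Return (plain_text, slots) for a bracket-annotated utterance.

    `slots` is a list of (slot_name, slot_value) pairs in order of appearance.
    Hypothetical helper for illustration only.
    """
    slots = [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annotated)]
    # Replace each "[name : value]" span with just its value.
    plain = SLOT_PATTERN.sub(lambda m: m.group(2), annotated)
    return plain, slots


# Illustrative record; field names are assumptions based on MASSIVE conventions.
example = {
    "intent": "alarm_set",
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
}

text, slots = parse_annotated_utterance(example["annot_utt"])
print(example["intent"], "|", text, "|", slots)
```

In an SLU setting, the intent label is the target for utterance-level classification, while the `(slot_name, slot_value)` pairs are the targets for slot filling. With speech input, the model has to recover both from audio rather than from the text shown here.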