Abstract: Recent progress in task-oriented neural dialogue systems is largely focused on a handful of languages, as annotation of training data is tedious and expensive. Machine translation has been used to make systems multilingual, but this can introduce a pipeline of errors. Another promising solution is cross-lingual transfer learning through pretrained multilingual models. Existing methods train multilingual models with additional code-mixed task data or refine the cross-lingual representations through parallel ontologies. In this work, we enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models, where the multilingual models are fine-tuned with different but related data and/or tasks. Specifically, we use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks suitable for downstream dialogue tasks. We use only 200K lines of parallel data for intermediate fine-tuning, which is already available for 1782 language pairs. We test our approach on the cross-lingual dialogue state tracking task for the parallel MultiWoZ (English -> Chinese, Chinese -> English) and Multilingual WoZ (English -> German, English -> Italian) datasets. We achieve improvements of more than 20% in joint goal accuracy over the vanilla baseline on the parallel MultiWoZ dataset, using only 10% of the target-language task data, and on the Multilingual WoZ dataset in a zero-shot setup.
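Joint goal accuracy, the metric reported in the abstract, is commonly defined as the fraction of dialogue turns whose predicted state matches the gold state on every slot. A minimal sketch, assuming each dialogue state is a slot-to-value dictionary; the function name and the example slots are illustrative and not taken from the paper's code:

```python
from typing import Dict, List

# Hypothetical representation: one dict per dialogue turn, slot name -> value.
DialogueState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogueState],
                        gold: List[DialogueState]) -> float:
    """Fraction of turns where the predicted state equals the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    # A turn counts as correct only if every slot-value pair matches.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Illustrative data: the second turn gets one slot wrong, so it counts as a miss.
gold_states = [
    {"food": "italian"},
    {"food": "italian", "area": "north"},
]
predicted_states = [
    {"food": "italian"},
    {"food": "italian", "area": "south"},
]
score = joint_goal_accuracy(predicted_states, gold_states)
print(score)
```

Because a single wrong slot invalidates the whole turn, this metric is stricter than per-slot accuracy.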

Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018

Dealing with training and test segmentation mismatch: FBK@IWSLT2021

Efficient yet Competitive Speech Translation: FBK@IWSLT2022

Pre-Trained Acoustic-and-Textual Modeling for End-To-End Speech-To-Text Translation

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023

SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation

Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation

The USTC-NELSLIP Offline Speech Translation Systems for IWSLT 2022

A Comparative Study on End-to-end Speech to Text Translation

ESPnet-ST IWSLT 2021 Offline Speech Translation System

Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

Tuning Large language model for End-to-end Speech Translation

Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation

The USYD-JD Speech Translation System for IWSLT 2021

KIT's Multilingual Speech Translation System for IWSLT 2023

The USTC-NEL Speech Translation system at IWSLT 2018

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models

Facebook AI WMT21 News Translation Task Submission