Cross-Lingual Transfer Learning for Question Answering

Chia-Hsuan Lee, Hung-Yi Lee

DOI: https://doi.org/10.48550/arXiv.1907.06042

2019-07-13

Abstract: Deep learning based question answering (QA) on English documents has achieved success because there is a large number of English training examples. However, for most languages, training examples for high-quality QA models are not available. In this paper, we explore the problem of cross-lingual transfer learning for QA, where a source language task with plentiful annotations is utilized to improve the performance of a QA model on a target language task with limited available annotations. We examine two different approaches. A machine translation (MT) based approach translates the source language into the target language, or vice versa. Although the MT-based approach brings improvement, it assumes the availability of a sentence-level translation system. A GAN-based approach incorporates a language discriminator to learn language-universal feature representations, and consequently transfers knowledge from the source language. The GAN-based approach rivals the performance of the MT-based approach with fewer linguistic resources. Applying both approaches simultaneously yields the best results. We use two English benchmark datasets, SQuAD and NewsQA, as source language data, and show significant improvements over a number of established baselines on a Chinese QA task. We achieve a new state of the art on the Chinese QA dataset.
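As a rough illustration of the language-discriminator idea in the abstract, a generic adversarial objective can be written as follows. This is a textbook-style sketch, not the paper's stated loss: the symbols $E$ (shared encoder), $Q$ (QA output layer), $D$ (language discriminator), $\mathcal{S}$ and $\mathcal{T}$ (source- and target-language inputs), and the weight $\lambda > 0$ are all assumptions introduced here.

$$
\min_{E,\,Q}\;\max_{D}\;\; \mathcal{L}_{\mathrm{QA}}(E, Q) \;+\; \lambda \Big( \mathbb{E}_{x \sim \mathcal{S}}\big[\log D(E(x))\big] + \mathbb{E}_{x \sim \mathcal{T}}\big[\log\big(1 - D(E(x))\big)\big] \Big)
$$

Under this formulation, $D$ is trained to tell source-language features from target-language features, while $E$ is trained to make that distinction hard. Features that the discriminator cannot separate by language are, in that sense, language-universal, so the QA layer trained mostly on source-language annotations can be reused on the target language.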

Computation and Language

What problem does this paper attempt to address?

The paper asks how data from a resource-rich source language (such as English) can improve the performance of question-answering systems in a target language (such as Chinese) in cross-lingual scenarios. Specifically, it explores two methods:

1. **Machine-translation-based method**: Knowledge is transferred by translating source-language data into the target language, or vice versa. Although this method is effective, it depends on a high-quality sentence-level translation system, which is not available for all language pairs.
2. **Generative Adversarial Network (GAN)-based method**: A language discriminator is introduced to learn language-invariant feature representations, so cross-lingual knowledge transfer can be achieved without a sentence-level translation system. This method requires only a word-to-word bilingual dictionary to achieve performance comparable to that of the machine-translation-based method.

The paper also shows that combining the two methods gives the best results, reaching a new state of the art on a Chinese question-answering task. The aim is to make it possible to train high-performance question-answering models for resource-scarce target languages.
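To make the word-to-word bilingual dictionary concrete, here is a minimal Python sketch of mapping a tokenized QA example into another language one token at a time. It is an illustration only, not the paper's pipeline: the function names `translate_word_by_word` and `translate_example`, the toy dictionary, and the token-index answer span are all hypothetical. The point it shows is that a one-to-one token mapping leaves token positions unchanged, so the answer span needs no re-alignment, which is one reason a dictionary is a lighter requirement than a sentence-level translation system.

```python
from typing import Dict, List, Tuple


def translate_word_by_word(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Map each token through the dictionary; tokens without an entry are kept as-is."""
    return [dictionary.get(token.lower(), token) for token in tokens]


def translate_example(
    context: List[str],
    question: List[str],
    answer_span: Tuple[int, int],
    dictionary: Dict[str, str],
) -> dict:
    """Translate a tokenized QA example token by token.

    The answer span is an inclusive (start, end) pair of token indices into
    the context. Because every token maps to exactly one token, the span
    indices remain valid after translation.
    """
    start, end = answer_span
    if not (0 <= start <= end < len(context)):
        raise ValueError("answer span is outside the context")
    return {
        "context": translate_word_by_word(context, dictionary),
        "question": translate_word_by_word(question, dictionary),
        "answer_span": (start, end),
    }


# Toy dictionary (hypothetical entries, for illustration only).
toy_dictionary = {"cat": "猫", "mat": "垫子", "where": "哪里"}

example = translate_example(
    context=["The", "cat", "sat", "on", "the", "mat"],
    question=["Where", "did", "the", "cat", "sit"],
    answer_span=(5, 5),
    dictionary=toy_dictionary,
)
start, end = example["answer_span"]
print(example["context"][start : end + 1])
```

A sentence-level translation system, by contrast, can reorder or merge words, so the answer span would have to be located again in the translated context.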

Bridging the Language Gap: Knowledge Injected Multilingual Question Answering

A Joint Model For Question-Answering Over Traditional Chinese Medicine

Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages

Promoting Generalized Cross-lingual Question Answering in Few-resource Scenarios via Self-knowledge Distillation

Supervised and Unsupervised Transfer Learning for Question Answering

Learning to Answer Multilingual and Code-Mixed Questions

Improving Zero-Shot Cross-lingual Transfer for Multilingual Question Answering over Knowledge Graph

XQA: A Cross-lingual Open-domain Question Answering Dataset

xGQA: Cross-Lingual Visual Question Answering

PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale

Cross-Lingual Training for Automatic Question Generation

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision

Cross-lingual QA: A Key to Unlocking In-context Cross-lingual Performance

Leveraging Large Language Models for Multiple Choice Question Answering

M2QA: Multi-domain Multilingual Question Answering

XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension

GSQA: An End-to-End Model for Generative Spoken Question Answering

Open Domain Question Answering with Character-Level Deep Learning Models