Abstract: Large-scale cross-lingual language models (LMs), such as mBERT, Unicoder and XLM, have achieved great success in cross-lingual representation learning. However, when applied to zero-shot cross-lingual transfer tasks, most existing methods use only single-language input for LM finetuning, without leveraging the intrinsic cross-lingual alignment between different languages that proves essential for multilingual tasks. In this paper, we propose FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. Specifically, FILTER first encodes text input in the source language and its translation in the target language independently in the shallow layers, then performs cross-language fusion to extract multilingual knowledge in the intermediate layers, and finally performs further language-specific encoding. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. For simple tasks such as classification, translated text in the target language shares the same label as the source language. However, this shared label becomes less accurate or even unavailable for more complex tasks such as question answering, NER and POS tagging. To tackle this issue, we further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language. Extensive experiments demonstrate that FILTER achieves a new state of the art on two challenging multilingual multi-task benchmarks, XTREME and XGLUE.
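
The two mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `layer_stages`, `m`, `k` and `kl_self_teaching_loss` are hypothetical, the layer counts are arbitrary, and the KL direction (teacher soft labels against student predictions) is an assumption, since the abstract does not specify it.

```python
import math


def layer_stages(num_layers, m, k):
    """Label each Transformer layer with its stage in a three-stage stack.

    Assumed layout (m and k are hypothetical hyperparameters):
      - first m layers: each language encoded independently ("local")
      - next k layers: both languages encoded jointly ("fusion")
      - remaining layers: language-specific encoding again ("domain")
    """
    if m < 0 or k < 0 or m + k > num_layers:
        raise ValueError("m and k must be non-negative and m + k <= num_layers")
    return ["local"] * m + ["fusion"] * k + ["domain"] * (num_layers - m - k)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    mx = max(logits)
    log_sum_exp = mx + math.log(sum(math.exp(x - mx) for x in logits))
    return [x - log_sum_exp for x in logits]


def kl_self_teaching_loss(teacher_logits, student_logits):
    """KL(teacher || student) between two categorical distributions.

    The teacher distribution plays the role of the soft pseudo-label for the
    translated target-language text; the student is the model being trained.
    """
    log_p = log_softmax(teacher_logits)  # soft pseudo-label (assumed fixed)
    log_q = log_softmax(student_logits)  # current student prediction
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    stages = layer_stages(num_layers=24, m=1, k=10)
    print(stages.count("local"), stages.count("fusion"), stages.count("domain"))
    print(kl_self_teaching_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0]))
```

In a real training loop the teacher logits would come from a previously finetuned model and be treated as constants, and this term would be added to the usual task loss; how the two are weighted is not stated in the abstract.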

Automatic Filtration of Multiword Units

Automatic Extraction and Filtration of Multiword Units

Automatic Extraction of Chinese-English Phrase Translation Pairs

Finite State Automata on Multi-Word Units for Efficient Text-Mining

Research on Automatic Chinese Multi-word Term Extraction Based on Integration of Web Information and Term Component

Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

Automatic Extraction of Multiword Expressions Combining Statistical and Similarity Approaches

A study on the classification of stylistic and formal features in English based on corpus data testing

Technical Phrase Extraction for Patent Mining: A Multi-level Approach

New Word Extraction from Chinese Financial Documents

A Patent Keyword Extraction Method Based on Corpus Classification

New Word Detection Using BiLSTM+CRF Model with Features

Research on Automatic Chinese Multi-word Term Extraction Based on Term Component

Human-Computer Interactive Chinese Word Segmentation: an Adaptive Dirichlet Process Mixture Model Approach

A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems

WordTopic-MultiRank: A New Method for Automatic Keyphrase Extraction

Chinese Multi-word Chunks Extraction for Computer Aided Translation

Text Filtering through Multi-Pattern Matching: A Case Study of Wu–Manber–Uy on the Language of Uyghur