Realistic Zero-Shot Cross-Lingual Transfer in Legal Topic Classification

Stratos Xenouleas, Alexia Tsoukara, Giannis Panagiotakis, Ilias Chalkidis, Ion Androutsopoulos
DOI: https://doi.org/10.48550/arXiv.2206.03785
2022-06-08
Abstract: We consider zero-shot cross-lingual transfer in legal topic classification using the recent MultiEURLEX dataset. Since the original dataset contains parallel documents, which is unrealistic for zero-shot cross-lingual transfer, we develop a new version of the dataset without parallel documents. We use it to show that translation-based methods vastly outperform cross-lingual fine-tuning of multilingually pre-trained models, the best previous zero-shot transfer method for MultiEURLEX. We also develop a bilingual teacher-student zero-shot transfer approach, which exploits additional unlabeled documents of the target language and performs better than a model fine-tuned directly on labeled target language documents.
Computation and Language