Abstract: Bilingual Lexicon Induction (BLI) is a core task in multilingual NLP that still, to a large extent, relies on calculating cross-lingual word representations. Inspired by the global paradigm shift in NLP towards Large Language Models (LLMs), we examine the potential of the latest generation of LLMs for the development of bilingual lexicons. We ask the following research question: Is it possible to prompt and fine-tune multilingual LLMs (mLLMs) for BLI, and how does this approach compare against and complement current BLI approaches? To this end, we systematically study 1) zero-shot prompting for unsupervised BLI and 2) few-shot in-context prompting with a set of seed translation pairs, both without any LLM fine-tuning, as well as 3) standard BLI-oriented fine-tuning of smaller LLMs. We experiment with 18 open-source text-to-text mLLMs of different sizes (from 0.3B to 13B parameters) on two standard BLI benchmarks covering a range of typologically diverse languages. Our work is the first to demonstrate strong BLI capabilities of text-to-text mLLMs. The results reveal that few-shot prompting with in-context examples from nearest neighbours achieves the best performance, establishing new state-of-the-art BLI scores for many language pairs. We also conduct a series of in-depth analyses and ablation studies, providing further insights into BLI with (m)LLMs, along with their limitations.

On Bilingual Lexicon Induction with Large Language Models
