Abstract: Empirical evidence suggests that LLMs exhibit spontaneous cross-lingual alignment. Our findings suggest that although LLMs also demonstrate promising cross-lingual alignment in Information Extraction (IE), a significant imbalance remains across languages, revealing an underlying deficiency in IE alignment. To address this issue, we propose AlignXIE, a powerful code-based LLM that significantly enhances cross-lingual IE alignment through two strategies. First, AlignXIE formulates IE across different languages, especially non-English ones, as code generation tasks, standardizing the representation of various schemas using Python classes to ensure consistency of the same ontology in different languages and to align the schema. Second, it incorporates an IE cross-lingual alignment phase through a translated instance prediction task proposed in this paper to align the extraction process. This phase uses ParallelNER, a bilingual parallel IE dataset with 257,190 samples, generated by our proposed LLM-based automatic pipeline for IE parallel data construction, with manual annotation to ensure quality. Ultimately, we obtain AlignXIE through multilingual IE instruction tuning. Even without training on 9 unseen languages, AlignXIE surpasses ChatGPT by $30.17\%$ and SoTA by $20.03\%$, thereby demonstrating superior cross-lingual IE capabilities. Comprehensive evaluations on 63 IE benchmarks in Chinese and English under various settings demonstrate that AlignXIE significantly enhances cross-lingual and multilingual IE by boosting IE alignment.

What problem does this paper attempt to address?

### The Problem Addressed by the Paper

The paper aims to address the issue of cross-lingual alignment in Multilingual Information Extraction (Multilingual IE). Although Large Language Models (LLMs) exhibit some spontaneous cross-lingual alignment capabilities in information extraction tasks, there remains a significant imbalance in alignment between languages, especially for non-English languages. This imbalance reveals an underlying deficiency in cross-lingual alignment for information extraction. Specifically, the paper raises the following two main issues:

1. **Imbalance in Cross-Lingual Alignment**: The spontaneous alignment that LLMs show in information extraction is unevenly distributed across languages, and non-English languages are particularly affected.
2. **Performance Gap in Cross-Lingual Information Extraction**: There is a significant performance gap in information extraction between languages, indicating that existing cross-lingual alignment methods perform poorly in some languages.

To address these issues, the paper proposes a method called AlignXIE, which significantly enhances cross-lingual alignment in information extraction through two strategies:

1. **Unified Code Generation Framework**: Formulates information extraction tasks in different languages as code generation tasks, using Python classes to represent the various schemas, which ensures consistency of the same ontology across languages.
2. **Cross-Lingual Alignment Phase**: Aligns the extraction process through a translated instance prediction task, using the ParallelNER bilingual parallel dataset.

Ultimately, AlignXIE is obtained through multilingual information extraction instruction tuning and demonstrates significantly superior cross-lingual information extraction capabilities compared to existing methods on multiple benchmarks.
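To make the idea of a unified code generation framework more concrete, the following is a minimal sketch of how one schema, written once as Python classes, could be shared by inputs in different languages. It is an illustration only, not the paper's actual prompt or output format: the class names `Entity`, `Person`, and `Location`, the helper `type_signature`, and the example sentences and extractions are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical schema: each entity type is one Python class, defined once
# and reused for every input language.
@dataclass(frozen=True)
class Entity:
    """Base class for all entity types in this illustrative schema."""
    name: str  # surface span copied from the input text


@dataclass(frozen=True)
class Person(Entity):
    """A human being, real or fictional."""


@dataclass(frozen=True)
class Location(Entity):
    """A geographic place such as a city or country."""


def extract_english() -> list[Entity]:
    # Hand-written stand-in for model output on:
    # "Marie Curie was born in Warsaw."
    return [Person(name="Marie Curie"), Location(name="Warsaw")]


def extract_chinese() -> list[Entity]:
    # Hand-written stand-in for model output on the parallel sentence:
    # "玛丽·居里出生于华沙。"
    return [Person(name="玛丽·居里"), Location(name="华沙")]


def type_signature(entities: list[Entity]) -> list[str]:
    """Return the schema class names in order, ignoring the surface text."""
    return [type(e).__name__ for e in entities]


if __name__ == "__main__":
    # The surface strings differ by language, but both results are
    # instances of the same classes, so the ontology stays consistent.
    print(type_signature(extract_english()))
    print(type_signature(extract_chinese()))
```

In this sketch the language-specific part is confined to the `name` strings, while the entity types are shared class objects. That is one plausible reading of how a code-based representation keeps the same ontology consistent across languages.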

AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment

ADELIE: Aligning Large Language Models on Information Extraction

Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment

Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks

Advancing Cross-Lingual Entity Alignment with Large Language Models: Tailored Sample Segmentation and Zero-Shot Prompts

Interactive Cross-Lingual Ontology Matching

AlignBench: Benchmarking Chinese Alignment of Large Language Models

Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention

Extrapolating Large Language Models to Non-English by Aligning Languages

Align-then-Enhance: Multilingual Entailment Graph Enhancement with Soft Predicate Alignment

Unsupervised Deep Cross-Language Entity Alignment

MT4CrossOIE: Multi-stage Tuning for Cross-lingual Open Information Extraction

CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

RUIE: Retrieval-based Unified Information Extraction using Large Language Model

Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effect

Improving Low-resource Reading Comprehension via Cross-lingual Transposition Rethinking

Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment

BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models