Abstract: Definition bias is a negative phenomenon that can mislead models. In information extraction (IE), definition bias appears not only across datasets from different domains but also among datasets within the same domain. We identify two types of definition bias in IE: bias among IE datasets, and bias between IE datasets and instruction-tuning datasets. To investigate definition bias systematically, we conduct three probing experiments that analyze it quantitatively and reveal the limitations of unified information extraction and large language models in addressing it. To mitigate definition bias in IE, we propose a multi-stage framework consisting of definition bias measurement, bias-aware fine-tuning, and task-specific bias mitigation. Experimental results demonstrate the effectiveness of our framework in addressing definition bias. Resources of this paper can be found at

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper explores definition bias in Information Extraction (IE) tasks and proposes a multi-stage framework to mitigate it.

1. **Existence of Definition Bias**
   - Through a series of experiments, the paper first examines whether definition bias exists between datasets and how it affects model performance. Cross-validation experiments show that fully supervised models trained on one dataset perform significantly worse on other datasets, indicating that definition bias harms generalization.
2. **Can Unified Information Extraction Solve Definition Bias?**
   - The paper then investigates whether Unified Information Extraction (UIE) can address definition bias. Experiments with source prompts show that, even when different dataset names are supplied, UIE still performs inconsistently, indicating that UIE is also affected by definition bias.
3. **Can Large Language Models Handle Definition Bias?**
   - The paper also examines whether Large Language Models (LLMs) can handle definition bias in zero-shot and few-shot settings. Although LLMs perform well, they do not fully overcome definition bias, especially with in-context learning.

### Proposed Solution

To address these issues, the paper proposes a framework with three stages:

1. **Definition Bias Measurement**
   - Fleiss' kappa is used to quantify two types of definition bias: dataset definition bias (κD) and type definition bias (κT).
2. **Bias-Aware Fine-Tuning**
   - Guided by the measurement results, weighted instruction fine-tuning is applied to LLMs to improve their performance on IE tasks.
3. **Task-Specific Bias Mitigation**
   - Finally, Low-Rank Adaptation (LoRA) is used for additional instruction fine-tuning on individual datasets to further reduce the impact of definition bias.

Through these stages, the paper aims to provide an effective framework for alleviating definition bias in IE tasks.
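The cross-validation probe under **Existence of Definition Bias** can be read as a score matrix: a model trained on dataset i is evaluated on dataset j, and each off-diagonal score is compared with the in-domain score on the diagonal. The sketch below assumes such a matrix of F1 scores is already available; the helper names `relative_drop` and `mean_cross_drop`, and the drop definition (1 minus cross-dataset score over in-domain score), are illustrative choices, not the paper's reported metric.

```python
from typing import List


def relative_drop(scores: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: scores[i][j] is the F1 of a model trained on
    dataset i and evaluated on dataset j. Returns, for each pair, the
    fraction of in-domain performance lost when moving from i to j.
    Diagonal entries are 0 by construction."""
    n = len(scores)
    drops = []
    for i in range(n):
        in_domain = scores[i][i]
        if in_domain <= 0:
            raise ValueError(f"in-domain score for dataset {i} must be positive")
        drops.append([1.0 - scores[i][j] / in_domain for j in range(n)])
    return drops


def mean_cross_drop(scores: List[List[float]]) -> float:
    """Average relative drop over all off-diagonal (cross-dataset) pairs."""
    if len(scores) < 2:
        raise ValueError("need at least two datasets")
    drops = relative_drop(scores)
    n = len(drops)
    off_diag = [drops[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off_diag) / len(off_diag)
```

A large mean drop is consistent with definition bias but does not isolate it, since ordinary domain shift also lowers cross-dataset scores; comparing datasets from the same domain, as the abstract describes, narrows that ambiguity.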
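The measurement stage relies on Fleiss' kappa, which compares the observed agreement among several raters with the agreement expected by chance from the category proportions. The summary does not say how the rating table behind κD and κT is built; treating, for example, models trained on different datasets as "raters" labeling the same spans is an assumption left to the caller. The sketch below implements only the standard statistic.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Standard Fleiss' kappa for an N x k table where counts[i][j] is the
    number of raters who assigned item i to category j. Every row must sum
    to the same number of raters n >= 2."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of raters")
    k = len(counts[0])

    # Observed agreement per item: fraction of rater pairs that agree.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the marginal proportion of each category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(k)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        # All ratings fall into a single category; kappa is undefined.
        raise ValueError("kappa undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

Kappa is 1 for perfect agreement, near 0 for chance-level agreement, and negative below chance; under the reading above, a lower κ would presumably signal stronger definition bias.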
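The summary states only that instruction fine-tuning is weighted according to the measurement results. One plausible reading, assumed here and not confirmed by the source, is a per-dataset loss weight derived from each dataset's κ score. The direction (down-weighting low-agreement datasets), the `floor` parameter, and the helper names `dataset_weights` and `weighted_loss` are all hypothetical.

```python
from typing import Dict, List, Tuple


def dataset_weights(kappas: Dict[str, float], floor: float = 0.05) -> Dict[str, float]:
    """Hypothetical mapping from per-dataset agreement (kappa) to loss weights.
    Assumption: lower kappa (more definition bias) -> smaller weight.
    Kappas below `floor` (including negative ones) are clipped to `floor`.
    Weights are rescaled to mean 1 so the overall loss scale is unchanged."""
    raw = {name: max(k, floor) for name, k in kappas.items()}
    mean = sum(raw.values()) / len(raw)
    return {name: w / mean for name, w in raw.items()}


def weighted_loss(batch: List[Tuple[str, float]], weights: Dict[str, float]) -> float:
    """Weighted average of per-example losses; each example is
    (dataset_name, loss)."""
    total_w = sum(weights[name] for name, _ in batch)
    return sum(weights[name] * loss for name, loss in batch) / total_w
```

Normalizing by the total weight keeps the loss comparable across batches with different dataset mixes; reversing the assumed direction (for example, using `1 - k` instead of `k`) is a one-line change.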
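The task-specific stage uses LoRA, which freezes a pretrained weight matrix W and learns a low-rank update, so the adapted layer computes h = W x + (α / r) · B A x, with B initialized to zero so training starts exactly from the base model. The sketch below is a forward-pass-only illustration of that standard formulation in plain Python, not code from the paper; the class name `LoRALinear`, the initialization scale, and the sizes in the tests are arbitrary choices.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Minimal LoRA-adapted linear layer (forward pass only).
    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be updated during fine-tuning."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weights
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so B @ A is initially 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a d_out × d_in layer, the adapter trains r(d_in + d_out) parameters instead of d_in · d_out, which makes it cheap to keep a separate adapter per dataset.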

Is There a One-Model-Fits-All Approach to Information Extraction? Revisiting Task Definition Biases
