GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks

Ihor Stepanov, Mykhailo Shtopko
2024-08-01
Abstract: Information extraction tasks require models that are accurate, efficient, and generalisable. Classical supervised deep learning approaches can achieve the required performance, but they need large datasets and are limited in their ability to adapt to different tasks. Large language models (LLMs), on the other hand, generalise well: they can adapt to many different tasks based on user requests. However, LLMs are computationally expensive and often fail to generate structured outputs. In this article, we introduce a new kind of GLiNER model that can be used for various information extraction tasks while remaining a small encoder model. Our model achieved SoTA performance on zero-shot NER benchmarks and leading performance on question-answering, summarization and relation extraction tasks. Additionally, we cover experimental results on self-learning approaches for named entity recognition using GLiNER models.
Machine Learning, Artificial Intelligence, Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The paper primarily aims to address several key issues in the Information Extraction (IE) task, including:

1. **Efficiency**: Developing models that can handle large amounts of unstructured data while conserving computational resources.
2. **Accuracy**: Ensuring high accuracy in fields with low error tolerance, such as biomedical or commercial domains.
3. **Generalization Ability**: Enabling the model to easily adapt to new tasks and domains.

To tackle these issues, the authors propose a new model, GLiNER Multi-Task, a lightweight and versatile model suitable for various information extraction tasks. This model is based on the GLiNER architecture and uses DeBERTa v3 as the base encoder. Compared to other methods, it has the following features:

- **Efficient and Controllable**: More efficient and easier to control than large language models (LLMs).
- **Structured Output**: Generates more structured output than generative models.
- **Zero-shot Learning**: Achieves state-of-the-art performance in tasks like zero-shot named entity recognition (Zero-shot NER).
- **Multi-task Learning**: Excels in tasks such as question answering, summarization, and relation extraction.

The study also explores the impact of self-training techniques on the model's performance and demonstrates the model's performance across different tasks through experiments. Additionally, the model is trained using synthetic datasets to enhance its generalization ability. Overall, this research provides an efficient, accurate, and flexible new solution for information extraction tasks.
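
The self-training idea mentioned above can be illustrated with a generic pseudo-labelling step. This summary does not describe the paper's exact procedure, so the following is only a minimal sketch under stated assumptions: a model labels unlabelled text, and only spans above a confidence threshold are kept as new training examples. The names `Span`, `build_pseudo_labels`, `toy_predict`, the `min_score` value, and the toy corpus are all hypothetical and introduced purely for illustration; `toy_predict` is a lexicon lookup standing in for a real zero-shot NER model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Span:
    """A predicted entity: character offsets, label, and model confidence."""
    start: int
    end: int
    label: str
    score: float


# Hypothetical predictor signature: (text, candidate labels) -> predicted spans.
Predictor = Callable[[str, List[str]], List[Span]]


def build_pseudo_labels(
    texts: List[str],
    labels: List[str],
    predict: Predictor,
    min_score: float = 0.9,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, List[Span]]]:
    """Keep only confident predictions as pseudo-labelled training examples."""
    dataset = []
    for text in texts:
        confident = [s for s in predict(text, labels) if s.score >= min_score]
        if confident:  # skip texts with no confident span
            dataset.append((text, confident))
    return dataset


def toy_predict(text: str, labels: List[str]) -> List[Span]:
    """Stand-in for a zero-shot NER model: a fixed lexicon with made-up scores."""
    lexicon = {"Kyiv": ("location", 0.97), "DeBERTa": ("model", 0.62)}
    spans = []
    for surface, (label, score) in lexicon.items():
        idx = text.find(surface)
        if idx != -1 and label in labels:
            spans.append(Span(idx, idx + len(surface), label, score))
    return spans


corpus = [
    "The team is based in Kyiv.",
    "DeBERTa is the base encoder.",
    "No entities here.",
]
pseudo = build_pseudo_labels(corpus, ["location", "model"], toy_predict)
for text, spans in pseudo:
    print(text, [(text[s.start:s.end], s.label) for s in spans])
```

In a full self-training loop, the resulting examples would be used to fine-tune the model before repeating the step; the threshold trades the amount of new data against label noise.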