Abstract: In recent years, pre-trained language models (PLMs) have dominated natural language processing (NLP) and achieved outstanding performance in various NLP tasks, including dense retrieval. However, in the biomedical domain, the effectiveness of PLM-based dense retrieval models still needs to be improved because the abundance of biomedical entities makes entity expressions diverse and ambiguous. To alleviate this semantic gap, we propose a method that incorporates entity-level external knowledge into a dense retrieval model to enrich the dense representations of queries and documents. Specifically, we first add self-attention and information interaction modules to the Transformer layers of the BERT architecture to fuse query/document text with entity embeddings from knowledge graphs and let the two interact. We then propose an entity similarity loss that constrains the model to better learn external knowledge from the entity embeddings, and a weighted entity concatenation mechanism that balances the impact of entity representations when matching queries and documents. Experiments on two publicly available biomedical retrieval datasets show that the proposed method outperforms state-of-the-art dense retrieval methods. In terms of NDCG, the proposed method (called ELK) improves the ranking performance of coCondenser by at least 5% on both datasets, and also obtains a further performance gain over state-of-the-art EVA methods. Although ELK has a more sophisticated architecture, its average query latency remains within the same order of magnitude as that of other efficient methods.
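The weighted entity concatenation idea can be illustrated with a minimal sketch. This is not the ELK implementation: the function names, the single scalar weight, and the dot-product scoring are assumptions chosen for illustration, since the abstract does not specify how the weight is defined or learned. The sketch only shows how a weight on the entity part of a concatenated vector controls how much entity similarity contributes to the query-document score.

```python
from math import isclose


def weighted_concat(text_vec, entity_vec, weight):
    """Concatenate a text embedding with an entity embedding scaled by `weight`.

    `weight` is a hypothetical scalar; the paper's actual mechanism may differ.
    """
    return list(text_vec) + [weight * x for x in entity_vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relevance_score(q_text, q_ent, d_text, d_ent, weight):
    """Score a query-document pair by the dot product of their weighted concatenations."""
    q = weighted_concat(q_text, q_ent, weight)
    d = weighted_concat(d_text, d_ent, weight)
    return dot(q, d)


# Toy embeddings (made-up values, not taken from the paper).
q_text, q_ent = [0.2, 0.5], [1.0, 0.0]
d_text, d_ent = [0.4, 0.1], [0.5, 0.5]
w = 0.5

score = relevance_score(q_text, q_ent, d_text, d_ent, w)

# With the same weight on both sides, the score decomposes into
# text similarity plus weight**2 times entity similarity.
assert isclose(score, dot(q_text, d_text) + w ** 2 * dot(q_ent, d_ent))
```

Under these assumptions, setting the weight to 0 reduces the score to plain text matching, while larger weights let entity-level similarity dominate, which is the kind of balance the abstract describes.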

ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models Via Contrastive Learning

Enriching Pre-trained Language Model with Entity Information for Relation Classification

RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection

End-to-end Named Entity Recognition and Relation Extraction using Pre-trained Language Models

Improving Relation Extraction by Knowledge Representation Learning

Research on entity relation extraction for Chinese medical text

Siamese BERT Model with Adversarial Training for Relation Classification

An in-depth analysis of pre-trained embeddings for entity resolution

ConcEPT: Concept-Enhanced Pre-Training for Language Models

Ernie: Enhanced Language Representation With Informative Entities

LERT: A Linguistically-motivated Pre-trained Language Model

Incorporating entity-level knowledge in pretrained language model for biomedical dense retrieval

Pre-training Language Models for Comparative Reasoning

A Causal View of Entity Bias in (Large) Language Models

A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models

Nested and Balanced Entity Recognition using Multi-Task Learning

ER-LAC: Span-Based Joint Entity and Relation Extraction Model with Multi-Level Lexical and Attention on Context Features

Leveraging Pretrained Language Models for Enhanced Entity Matching: A Comprehensive Study of Fine-Tuning and Prompt Learning Paradigms

KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness

PLRTE: Progressive learning for biomedical relation triplet extraction using large language models