Microblog Retrieval Based on Concept-Enhanced Pre-Training Model

Yashen Wang, Zhaoyu Wang, Huanhuan Zhang, Zhirun Liu
DOI: https://doi.org/10.1145/3552311
IF: 4.157
2022-01-01
ACM Transactions on Knowledge Discovery from Data
Abstract: Despite substantial interest in applying neural networks to information retrieval, neural ranking models have mostly been applied to conventional ad-hoc retrieval tasks over web pages and newswire articles. This article proposes a concept-enhanced pre-training model for the microblog retrieval task, leveraging a Semantic Matching Model (SMM) objective and a Concept Correlation Model (CCM) objective. The proposed model is a novel neural ranking model designed specifically for ranking short-text microblogs. It combines the strength of pre-training in generating valid contextualized embeddings with the strength of prior lexical knowledge (e.g., concept knowledge) in understanding short-text semantics. We conduct experiments on widely used real-world datasets, and the experimental results demonstrate the efficiency of the proposed model, even compared with the latest state-of-the-art neural models and pre-training-based models.