Dense Re-Ranking with Weak Supervision for RDF Dataset Search

Qiaosheng Chen, Zixian Huang, Zhiyang Zhang, Weiqing Luo, Tengteng Lin, Qing Shi, Gong Cheng
DOI: https://doi.org/10.1007/978-3-031-47240-4_2
2023-01-01
Abstract: Dataset search aims to find datasets that are relevant to a keyword query. Existing dataset search engines rely on conventional sparse retrieval models (e.g., BM25). Dense models (e.g., BERT-based) remain under-investigated for two reasons: the limited availability of labeled data for fine-tuning such a deep neural model, and its limited input capacity relative to the large size of a dataset. To fill the gap, in this paper, we study dense re-ranking for RDF dataset search. Our re-ranking model encodes the metadata of RDF datasets and also their actual RDF data, extracting a small yet representative subset of the data to accommodate large datasets. To address the insufficiency of training data, we adopt a coarse-to-fine tuning strategy where we warm up the model with weak supervision from a large set of automatically generated queries and relevance labels. Experiments on the ACORDAR test collection demonstrate the effectiveness of our approach, which considerably improves the retrieval accuracy of existing sparse models.
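The two-stage idea in the abstract (a sparse model retrieves candidates, a dense model re-orders them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: `sparse_score` is a toy term-overlap count standing in for BM25, `embed` is a character-trigram count vector standing in for a BERT-based encoder, and each dataset is represented by a single hypothetical text string combining its metadata with a small sample of its data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def sparse_score(query, doc):
    """Toy first-stage score: occurrences of query terms in the document.
    This is a stand-in for a sparse model such as BM25, not BM25 itself."""
    counts = Counter(tokenize(doc))
    return sum(counts[t] for t in tokenize(query))


def embed(text):
    """Hypothetical stand-in for a dense encoder: character-trigram counts.
    A real system would use a fine-tuned neural encoder here."""
    s = f"  {text.lower()}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(query, datasets, k=10):
    """Retrieve the top-k datasets with the sparse score, then re-order
    those k candidates by similarity under the stand-in dense encoder.
    `datasets` maps a dataset id to its text representation."""
    candidates = sorted(
        datasets, key=lambda d: sparse_score(query, datasets[d]), reverse=True
    )[:k]
    q = embed(query)
    return sorted(candidates, key=lambda d: cosine(q, embed(datasets[d])), reverse=True)


if __name__ == "__main__":
    toy = {
        "a": "air quality sensors city",
        "b": "city bus timetable",
        "c": "protein structures",
    }
    print(rerank("city air quality", toy, k=2))
```

The re-ranker only ever sees the k candidates returned by the first stage, which is why the quality of the sparse retrieval step bounds what the dense step can recover.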