A Semi-Explicit Short Text Retrieval Method Combining Wikipedia Features
Pu Li,Tianci Li,Suzhi Zhang,Yuhua Li,Yong Tang,Yuncheng Jiang
DOI: https://doi.org/10.1016/j.engappai.2020.103809
IF: 8
2020-01-01
Engineering Applications of Artificial Intelligence
Abstract:With the advantages such as openness, interactivity, immediacy, and simplicity, the large number of short text data appear in the Web information space. Considering the short length, little information, sparse features and irregular grammar, the traditional information analyzing and retrieval technologies cannot deal with short text effectively. In view of the above problems, in this paper a new short text retrieval method based on the current mainstream semantic knowledge source, Wikipedia, is proposed. To be specific, a semantic feature selection algorithm is proposed to return the top k most relevant Wikipedia concepts as the whole vector space for a given short text. Thus, by analyzing the topic information of the semantic features contained in Wikipedia concepts, we propose some formulas to determine the association coefficient list between different components of the corresponding positions in two different feature vectors. On this basis, a new semantic relatedness assessment method under this lower dimensional semantic space is designed. According to computing and sorting the semantic relatedness between user queries and the target short text, a novel semi-explicit short text retrieval method combining Wikipedia concept feature and the corresponding topic information is proposed. Lastly, based on the experimental results on twitter subsets, we verify that our proposal has advantages over other some current retrieval methods on MAP, P@k and R-Prec, and can return more valid results.