Abstract: Owing to advantages such as openness, interactivity, immediacy, and simplicity, large volumes of short text data now appear in the Web information space. Because short texts are brief, carry little information, have sparse features, and use irregular grammar, traditional information analysis and retrieval technologies cannot handle them effectively. To address these problems, this paper proposes a new short text retrieval method based on Wikipedia, the current mainstream semantic knowledge source. Specifically, a semantic feature selection algorithm returns the top k most relevant Wikipedia concepts, which serve as the whole vector space for a given short text. Then, by analyzing the topic information of the semantic features contained in Wikipedia concepts, we propose formulas to determine the list of association coefficients between the components at corresponding positions in two different feature vectors. On this basis, a new semantic relatedness assessment method is designed for this lower-dimensional semantic space. By computing and sorting the semantic relatedness between user queries and the target short texts, we obtain a novel semi-explicit short text retrieval method that combines Wikipedia concept features with the corresponding topic information. Finally, experimental results on Twitter subsets verify that our proposal outperforms some other current retrieval methods on MAP, P@k, and R-Prec, and returns more valid results.
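
The abstract reports results on MAP, P@k, and R-Prec without defining them. As a reading aid, the sketch below shows the standard textbook definitions of these ranking metrics; it is not the authors' evaluation code. The function names, the document identifiers, and the binary relevance judgments are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top-k returned items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """R-Prec: precision at rank R, where R is the number of relevant items."""
    if not relevant:
        return 0.0
    return precision_at_k(ranked, relevant, len(relevant))


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP: mean of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of AP over all queries; each run is (ranked list, relevant set)."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical example: two queries with made-up short-text identifiers.
runs = [
    (["t1", "t2", "t3", "t4"], {"t1", "t3"}),
    (["a", "b"], {"b"}),
]
print(precision_at_k(*runs[0], k=2))
print(r_precision(*runs[0]))
print(mean_average_precision(runs))
```

All three metrics depend only on the rank positions of the relevant items, so they can compare retrieval methods regardless of how each method computes its relatedness scores.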

A Novel Vector Representation Model for Text Mining Based on Enhancing Features

Incorporating Knowledge into Neural Network for Text Representation

A comparative study for wordnet guided text representation

Chinese Text Semantic Representation for Text Classification

An Exploration Of Semantic Relations In Neural Word Embeddings Using Extrinsic Knowledge

A Concept-Relation Vector Model Based Method for Web Document Retrieval

Exploring Wikipedia and query log's ability for text feature representation

A new document representation using term frequency and vectorized graph connectionists with application to document retrieval

Tripartite-Replicated Softmax Model for Document Representations

A novel model for semantic similarity measurement based on wordnet and word embedding

A New Vector Representation of Short Texts for Classification

A Semi-Structured Document Model for Text Mining

Sentence Vector Model Based on Implicit Word Vector Expression

Graph-Based Text Similarity Measurement by Exploiting Wikipedia As Background Knowledge

A document feature extraction method based on concept-word list

A Kind of Vector Space Representation Model Based on Semantic in the Field of English Standard Information

A Novel Semi-Supervised Learning Framework with Simultaneous Text Representing

A Semi-Explicit Short Text Retrieval Method Combining Wikipedia Features

A kind of vector space representation model based on semantic

An Unsupervised Graph Based Continuous Word Representation Method for Biomedical Text Mining

Research on Text Representation Model Integrated Semantic Relationship