Abstract: Measuring the semantic similarity between sentences is essential for many applications, such as text summarization, Web page retrieval, question answering, and image extraction. Several studies have explored this problem using knowledge-based, corpus-based, and hybrid strategies, and most of them focus on improving effectiveness. In this paper, we address the efficiency issue: given a sentence collection, how can we efficiently discover the top-k sentences that are semantically most similar to a query? Previous methods cannot handle large data efficiently, because applying them directly is time-consuming: every candidate sentence needs to be tested. We propose efficient strategies to tackle this problem based on a general framework. The basic idea is to build, for each similarity metric, a corresponding index during preprocessing. Traversing these indices during query processing avoids testing many candidates and thus improves efficiency. Moreover, an optimal aggregation algorithm is introduced to combine these similarities. Our framework is general enough that many similarity metrics can be incorporated, as discussed in the paper. We conduct an extensive experimental evaluation on three real datasets to assess the efficiency of our proposal, and we illustrate the trade-off between effectiveness and efficiency. The experimental results demonstrate that our proposal outperforms the state-of-the-art techniques in efficiency while maintaining the same high precision.
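The index-plus-aggregation idea can be illustrated with a minimal sketch. The abstract does not say which aggregation algorithm is used, so the code below assumes a Fagin-style threshold algorithm over per-metric score lists sorted in descending order, combined by a weighted sum. The function name `threshold_top_k`, the weights, and the toy scores are hypothetical and serve only as illustration; they are not the authors' implementation or data.

```python
import heapq


def threshold_top_k(indices, weights, k):
    """Return (top-k list of (sentence_id, score), number of candidates scored).

    Assumption: `indices` holds one dict per similarity metric, mapping
    sentence_id -> similarity to a fixed query, with scores in [0, 1].
    The aggregate score is a weighted sum (a hypothetical choice).
    """
    # Stand-in for the preprocessing step: one descending-sorted list per metric.
    sorted_lists = [sorted(idx.items(), key=lambda kv: -kv[1]) for idx in indices]
    max_depth = max((len(lst) for lst in sorted_lists), default=0)

    seen = {}  # sentence_id -> aggregate score (each candidate is scored once)
    top = []   # min-heap of (aggregate score, sentence_id), at most k entries

    for depth in range(max_depth):
        last_scores = []
        for lst in sorted_lists:
            if depth >= len(lst):
                last_scores.append(0.0)  # exhausted list contributes nothing
                continue
            sid, score = lst[depth]      # sorted access
            last_scores.append(score)
            if sid in seen:
                continue
            # Random access: look the candidate up in every index.
            total = sum(w * idx.get(sid, 0.0) for w, idx in zip(weights, indices))
            seen[sid] = total
            if len(top) < k:
                heapq.heappush(top, (total, sid))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, sid))

        # Upper bound on the aggregate score of any sentence not yet seen.
        threshold = sum(w * s for w, s in zip(weights, last_scores))
        if len(top) == k and top[0][0] >= threshold:
            break  # no unseen candidate can enter the top-k

    ranked = sorted(top, key=lambda t: (-t[0], t[1]))
    return [(sid, total) for total, sid in ranked], len(seen)


# Toy data (hypothetical): two metrics, five candidate sentences.
knowledge_based = {"s1": 0.9, "s2": 0.8, "s3": 0.3, "s4": 0.1, "s5": 0.05}
corpus_based = {"s1": 0.85, "s2": 0.2, "s3": 0.6, "s4": 0.15, "s5": 0.1}

results, scored = threshold_top_k([knowledge_based, corpus_based], [0.5, 0.5], k=2)
print(results, "candidates scored:", scored)
```

In this toy run the loop stops before every candidate has been scored, which is the effect the abstract attributes to traversing the indices instead of testing each sentence.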

A Rank Aggregation Algorithm for Efficiently Searching Top-k Semantic Similar Sentences

Exploration on Efficient Similar Sentences Extraction.

Efficient Searching Top-K Semantic Similar Words

Efficient Top-k Similar Short Texts Extraction Algorithm

A Graph-Based Approach for Semantic Similar Word Retrieval

Exploring Simultaneous Keyword and Key Sentence Extraction

Exploring simultaneous keyword and key sentence extraction: improve graph-based ranking using wikipedia.

Exploration on Effectiveness and Efficiency of Similar Sentence Matching.

A Fast Approach For Semantic Similar Short Texts Retrieval

Aggregation-Aware Top-k Computation for Full-Text Search

Performance Evaluation of Similar Sentences Extraction

A New Method for Calculating Similarity Between Sentences and Application on Automatic Abstracting

A Study on Similar Words Searching

Semantic Relevance Ranking for XML Keyword Search.

Processing Spatial Keyword Query As a Top-K Aggregation Query

Exploration on Similar Spatial Textual Objects Retrieval

Identifying structural semantics for XML top-k keyword search

On Automatic Abstracting Algorithm Based on Optimised Sentences Similarity Calculation

A Fuzzy Word Similarity Measure for Selecting Top-k Similar Words in Query Expansion.

Diversified and Verbalized Result Summarization for Semantic Association Search

A Semantic Relevancy Measure Algorithm of Chinese Sentences.