Abstract: Large web search engines face formidable performance challenges because they must process thousands of queries per second over tens of billions of documents within interactive response times. Top-k query processing (also called early termination or dynamic pruning) is an important class of optimization techniques that speeds up query processing by avoiding the scoring of documents that are unlikely to be among the top results. One recent technique uses the Block-Max index, in which posting lists are organized into blocks and the maximum score of each block is stored to improve query efficiency. Although the Block-Max index speeds up query processing, the ranking function used for the Top-k results is term-based only. It is well known that documents' static scores are also important for a good ranking function. In this paper, we show that the performance of the state-of-the-art algorithms on the Block-Max index degrades when a static score is added to the ranking function. We then study efficient techniques for Top-k query processing in the case where a page's static score, such as PageRank, is given in addition to the term-based score. In particular, we propose a set of new algorithms based on WAND and MaxScore with a Block-Max index using local scores, which outperform the existing ones. We also propose new techniques to estimate a better score upper bound for each block. Finally, we study search efficiency on different index structures in which document identifiers are assigned by URL sorting or by static document score. Experiments on TREC GOV2 and ClueWeb09B show that considerable performance gains are achieved.
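To make the block-level pruning idea concrete, here is a minimal sketch in Python for a single posting list. It is not the paper's algorithm: the names `Block`, `build_blocks` and `top_k_single_term`, the linear combination `term_score + weight * static_score`, and the choice to store a per-block maximum static score are all assumptions made for illustration. The sketch only shows how a stored per-block upper bound lets a query processor skip a whole block without scoring its documents.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

Posting = Tuple[int, float]  # (doc_id, term_score)


@dataclass
class Block:
    """One block of a posting list plus its stored maxima (hypothetical layout)."""
    postings: List[Posting]
    max_term_score: float
    max_static_score: float  # assumption: a block-local static-score maximum


def build_blocks(postings: List[Posting], static: Dict[int, float],
                 block_size: int) -> List[Block]:
    """Split a doc_id-sorted posting list into fixed-size blocks."""
    blocks = []
    for start in range(0, len(postings), block_size):
        chunk = postings[start:start + block_size]
        blocks.append(Block(
            postings=chunk,
            max_term_score=max(score for _, score in chunk),
            max_static_score=max(static[doc] for doc, _ in chunk),
        ))
    return blocks


def top_k_single_term(blocks: List[Block], static: Dict[int, float],
                      k: int, weight: float = 1.0):
    """Return (top-k as [(score, doc_id)], number of documents scored).

    Assumes weight >= 0, so the block bound below is a valid upper bound
    on the combined score of every document in the block.
    """
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k
    scored = 0
    for block in blocks:
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        bound = block.max_term_score + weight * block.max_static_score
        if bound <= threshold:
            continue  # no document in this block can enter the top-k
        for doc, term_score in block.postings:
            scored += 1
            score = term_score + weight * static[doc]
            if len(heap) < k:
                heapq.heappush(heap, (score, doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), scored


postings = [(1, 2.0), (3, 0.5), (7, 0.5), (9, 0.25), (12, 1.75), (15, 0.25)]
static = {1: 0.5, 3: 0.25, 7: 0.25, 9: 0.25, 12: 1.0, 15: 0.0}
blocks = build_blocks(postings, static, block_size=2)
print(top_k_single_term(blocks, static, k=1))
```

In this toy input the middle block has an upper bound of 0.75, which is below the running threshold of 2.5, so its two documents are never scored. A looser bound, for example one that adds a single global maximum static score to every block, would prune fewer blocks, which is one way to see why the quality of the per-block bound matters once a static score enters the ranking function.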

TKAP: Efficiently processing top-k query on massive data by adaptive pruning

Efficient Pruned Top-K Subgraph Matching with Topology-Aware Bounds

Efficient Top-K Query Processing Algorithms in Highly Distributed Environments

Efficient Parallel Processing of High-Dimensional Spatial KNN Queries

Efficient Pruning for Top-K Ranking Queries on Attribute-Wise Uncertain Datasets

Supporting Efficient Top-K Queries in Type-Ahead Search

Top-k Dominating Queries on Incomplete Data

Optimized top-k processing with global page scores on block-max indexes.

Efficient Processing of Top K Group Skyline Queries.

Scalable Top-K Spatial Keyword Search

Aggregation-Aware Top-k Computation for Full-Text Search

Efficient top-k processing in large-scaled distributed environments

Efficient Algorithms for Top-k Keyword Queries on Spatial Databases

Efficient Algorithms For Historical Continuous Knn Query Processing Over Moving Object Trajectories

Efficient Processing of Top-K Queries: Selective Nra Algorithms

Efficient Pruning Algorithm for Top-K Ranking on Dataset with Value Uncertainty

Processing Long Queries Against Short Text

Efficiently answering top-k frequent term queries in temporal-categorical range

Top-k queries on RDF graphs

Adaptive convex skyline: a threshold-based project partitioned layer-based index for efficient-processing top-k queries in entrepreneurship applications

Real Time Personalized Search on Social Networks