Abstract: Networks today often carry labels or textual content on their nodes. We model such a network as an undirected graph G in which each node is attached to zero or more keywords and each edge is assigned a length. On such networks, a novel and useful query is the top-k nearest keyword (\(\mathsf {k\text {-}NK}\)) search. Given a query node q in G and a keyword \(\lambda \), a \(\mathsf {k\text {-}NK}\) query finds the k nodes that contain \(\lambda \) and are nearest to q. The \(\mathsf {k\text {-}NK}\) problem has been studied recently in the literature, but most existing solutions assume that the graph and the constructed index fit entirely in memory. As a result, they cannot be applied directly to the very large networks that are common in practice and do not fit in memory. In this work, we design an I/O-efficient solution that uses a compact disk index to answer a \(\mathsf {k\text {-}NK}\) query with a constant number of I/Os. The key to an accurate \(\mathsf {k\text {-}NK}\) result is precise estimation of shortest distances in a graph. In our solution, we follow our previous work, Qiao et al. (PVLDB 6:901–912, 2013), which uses the shortest path tree as an approximate representation of a graph and uses the tree distance between two nodes as an accurate estimate of the shortest distance between them in the graph. With this representation, the original \(\mathsf {k\text {-}NK}\) query on a graph reduces to answering the query on a set of trees and then assembling the results obtained from the trees. We exploit a compact tree-based index and study how to lay it out on disk. We design a novel technique that decomposes the index tree into paths and subtrees and stores them on disk. Our theoretical analysis shows that the disk-based index is small and supports a constant number of query I/Os.
An extensive experimental study on massive trees and graphs with billions of edges and keywords verifies our theoretical findings and demonstrates the superiority of our method over state-of-the-art methods in the literature.
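
To make the query definition concrete, the following is a minimal in-memory baseline sketch of a \(\mathsf {k\text {-}NK}\) query using Dijkstra's algorithm. It is not the I/O-efficient, tree-based disk index described in the abstract; it only illustrates what the query returns. The function name `knk_query`, the adjacency-list layout, and the toy graph are assumptions made for illustration.

```python
import heapq

def knk_query(adj, keywords, q, lam, k):
    """Baseline k-NK: return up to k (distance, node) pairs for the nodes
    that contain keyword `lam` and are nearest to query node `q`.

    adj      -- hypothetical layout: node -> list of (neighbor, edge_length)
    keywords -- hypothetical layout: node -> set of keywords on that node
    """
    best = {q: 0.0}          # best known distance from q to each node
    heap = [(0.0, q)]        # min-heap ordered by tentative distance
    settled = set()          # nodes whose shortest distance is final
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue         # stale heap entry
        settled.add(u)
        # Nodes are settled in nondecreasing distance order, so the first
        # k settled nodes that contain `lam` are the k nearest ones.
        if lam in keywords.get(u, ()):
            result.append((d, u))
        for v, w in adj.get(u, ()):
            nd = d + w
            if v not in settled and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Toy undirected graph (each edge listed in both directions).
adj = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 2.0), ("d", 5.0)],
    "c": [("a", 4.0), ("b", 2.0), ("d", 1.0)],
    "d": [("b", 5.0), ("c", 1.0)],
}
keywords = {"a": set(), "b": {"cafe"}, "c": {"cafe", "bank"}, "d": {"cafe"}}

print(knk_query(adj, keywords, "a", "cafe", 2))  # [(1.0, 'b'), (3.0, 'c')]
```

This baseline explores the graph outward from q on every query and assumes that the whole graph is in memory, which is exactly the limitation the abstract's disk-based index is designed to avoid.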

Optimal K-Nearest-Neighbor Query in Data Grid

Nearest group queries.

Answering k-NN Query of Chinese Calligraphic Character Based on Data Grid

Constrained All-k-Nearest-Neighbor Search

Discovery of Regional Co-location Patterns with k-Nearest Neighbor Graph.

Reverse top-k group nearest neighbor search

Grid Interpolation Algorithm Based on Nearest Neighbor Fast Search

Efficient Parallel Processing of High-Dimensional Spatial KNN Queries

Efficient parallel processing for K-nearest-neighbor search in spatial databases

Processing Continuous K-Nearest Neighbor Queries in Location-Dependent Application

All-Visible-k-Nearest-Neighbor Queries.

Efficient Reverse $k$ Approximate Nearest Neighbor Search over High-Dimensional Vectors

Optimal-Nearest-Neighbor Queries

Reverse K Nearest Neighbors Query Processing

Graph-Indexed kNN Query Optimization on Road Network

Surface k-NN Query Processing

K Nearest Neighbor Queries and Knn-Joins in Large Relational Databases (almost) for Free

Fast $k$-NNG construction with GPU-based quick multi-select

I/O-efficient Algorithms for Top-K Nearest Keyword Search in Massive Graphs.

Utility Based Query Dissemination in Spatial Data Grid

Efficient Selection Algorithm for Fast k-NN Search on GPUs