Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval

Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan
DOI: https://doi.org/10.1007/978-3-642-31265-6_14
2012-01-01
Abstract: Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. We propose an index of size $|CSA| + n\log D\,(2 + o(1))$ bits with $O(t_s(p) + k\log\log n + \mathrm{poly}\log\log n)$ query time for the basic relevance metric term-frequency, where $|CSA|$ is the size (in bits) of a compressed full-text index of $\mathcal{D}$ that takes $O(t_s(p))$ time to search for a pattern of length $p$. We further reduce the space to $|CSA| + n\log D\,(1 + o(1))$ bits, at the cost of a query time of $O(t_s(p) + k(\log\sigma \log\log n)^{1+\epsilon} + \mathrm{poly}\log\log n)$, where $\sigma$ is the alphabet size and $\epsilon > 0$ is any constant.
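To make the problem statement concrete, here is a minimal brute-force sketch of top-k retrieval under the term-frequency metric. It is not the index proposed in the paper: it scans every document on each query, which is the cost the compressed index is designed to avoid. The function names and the tie-breaking rule (smaller document id first) are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(
    documents: List[str], pattern: str, k: int
) -> List[Tuple[int, int]]:
    """Return up to k (doc_id, term_frequency) pairs, highest frequency first.

    Documents that do not contain the pattern are excluded.
    Ties are broken by smaller doc_id (an illustrative assumption).
    """
    hits = []
    for doc_id, doc in enumerate(documents):
        tf = count_occurrences(doc, pattern)
        if tf > 0:
            hits.append((tf, doc_id))
    # Select the k best hits without fully sorting the list.
    best = heapq.nsmallest(k, hits, key=lambda h: (-h[0], h[1]))
    return [(doc_id, tf) for tf, doc_id in best]


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana", "apple"]
    print(top_k_by_term_frequency(docs, "ana", 2))
```

Each query in this baseline reads all $n$ characters of the collection, whereas the bounds in the abstract depend on $p$, $k$, and lower-order terms in $n$ only.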