Efficient Algorithms for Top-k Stabbing Queries on Weighted Interval Data (Full Version)

Daichi Amagata, Junya Yamada, Yuchen Ji, Takahiro Hara
2024-05-22
Abstract: Intervals have been generated in many applications (e.g., temporal databases), and they are often associated with weights, such as prices. This paper addresses the problem of processing top-k weighted stabbing queries on interval data. Given a set of weighted intervals, a query value, and a result size $k$, this problem finds the $k$ intervals that are stabbed by the query value and have the largest weights. Although this problem finds practical applications (e.g., purchase, vehicle, and cryptocurrency analysis), it has not been well studied. A state-of-the-art algorithm for this problem incurs $O(n\log k)$ time, where $n$ is the number of intervals, so it is not scalable to large $n$. We solve this inefficiency issue and propose an algorithm that runs in $O(\sqrt{n}\log n + k)$ time. Furthermore, we propose an $O(\log n + k)$ algorithm to further accelerate the search efficiency. Experiments on two real large datasets demonstrate that our algorithms are faster than existing algorithms.
Databases
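To make the query semantics concrete, here is a minimal brute-force baseline in Python. It assumes closed intervals given as `(l, r, w)` tuples, so an interval is stabbed by $q$ when $l \le q \le r$; the function name `topk_stab_scan` is hypothetical. It scans all $n$ intervals and keeps the $k$ heaviest stabbed ones in a size-$k$ min-heap, which costs $O(n\log k)$ time, the same bound the abstract attributes to the state of the art. It is not meant to reproduce that algorithm, only its cost profile.

```python
import heapq


def topk_stab_scan(intervals, q, k):
    """Indices of the k heaviest closed intervals (l, r, w) with l <= q <= r.

    Hypothetical brute-force baseline: one pass over all n intervals with a
    size-k min-heap, so O(n log k) time; output is heaviest first.
    """
    if k <= 0:
        return []
    heap = []  # (weight, index); heap[0] holds the lightest of the current top-k
    for i, (l, r, w) in enumerate(intervals):
        if not (l <= q <= r):
            continue  # not stabbed by q
        if len(heap) < k:
            heapq.heappush(heap, (w, i))
        elif w > heap[0][0]:
            heapq.heapreplace(heap, (w, i))  # evict the lightest candidate
    return [i for _, i in sorted(heap, reverse=True)]
```

For example, with `intervals = [(1, 5, 10.0), (2, 8, 3.0), (4, 6, 7.0), (7, 9, 12.0), (0, 10, 1.0)]`, `topk_stab_scan(intervals, 4, 2)` returns `[0, 2]`: intervals 0, 1, 2, and 4 contain 4, and the two heaviest of them are 0 and 2. Every query touches all $n$ intervals regardless of $k$, which is the scalability problem the paper targets.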
What problem does this paper attempt to address?
The problem that this paper attempts to solve is **the efficiency of processing top-k stabbing queries on weighted interval data**. Specifically, given a set of weighted intervals, a query value, and a result size \(k\), the task is to find the \(k\) intervals that are stabbed by the query value and have the largest weights. Although this problem has many practical applications (such as purchase, vehicle, and cryptocurrency analysis), existing algorithms are not efficient on large-scale data.

### Specific Problem Description

1. **Background and Motivation**:
   - Interval data is generated in many applications, such as temporal databases.
   - Each interval is often associated with a weight, such as a price or profit.
   - Practical applications therefore need top-k weighted stabbing queries, i.e., finding the \(k\) intervals with the largest weights among those stabbed by the query value.
2. **Deficiencies of Existing Methods**:
   - The state-of-the-art algorithm runs in \(O(n\log k)\) time, where \(n\) is the number of intervals, which does not scale to large \(n\).
   - Plain stabbing queries ignore weights and cannot control the result size, so they may return too many results.
3. **Research Objectives**:
   - Design a more efficient algorithm for top-k weighted stabbing queries that runs quickly on large-scale datasets.
   - Keep the space complexity at \(\tilde{O}(n)\) while making the query time sublinear in \(n\).

### Main Contributions of the Paper

1. **An algorithm with \(O(\sqrt{n}\log n + k)\) query time**:
   - Combines weight-based sorting with interval tree structures.
   - It is faster than existing algorithms under the same space requirement.
2. **An improved algorithm with \(O(\log n + k)\) query time**:
   - Uses an improved segment tree structure.
   - Further accelerates the search.
3. **Experimental Verification**:
   - Experiments on two real large-scale datasets show that the new algorithms are faster than existing algorithms.
   - In particular, when \(k\in[25, 100]\), the new algorithm needs less than two microseconds.

### Summary

This paper addresses the efficiency of top-k stabbing queries on large-scale weighted interval data with two new algorithms, which significantly improve query speed and perform well in practice.
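The first contribution combines weight-based sorting with interval trees. One way such a combination can reach an \(O(\sqrt{n}\log n + k)\) bound is sketched below in Python; this is an assumption for illustration, not necessarily the authors' construction, and the class names `IntervalTree` and `BlockedTopK` are hypothetical. Intervals (closed, given as `(l, r, w)`) are sorted by descending weight and cut into about \(\sqrt{n}\) blocks of about \(\sqrt{n}\) intervals each, and each block gets its own centered interval tree. A query visits the blocks from heaviest to lightest; since every interval in an earlier block weighs at least as much as every interval in a later block, it can stop at the first block that brings the count to \(k\) and only needs to sort that block's hits.

```python
import math


class IntervalTree:
    """Centered interval tree over closed intervals stored as (l, r, w, idx)."""

    def __init__(self, items):
        # Split at the median endpoint, so each child holds at most half the intervals.
        pts = sorted(p for it in items for p in (it[0], it[1]))
        self.center = pts[len(pts) // 2]
        mid = [it for it in items if it[0] <= self.center <= it[1]]
        left = [it for it in items if it[1] < self.center]
        right = [it for it in items if it[0] > self.center]
        self.by_left = sorted(mid, key=lambda it: it[0])                 # ascending l
        self.by_right = sorted(mid, key=lambda it: it[1], reverse=True)  # descending r
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def stab(self, q, out):
        """Append every stored interval with l <= q <= r to out."""
        node = self
        while node is not None:
            if q < node.center:
                for it in node.by_left:  # all of these have r >= center > q
                    if it[0] > q:
                        break
                    out.append(it)
                node = node.left
            elif q > node.center:
                for it in node.by_right:  # all of these have l <= center < q
                    if it[1] < q:
                        break
                    out.append(it)
                node = node.right
            else:
                out.extend(node.by_left)  # q == center stabs every interval here
                return


class BlockedTopK:
    """Hypothetical index: weight-sorted blocks of ~sqrt(n) intervals, one tree each."""

    def __init__(self, intervals):
        items = sorted(((l, r, w, i) for i, (l, r, w) in enumerate(intervals)),
                       key=lambda it: -it[2])
        b = max(1, math.isqrt(len(items)))
        self.blocks = [IntervalTree(items[s:s + b]) for s in range(0, len(items), b)]

    def query(self, q, k):
        """Ids of the k heaviest intervals stabbed by q (all of them if fewer); unordered."""
        result = []
        for tree in self.blocks:  # heaviest block first
            hits = []
            tree.stab(q, hits)
            if len(result) + len(hits) >= k:
                # Only this final block's hits compete for the remaining slots.
                hits.sort(key=lambda it: -it[2])
                result.extend(hits[:k - len(result)])
                break
            result.extend(hits)
        return [it[3] for it in result]
```

Each block answers a stabbing query in \(O(\log n + h)\) time for \(h\) hits; there are about \(\sqrt{n}\) blocks, fewer than \(k\) hits are collected before the final block, and sorting the final block's at most \(\sqrt{n}\) hits costs \(O(\sqrt{n}\log n)\), giving \(O(\sqrt{n}\log n + k)\) overall. The returned ids form a valid top-k set (unique when weights are distinct), but they are not sorted by weight. The \(O(\log n + k)\) algorithm relies on an improved segment tree instead, which this sketch does not attempt.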