Abstract: Concurrent B+trees are widely used in many systems. As the volume of data requests grows exponentially, these systems face tremendous performance pressure, and GPUs have shown potential to accelerate concurrent B+tree performance. When many concurrent requests are processed together, conflicts among them must be detected and resolved. Prior methods guarantee the correctness of concurrent GPU B+trees through lock-based or software transactional memory (STM)-based approaches. However, these methods complicate the request processing logic, increase the number of memory accesses, and introduce execution-path divergence. This degrades performance and increases the variance in response time. Moreover, previous methods do not guarantee linearizability among concurrent requests. In this paper, we design a combining-based concurrency control framework for GPU B+trees, called Eirene, that reduces the overhead of conflict detection and resolution. First, a combining-based synchronization method combines and issues requests: it combines the requests that share a key, constructs their dependences, decides which request is issued, and determines the return value of each request. Since only one request per key is issued, key conflicts are eliminated. Second, an optimistic STM method reduces structure conflicts. Query and update requests are partitioned into separate kernels, and in the update kernels, STM is invoked only when the number of retries reaches a threshold. Finally, a locality-aware warp reorganization optimization exploits the locality among requests to improve memory behavior and reduce conflicts. Evaluations on an NVIDIA A100 GPU show that Eirene is efficient, reaching a throughput of 2.4 billion requests per second, and guarantees linearizability. Compared to the state-of-the-art GPU B+tree, it achieves a speedup of 7.43X and reduces the response time variance from 36% to 5%.
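The combining step can be illustrated with a small host-side simulation. This is a minimal Python sketch under stated assumptions, not Eirene's GPU implementation. A plain `dict` stands in for the B+tree, requests in a batch are linearized by their arrival index `rid`, and an upsert or delete returns the previous value. The names `Request`, `combine_batch`, and `apply_issued` are hypothetical. For each key, the sketch reads the tree once, replays that key's requests in order to compute every return value locally, and issues at most one net write.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Hypothetical request encoding for this sketch (not Eirene's actual format).
@dataclass(frozen=True)
class Request:
    rid: int                     # arrival index; defines the order within a batch
    op: str                      # "query", "upsert", or "delete"
    key: int
    value: Optional[int] = None  # only used by "upsert"

Write = Tuple[str, int, Optional[int]]

def combine_batch(tree: Dict[int, int],
                  batch: List[Request]) -> Tuple[Dict[int, Optional[int]], List[Write]]:
    """Combine same-key requests so that at most one write per key is issued."""
    results: Dict[int, Optional[int]] = {}
    issued: List[Write] = []
    # Sort by (key, rid) so each key's requests form one contiguous run in arrival order.
    ordered = sorted(batch, key=lambda r: (r.key, r.rid))
    for key, run in groupby(ordered, key=lambda r: r.key):
        initial = tree.get(key)  # a single tree read per key
        current = initial
        for req in run:          # replay the run locally to get every return value
            if req.op not in ("query", "upsert", "delete"):
                raise ValueError(f"unknown op {req.op!r}")
            results[req.rid] = current  # queries see, and writes return, the prior value
            if req.op == "upsert":
                current = req.value
            elif req.op == "delete":
                current = None
        # Issue only the net effect of the run, and nothing if the key is unchanged.
        if current != initial:
            issued.append(("delete", key, None) if current is None else ("upsert", key, current))
    return results, issued

def apply_issued(tree: Dict[int, int], issued: List[Write]) -> None:
    """Apply the combined writes; no two of them touch the same key."""
    for op, key, value in issued:
        if op == "delete":
            tree.pop(key, None)
        else:
            tree[key] = value

# Usage: one batch with interleaved requests on keys 1 and 2.
tree = {1: 10}
batch = [
    Request(0, "query", 1),
    Request(1, "upsert", 2, 20),
    Request(2, "upsert", 1, 11),
    Request(3, "query", 2),
    Request(4, "delete", 1),
    Request(5, "query", 1),
]
results, issued = combine_batch(tree, batch)
apply_issued(tree, issued)
print(results)  # {0: 10, 2: 10, 4: 11, 5: None, 1: None, 3: 20}
print(issued)   # [('delete', 1, None), ('upsert', 2, 20)]
print(tree)     # {2: 20}
```

Because every return value is derived from one read per key plus the arrival order, the results match a sequential execution of the batch, while the tree sees at most one write per key. On a GPU, one plausible mapping of the sort-and-group step is a parallel sort followed by a segmented pass; the sequential loop here only illustrates the logic.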

RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities

MSKD: multi-split KD-tree design on GPU

Efficient Data Management for Incoherent Ray Tracing

Hybrid CPU-GPU scheduling and execution of tree traversals

Stack-based Parallel Recursion on Graphics Processors

Realtime Ray Tracing on a Hybrid Parallel Architecture

POSTER: High Performance GPU Concurrent B plus tree

High Performance GPU Concurrent B+tree

GPU-Accelerated Rectilinear Steiner Tree Generation

OpenCL-Based Real-Time KD-Tree and Raytracing for Dynamic Scene

Mining Effective Parallelism from Hidden Coherence for GPU Based Path Tracing

Accelerating truss decomposition on heterogeneous processors

Efficient Kd-Tree Construction for Ray Tracing Using Ray Distribution Sampling

Improving branch divergence performance on GPGPU with a new PDOM stack and multi-level warp scheduling

Parallel Frequent Pattern Mining Without Candidate Generation on GPUs

Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization

G-Tran: A High Performance Distributed Graph Database with a Decentralized Architecture

Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores

Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV

Agglomerative Memory and Thread Scheduling for High-Performance Ray-Tracing on GPUs

Energy-Efficient Graph Traversal on Integrated CPU-GPU Architectures