Abstract: Concurrent B+trees are widely used in many systems. As the volume of data requests grows exponentially, these systems face tremendous performance pressure. GPUs have shown the potential to accelerate concurrent B+trees. When many concurrent requests are processed, conflicts among them must be detected and resolved. Prior methods guarantee the correctness of concurrent GPU B+trees through lock-based or software transactional memory (STM)-based approaches. However, these methods complicate the request processing logic, increase the number of memory accesses, and introduce execution path divergence, which degrades performance and increases the variance in response time. Moreover, previous methods do not guarantee linearizability among concurrent requests. In this paper, we design a combining-based concurrency control framework for GPU B+trees, called Eirene, to reduce the overhead of conflict detection and resolution. First, a combining-based synchronization method is designed to combine and issue requests. It combines the requests with the same key, constructs their dependences, decides which request is issued, and determines their return values. Since only one request is issued for each key, key conflicts are eliminated. Then, an optimistic STM method is used to reduce structure conflicts. The query and update requests are partitioned into different kernels. For the update kernels, STM is involved only when the number of retries reaches a threshold. Finally, a locality-aware warp reorganization optimization is proposed to improve memory behavior and reduce conflicts by exploiting the locality among requests. Evaluations on an NVIDIA A100 GPU show that Eirene is efficient (a throughput of 2.4 billion requests per second) and can guarantee linearizability. Compared to the state-of-the-art GPU B+tree, it achieves a speedup of 7.43X and reduces the response time variance from 36% to 5%.
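The combining step described in the abstract can be illustrated with a small host-side sketch. This is not Eirene's GPU implementation. It is a sequential Python illustration under stated assumptions: requests arrive as `(op, key, value)` tuples in timestamp order, a plain `dict` stands in for the B+tree, update requests return nothing, and the names `combine` and `run_batch` are hypothetical. For each key, the sketch issues a single request (the last update, or one query if the batch contains no update for that key) and derives every other return value from the dependence chain within the batch.

```python
from collections import defaultdict

QUERY, UPSERT, DELETE = "query", "upsert", "delete"


def combine(batch):
    """Combine same-key requests (hypothetical helper, not the paper's code).

    batch: list of (op, key, value) tuples in arrival order.
    Returns:
      issued: key -> the single (op, value) request sent to the tree
      plans:  request index -> ("tree", key) if the answer must come from
              the tree state before this batch, or ("local", value) if it
              is already determined by an earlier update in the batch.
    """
    by_key = defaultdict(list)
    for idx, (op, key, value) in enumerate(batch):
        by_key[key].append((idx, op, value))

    issued, plans = {}, {}
    for key, reqs in by_key.items():
        seen_update = False   # has an update for this key appeared yet?
        current = None        # value visible after the latest update
        last_update = None    # the update that decides the final state
        for idx, op, value in reqs:
            if op == QUERY:
                plans[idx] = ("local", current) if seen_update else ("tree", key)
            else:
                plans[idx] = ("local", None)  # assumption: updates return nothing
                seen_update = True
                current = value if op == UPSERT else None
                last_update = (op, value)
        # Exactly one request per key reaches the tree.
        issued[key] = last_update if last_update else (QUERY, None)
    return issued, plans


def run_batch(tree, batch):
    """Apply a combined batch to `tree` (a dict standing in for the B+tree)."""
    issued, plans = combine(batch)
    before = {}
    for key, (op, value) in issued.items():
        before[key] = tree.get(key)  # value visible before this batch
        if op == UPSERT:
            tree[key] = value
        elif op == DELETE:
            tree.pop(key, None)
    results = []
    for idx in range(len(batch)):
        source, payload = plans[idx]
        results.append(before[payload] if source == "tree" else payload)
    return results


if __name__ == "__main__":
    tree = {1: "a"}
    batch = [(QUERY, 1, None), (UPSERT, 1, "b"), (QUERY, 1, None)]
    print(run_batch(tree, batch), tree)
```

Because every key maps to at most one issued request, no two issued requests in a batch can conflict on a key. Conflicts on the tree structure itself, such as node splits, are a separate concern that this sketch does not model.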

High Performance GPU Concurrent B+tree

POSTER: High Performance GPU Concurrent B plus tree

Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization.
