Agglomerative Memory and Thread Scheduling for High-Performance Ray-Tracing on GPUs

Yufei Ni, Yangdong Deng, Zonghui Li
DOI: https://doi.org/10.1109/tcad.2021.3058910
IF: 2.9
2022-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Ray-tracing rendering has long been considered a promising technology for enabling a higher level of visual experience. Bringing ray-tracing rendering to consumer platforms, however, poses significant challenges to rendering hardware and software because of its highly irregular computing patterns. Modern ray-tracing techniques typically depend on a tree-based acceleration structure to reduce the computational complexity of testing rays against graphics primitives for intersection. Traversal of this structure by a massive number of rays on a graphics processing unit (GPU) incurs a significant amount of irregular memory traffic, which turns out to be a major stumbling block for real-time performance. In this work, a scheduling mechanism called Agglomerative Memory and Thread Scheduling is proposed to unleash the inherent parallelism in the ray-tracing process on GPUs. It is associated with a tile-based ray-tracing framework in which the acceleration structure (a KD-tree in this work) is partitioned into subtrees, each of which can be completely loaded into the on-chip L1 cache inside a streaming multiprocessor. An effective scheduling mechanism collects threads according to the subtrees hit by their respective rays and regroups the threads into warps for dispatching. In addition, subtrees are dynamically preloaded into the L1 cache of multiprocessors in an on-demand fashion. The proposed scheduler can be integrated into today's high-end GPUs with only minor overhead. Microarchitecture simulation results show that the proposed framework significantly improves memory efficiency and outperforms a traditional GPU microarchitecture by 47.4% on average.
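The regrouping step described in the abstract (collect threads by the subtree their ray hits, then form warps from each group) can be illustrated with a small host-side sketch. This is not the authors' hardware scheduler: the function names, the warp size of 32, and the flat list mapping thread index to subtree ID are all assumptions made for illustration only.

```python
from collections import defaultdict

WARP_SIZE = 32  # assumption: NVIDIA-style warps of 32 threads


def regroup_into_warps(subtree_of_thread, warp_size=WARP_SIZE):
    """Bucket thread ids by the subtree their ray hits, then cut each
    bucket into warps so that every warp touches exactly one subtree."""
    buckets = defaultdict(list)
    for tid, subtree in enumerate(subtree_of_thread):
        buckets[subtree].append(tid)

    warps = []
    for subtree in sorted(buckets):
        tids = buckets[subtree]
        for start in range(0, len(tids), warp_size):
            warps.append((subtree, tids[start:start + warp_size]))
    return warps


def subtrees_per_warp_naive(subtree_of_thread, warp_size=WARP_SIZE):
    """Count the distinct subtrees each warp touches when threads are
    kept in launch order (no regrouping)."""
    return [
        len(set(subtree_of_thread[start:start + warp_size]))
        for start in range(0, len(subtree_of_thread), warp_size)
    ]


# Hypothetical input: 64 threads whose rays alternate between subtrees 0 and 1.
demo = [tid % 2 for tid in range(64)]
naive_counts = subtrees_per_warp_naive(demo)
regrouped = regroup_into_warps(demo)
```

In this toy input, each launch-order warp touches both subtrees, whereas each regrouped warp touches only one, so a subtree resident in L1 would serve every thread of the warp. The on-demand preloading of subtrees and the hardware cost of regrouping are outside the scope of this sketch.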