Locality based warp scheduling in GPGPUs.

Yang Zhang, Zuocheng Xing, Cang Liu, Chuan Tang, Qinglin Wang
DOI: https://doi.org/10.1016/j.future.2017.02.036
IF: 7.307
2018-01-01
Future Generation Computer Systems
Abstract: As the need for high-performance computing continues to grow, it becomes increasingly urgent to design massively multi-core processors with high throughput and efficiency. However, as the number of cores keeps increasing, the capacity of on-chip memory tends to become insufficient. In a multi-core processor such as a GPGPU (General-Purpose Graphics Processing Unit), dozens or hundreds of SMs (Streaming Multiprocessors) cooperate to achieve high throughput with only several MB of on-chip memory. Furthermore, within one SM, thousands of threads are organized into thread blocks and execute instructions in a SIMT (Single Instruction, Multiple Threads) manner. Because all of these threads share the same on-chip memory, the mismatch between the large number of cores and the small on-chip memory capacity can easily impair performance through excessive thread contention for cache resources.