Grouping Cores for Chip Multiprocessors Optimization

LI Guohong, WANG Dongsheng, LIU Zhenyu, LI Chongmin, LIU Genxian, GUO Sanchuan
DOI: https://doi.org/10.3778/j.issn.1673-9418.1309012
2014-01-01
Abstract: In chip multiprocessors (CMP), as the number of cores increases, the average distance between requestors and home nodes grows, and unbalanced accesses to the banks of the distributed shared cache create hot nodes. Both effects raise the average latency of L1 cache misses. To address this problem, this paper divides the cores into groups of 2×2 nodes and introduces the neighboring data prober (NDP). By deciding whether a miss can be served by the L1 cache of a neighboring node, NDP exploits the node-level spatial locality of data accesses in parallel programs. The paper also optimizes the coherence protocol for the new architecture. The evaluation results show that the proposed cache optimization improves performance, reduces network traffic, and saves energy.
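The 2×2 grouping described in the abstract can be illustrated with a minimal sketch. It assumes a 2D mesh with an even width and row-major core numbering, and it treats the other three members of a group as the neighbors whose L1 caches might be probed; none of these details are stated in the abstract, and the function names are hypothetical.

```python
def group_neighbors(core_id, mesh_width):
    """Return the other cores in the same 2x2 group as core_id.

    Assumes row-major core IDs on a mesh whose width is even
    (an illustrative layout, not taken from the paper).
    """
    row, col = divmod(core_id, mesh_width)
    # Top-left corner of the 2x2 group that contains this core.
    base_row, base_col = (row // 2) * 2, (col // 2) * 2
    members = [
        (base_row + dr) * mesh_width + (base_col + dc)
        for dr in (0, 1)
        for dc in (0, 1)
    ]
    return [m for m in members if m != core_id]


def hops(a, b, mesh_width):
    """Manhattan hop count between two cores on the mesh."""
    row_a, col_a = divmod(a, mesh_width)
    row_b, col_b = divmod(b, mesh_width)
    return abs(row_a - row_b) + abs(col_a - col_b)


# On a 4x4 mesh, core 5 shares a group with cores 0, 1 and 4.
neighbors = group_neighbors(5, 4)
# Any group member is at most 2 hops away, whereas a home node
# elsewhere on the mesh (here core 15) can be farther.
worst_in_group = max(hops(5, n, 4) for n in neighbors)
distant_home = hops(5, 15, 4)
```

Under these assumptions a miss served inside the group travels at most two hops, which is the intuition behind probing a neighbor before going to a distant home node.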