Unified-TP: A Unified TLB and Page Table Cache Structure for Efficient Address Translation
Zhulin Ma, Yujuan Tan, Hong Jiang, Zhichao Yan, Duo Liu, Xianzhang Chen, Qingfeng Zhuge, Edwin Hsing-Mean Sha, Chengliang Wang
DOI: https://doi.org/10.1109/iccd50377.2020.00052
2020-01-01
Abstract: To improve address translation performance for applications with large memory footprints, techniques such as huge pages and hardware coalescing have been proposed. They exploit contiguous memory allocation to increase the coverage of the limited hardware translation entries and thereby lower the Translation Lookaside Buffer (TLB) miss rate. Furthermore, Page Table Caches (PTCs) have been proposed to store upper-level page table entries and reduce TLB miss handling latency. Both increasing TLB coverage and reducing TLB miss handling latency have proved effective, to a certain extent, in speeding up address translation. Nevertheless, our preliminary studies suggest that the structural separation between TLBs and PTCs in existing computer systems makes these two methods less effective, because they are used exclusively in TLBs and PTCs, respectively. In particular, the separate structures cannot dynamically adjust their sizes according to the workload, resulting in low resource utilization and inefficient address translation. To address these issues, we propose a unified structure, called Unified-TP, which stores PTC and TLB entries together. In addition, our modified LRU algorithm helps identify cold TLB and PTC entries and dynamically adjusts the numbers of TLB and PTC entries to adapt to different workloads. Furthermore, we introduce a parallel search scheme for incoming memory access requests. Our experimental results show that Unified-TP reduces the number of TLB misses by an average of 35.69% and improves performance by an average of 11.12% compared with separately structured TLBs and PTCs.
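The idea of letting TLB and PTC entries share one pool can be illustrated with a minimal software sketch. It is not the paper's hardware design: it uses plain LRU rather than the authors' modified LRU, it does not model the parallel search, and the class name `UnifiedCache`, its methods, and the `"TLB"`/`"PTC"` kind labels are all hypothetical names chosen for illustration. It only shows how a shared replacement policy lets the split between the two entry types follow the workload instead of being fixed at design time.

```python
from collections import OrderedDict


class UnifiedCache:
    """Toy model (assumption): one LRU-managed pool shared by TLB and PTC entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Keys are (kind, tag) pairs; insertion order tracks recency,
        # with the least recently used entry first.
        self.entries = OrderedDict()

    def lookup(self, kind, tag):
        """Return the cached value, or None on a miss."""
        key = (kind, tag)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def insert(self, kind, tag, value):
        """Insert or refresh an entry, evicting the coldest one if the pool is full."""
        key = (kind, tag)
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            # The victim may be a TLB or a PTC entry: whichever is coldest.
            self.entries.popitem(last=False)
        self.entries[key] = value

    def count(self, kind):
        """Number of entries of the given kind currently held."""
        return sum(1 for k, _ in self.entries if k == kind)


# Usage: a pool filled with TLB entries gives up a slot when a PTC entry arrives.
cache = UnifiedCache(capacity=4)
for vpn in range(4):
    cache.insert("TLB", vpn, vpn + 100)
cache.insert("PTC", 0, 0xABC)
print(cache.count("TLB"), cache.count("PTC"))
```

In this toy model the share of the pool held by each kind is simply a by-product of recency, which is the property a separately sized TLB and PTC cannot provide.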