numaPTE: Managing Page-Tables and TLBs on NUMA Systems

Bin Gao, Qingxuan Kang, Hao-Wei Tee, Kyle Timothy Ng Chu, Alireza Sanaee, Djordje Jevdjic
2024-01-28
Abstract: Memory management operations that modify page-tables, typically performed during memory allocation/deallocation, are infamous for their poor performance in highly threaded applications, largely because the OS must issue process-wide TLB shootdowns in the absence of hardware support for TLB coherence. We study these operations in NUMA settings, where we observe up to 40x overhead for basic operations such as munmap or mprotect. The overhead increases further if page-table replication is used, where complete, coherent copies of the page-tables are maintained across all NUMA nodes. While eager system-wide replication is extremely effective at localizing page-table reads during address translation, we find that it creates additional penalties upon any page-table change, because every replica must be kept coherent.
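The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's implementation and does not model TLBs or shootdowns; it only assumes one full page-table copy per NUMA node and counts how many entry writes a mapping change costs under eager replication. The class name `EagerReplicatedPageTable` and its counters are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EagerReplicatedPageTable:
    """Toy model (hypothetical): one complete page-table copy per NUMA node."""
    num_nodes: int
    replicas: list = field(default_factory=list)
    replica_writes: int = 0  # entry writes summed over all replicas
    local_reads: int = 0     # translations served from the local replica

    def __post_init__(self):
        # One dict per node maps virtual page number -> physical frame number.
        self.replicas = [dict() for _ in range(self.num_nodes)]

    def map_page(self, vpn, pfn):
        # Eager replication: every replica is updated on every change.
        for table in self.replicas:
            table[vpn] = pfn
            self.replica_writes += 1

    def unmap_page(self, vpn):
        # Removing a mapping also has to touch every replica that holds it.
        for table in self.replicas:
            if table.pop(vpn, None) is not None:
                self.replica_writes += 1

    def translate(self, node, vpn):
        # Reads only consult the replica of the node doing the translation.
        self.local_reads += 1
        return self.replicas[node].get(vpn)
```

In this model, a single map followed by an unmap on a 4-node system costs 8 replica writes, while each translation touches only the local copy. That asymmetry is the cost of eager replication that the abstract points to; the real overhead in an OS additionally includes the TLB shootdowns, which this sketch leaves out.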
Operating Systems