SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

Javier Picorel, Seyed Alireza Sanaee Kohroudi, Zi Yan, Abhishek Bhattacharjee, Babak Falsafi, Djordje Jevdjic
DOI: https://doi.org/10.48550/arXiv.2001.07045
2020-01-20
Abstract: Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to reduce the TLB size or increase its reach. However, such restrictions are unattractive because they forgo many of the original benefits of traditional VM, such as demand paging and copy-on-write. We propose SPARTA, a divide and conquer approach to address translation. SPARTA splits the address translation into accelerator-side and memory-side parts. The accelerator-side translation hardware consists of a tiny TLB covering only the accelerator's cache hierarchy (if any), while the translation for main memory accesses is performed by shared memory-side TLBs. Performing the translation for memory accesses on the memory side allows SPARTA to overlap data fetch with translation, and avoids the replication of TLB entries for data shared among accelerators. To further improve the performance and efficiency of the memory-side translation, SPARTA logically partitions the memory space, delegating translation to small and efficient per-partition translation hardware. Our evaluation on index-traversal accelerators shows that SPARTA virtually eliminates translation overhead, reducing it by over 30x on average (up to 47x) and improving performance by 57%. At the same time, SPARTA requires minimal accelerator-side translation hardware, reduces the total number of TLB entries in the system, gracefully scales with memory size, and preserves all key VM functionalities.
Hardware Architecture, Operating Systems
What problem does this paper attempt to address?
This paper addresses the efficiency and flexibility problems of implementing virtual memory (VM) for hardware accelerators. Existing accelerator VM implementations face three challenges:

1. **Area and Power Constraints**: Hardware accelerators operate under strict area and power budgets, so they cannot use the large multi-level TLBs (Translation Lookaside Buffers) found in general-purpose CPUs. This makes address translation a performance bottleneck.
2. **Restrictions on Address Mapping**: To reduce the size of the TLB or increase its reach, some proposals restrict the mapping from virtual addresses to physical addresses. These restrictions give up many of the benefits of traditional virtual memory, such as demand paging and copy-on-write.
3. **Address Translation Overhead**: Traditional address translation has high latency on accelerators, especially during page table walks, which must cross multiple network layers and memory controllers and therefore degrade performance.

To solve these problems, the paper proposes SPARTA (Split and PARtitioned Translation for Accelerators), a divide-and-conquer approach to address translation. The main contributions of SPARTA include:

- **Split Translation**: SPARTA divides the address translation task into an accelerator-side part and a memory-side part. The accelerator-side translation hardware contains only a small TLB that covers the accelerator's cache hierarchy (if any). Translation for main memory accesses is performed by the shared memory-side TLB/MMU.
- **Logical Partitioning**: SPARTA divides the physical memory space into multiple logical partitions and ensures that each virtual address uniquely identifies the partition where its data is located. Data fetch and address translation can therefore proceed in parallel, which improves performance.
- **Low Overhead and High Flexibility**: SPARTA virtually eliminates address translation overhead, reducing it by 31.5x on average (up to 47x) while improving performance by 57%. It also retains all the key functions of virtual memory, such as demand paging and copy-on-write, and imposes minimal restrictions on the mapping from virtual addresses to physical addresses.

Through these designs, SPARTA significantly improves the address translation performance and efficiency of accelerators while maintaining the flexibility of virtual memory.
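
As a rough illustration of the split and partitioned translation summarized above, the following Python sketch models a tiny accelerator-side TLB backed by per-partition memory-side translation. It is a toy model under stated assumptions, not the paper's hardware design: the 4 KiB page size, the four partitions, the choice of low-order virtual page number bits to select a partition, and all names (`partition_of`, `MemorySideTLB`, `Accelerator`) are hypothetical. Filling the accelerator-side TLB when a translation returns from memory is also a simplification, and eviction is not modeled.

```python
PAGE_BITS = 12        # assumption: 4 KiB pages
NUM_PARTITIONS = 4    # assumption: four logical memory partitions


def partition_of(vaddr: int) -> int:
    """Select the partition from the virtual address alone.

    Assumption: the low-order bits of the virtual page number pick the
    partition, so no translation is needed to know where to send a request.
    """
    return (vaddr >> PAGE_BITS) % NUM_PARTITIONS


class MemorySideTLB:
    """Per-partition translation state; a dict stands in for TLB and page table."""

    def __init__(self, partition_id: int):
        self.partition_id = partition_id
        self.entries = {}  # virtual page number -> physical frame number

    def map(self, vpn: int, pfn: int) -> None:
        self.entries[vpn] = pfn

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        pfn = self.entries[vpn]  # a KeyError here stands in for a page fault
        return (pfn << PAGE_BITS) | offset


class Accelerator:
    """Accelerator with a tiny TLB that only covers pages it has fetched."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.cache_tlb = {}  # virtual page number -> physical frame number

    def access(self, vaddr: int):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.cache_tlb:
            # Hit: translation is resolved on the accelerator side.
            return ("accelerator-side", (self.cache_tlb[vpn] << PAGE_BITS) | offset)
        # Miss: route the request by virtual address to the owning partition,
        # which performs the translation on the memory side.
        paddr = self.partitions[partition_of(vaddr)].translate(vaddr)
        self.cache_tlb[vpn] = paddr >> PAGE_BITS
        return ("memory-side", paddr)


if __name__ == "__main__":
    parts = [MemorySideTLB(i) for i in range(NUM_PARTITIONS)]
    vaddr = 0x5123
    parts[partition_of(vaddr)].map(vaddr >> PAGE_BITS, 0x2A)
    acc = Accelerator(parts)
    for _ in range(2):
        side, paddr = acc.access(vaddr)
        print(side, hex(paddr))
```

In this sketch the first access to a page is translated by the partition that owns it, and later accesses to the same page hit the accelerator-side TLB. Because the partition is derived from the virtual address, the request can be routed before any translation has taken place, which is the property that lets data fetch overlap with translation.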