NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu
DOI: https://doi.org/10.48550/arXiv.1911.06859
2019-11-16
Abstract: To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely being utilized for accelerating deep learning algorithms. Similar to how GPUs have evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address space. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only an average 0.06% performance overhead.
Hardware Architecture; Distributed, Parallel, and Cluster Computing; Machine Learning; Neural and Evolutionary Computing