HASP: Hierarchical Asynchronous Parallelism for Multi-NN Tasks

Hongyi Li, Songchen Ma, Taoyi Wang, Weihao Zhang, Guanrui Wang, Chenhang Song, Huanyu Qu, Junfeng Lin, Cheng Ma, Jing Pei, Rong Zhao
DOI: https://doi.org/10.1109/tc.2023.3329937
IF: 3.183
2024-01-01
IEEE Transactions on Computers
Abstract: The rapid development of deep learning has propelled many real-world artificial intelligence (AI) applications. Many of these applications integrate multiple neural network (multi-NN) models to provide different functionalities. Although a number of multi-NN acceleration technologies have been explored, few deliver the flexibility and scalability required by emerging and diverse AI workloads, especially on mobile platforms. Among these, homogeneous multi-core architectures have great potential to support multi-NN execution by leveraging decentralized parallelism and intrinsic scalability. However, the advantages of multi-core systems are underexploited because they adopt bulk synchronous parallelism (BSP), which copes poorly with the diversity of multi-NN workloads. This paper reports a hierarchical multi-core architecture with asynchronous parallelism that enhances multi-NN execution for higher performance and utilization. Hierarchical asynchronous parallelism (HASP) is the theoretical foundation; it establishes a programmable, grouped, dynamic synchronous-asynchronous framework for multi-NN acceleration. HASP can be implemented on a typical multi-core processor for multi-NN workloads with minor modifications. We further developed a prototype chip to validate the hardware effectiveness of this design. A mapping strategy that combines spatial partitioning and temporal tuning is also developed, which allows the proposed architecture to improve resource utilization and throughput simultaneously.