Efficient Scheduling of Irregular Network Structures on CNN Accelerators

Shixuan Zheng, Xianjue Zhang, Daoli Ou, Shibin Tang, Leibo Liu, Shaojun Wei, Shouyi Yin
DOI: https://doi.org/10.1109/TCAD.2020.3012215
IF: 2.9
2020-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: State-of-the-art convolutional neural network (CNN) structures present growing irregularity in their layer connections, which derives from innovative manual designs and recently proposed neural architecture search approaches. Such irregular structures improve recognition accuracy, but also bring challenges for hardware deployment, especially on CNN accelerators with regular architectures: 1) the complicated data dependency makes it nontrivial to decide the data reuse strategy between layers and 2) since the execution order of each network is not unique, the choice of layer scheduling, memory allocation, and loop tiling strategies greatly impacts the hardware performance. These challenges cannot be solved by existing CNN schedulers, which mainly focus on the dataflow of a single layer. In this work, we propose a comprehensive framework to analyze and solve the mapping of an arbitrarily connected CNN to specific hardware accelerators. We propose: 1) a dynamic programming and node-clustering-based DAG partitioning approach to efficiently exploit interlayer data reuse and 2) a subgraph scheduling and on-chip memory allocation strategy to find the optimal execution order. With the modeling of CNN accelerators, we also propose a loop tiling approach for fused layers. An automated framework is established to generate binary machine code from original CNN models produced by mainstream deep learning frameworks, which can process large-scale CNNs with more than 1000 layers in only a few minutes. Experiments based on state-of-the-art accelerators (e.g., NVDLA) show that our techniques greatly reduce the external data transfer of interlayer dependencies and bring significant performance improvement over existing approaches.
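The abstract's point that the execution order of an irregular network is not unique, and that the choice matters for memory, can be illustrated with a small sketch. This is not the paper's scheduling algorithm: it is a brute-force enumeration over a hypothetical six-node DAG with made-up output sizes, assuming each node's output buffer stays live until its last consumer has run.

```python
from itertools import permutations

# Hypothetical toy network (not from the paper):
# node -> (output size in KB, list of predecessor nodes)
GRAPH = {
    "in": (4, []),
    "a": (8, ["in"]),
    "b": (2, ["in"]),
    "c": (2, ["a"]),
    "d": (8, ["b"]),
    "out": (1, ["c", "d"]),
}


def is_valid(order):
    """Return True if every node appears after all of its predecessors."""
    seen = set()
    for node in order:
        if any(pred not in seen for pred in GRAPH[node][1]):
            return False
        seen.add(node)
    return True


def peak_memory(order):
    """Peak total size of live output buffers when running nodes in `order`.

    Assumption: a buffer is allocated when its node runs and freed as soon
    as its last consumer has finished.
    """
    # Number of consumers that still need each node's output.
    remaining = {n: sum(n in GRAPH[m][1] for m in GRAPH) for n in GRAPH}
    live = peak = 0
    for node in order:
        live += GRAPH[node][0]
        peak = max(peak, live)
        for pred in GRAPH[node][1]:
            remaining[pred] -= 1
            if remaining[pred] == 0:
                live -= GRAPH[pred][0]
    return peak


# Brute force is only feasible for tiny graphs; it is used here for clarity.
peaks = {
    order: peak_memory(order)
    for order in permutations(GRAPH)
    if is_valid(order)
}
best = min(peaks, key=peaks.get)
worst = max(peaks, key=peaks.get)
print("best :", best, peaks[best], "KB")
print("worst:", worst, peaks[worst], "KB")
```

On this toy graph there are six valid orders, and the peak footprint ranges from 14 KB to 20 KB depending on which branch runs first. Exhaustive enumeration grows factorially with the number of layers, which is why a network with more than 1000 layers calls for a structured approach such as the partitioning and scheduling the abstract describes.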