Memory-efficient Deep Learning Inference with Incremental Weight Loading and Data Layout Reorganization on Edge Systems.

Cheng Ji, Fan Wu, Zongwei Zhu, Li-Pin Chang, Huanghe Liu, Wenjie Zhai
DOI: https://doi.org/10.1016/j.sysarc.2021.102183
IF: 5.836
2021-01-01
Journal of Systems Architecture
Abstract: Pattern recognition applications such as face recognition and agricultural product detection have drawn rapidly growing interest in Cyber–Physical–Social Systems (CPSS). These CPSS applications rely on deep neural networks (DNNs) to perform image classification. However, traditional DNN inference in the cloud can suffer from network delay fluctuations and privacy leakage. For this reason, current real-time CPSS applications are preferably deployed on embedded devices at the edge. Because edge devices are constrained in both computing power and memory, more effective memory management is the key to improving the quality of service of model inference. First, this study explores an incremental loading strategy for model weights during inference. Second, the runtime memory footprint is optimized through data layout reorganization along the spatial dimension. Notably, the proposed schemes are orthogonal to existing models. Experimental results demonstrate that the proposed approach reduces memory consumption by 61.05% without additional inference time overhead.
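The abstract does not describe how incremental weight loading is implemented. As a rough illustration of the general idea only, and not the authors' method, the following Python sketch stores each layer's weights in its own file and holds one layer in memory at a time during inference. The file format, the toy dense-plus-ReLU layers, and every function name here (save_layer, load_layer, dense_relu, incremental_inference, demo) are assumptions made for this example.

```python
import os
import struct
import tempfile


def save_layer(path, weights):
    """Write one dense layer (list of rows of floats) to its own file."""
    rows, cols = len(weights), len(weights[0])
    with open(path, "wb") as f:
        f.write(struct.pack("<II", rows, cols))
        for row in weights:
            f.write(struct.pack(f"<{cols}d", *row))


def load_layer(path):
    """Read back a layer written by save_layer."""
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<II", f.read(8))
        return [list(struct.unpack(f"<{cols}d", f.read(8 * cols)))
                for _ in range(rows)]


def dense_relu(weights, x):
    """Toy layer: matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def incremental_inference(layer_paths, x):
    """Run layers in order, keeping only one layer's weights resident.

    Returns the output and the peak number of weights held at once.
    """
    peak = 0
    for path in layer_paths:
        weights = load_layer(path)           # load just this layer
        peak = max(peak, sum(len(r) for r in weights))
        x = dense_relu(weights, x)
        del weights                          # release before the next load
    return x, peak


def demo():
    # Hypothetical two-layer model: 6 weights, then 3 weights (9 in total).
    layers = [
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[1.0, -1.0, 0.5]],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, w in enumerate(layers):
            path = os.path.join(tmp, f"layer{i}.bin")
            save_layer(path, w)
            paths.append(path)
        return incremental_inference(paths, [2.0, 3.0])


if __name__ == "__main__":
    output, peak = demo()
    print(output, peak)
```

In this toy model the peak number of resident weights is that of the largest single layer (6) rather than the whole model (9). The sketch makes no attempt to reproduce the paper's reported savings or to hide load latency behind computation, which a practical scheme would need to address.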