MACA: Memory-aware Convolution Accelerating for CNN Inference on Edge Devices

Chaoxiong Yi, Songlei Jian, Yusong Tan, Yusen Zhang
DOI: https://doi.org/10.1109/cscwd61410.2024.10580144
2024-01-01
Abstract: Deep learning inference is moving towards the edge because of latency requirements and privacy concerns. However, edge devices are constrained in power consumption and size, and generally have limited resources. Convolutional neural networks (CNNs), commonly used in image processing tasks, contain a large number of convolutional layers, which account for more than 95% of the computation time in most commonly used CNN models. This paper proposes a general convolutional layer optimization method called MACA. We implement and optimize a variety of convolution operators and design a memory-aware automatic selection strategy that chooses an appropriate convolution operator without modifying user code. Finally, we integrate MACA into PyTorch and conduct extensive experiments. The results show that when memory resources are sufficient, MACA improves inference speed by 36.10% on average, and when resources are tight, it reduces the memory needed to complete inference by an average of 29.14%. This paper provides an effective solution for deploying deep learning models on resource-constrained edge devices.
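The abstract does not spell out how the memory-aware selection works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes two hypothetical candidate operators (an im2col+GEMM convolution that needs a large workspace, and a direct convolution that needs none), an assumed speed ordering between them, and a simple rule: pick the first candidate whose extra workspace fits in the free memory. All names (`ConvShape`, `select_operator`, `CANDIDATES`) are illustrative and are not taken from MACA or PyTorch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvShape:
    """Shape of one 2D convolution layer (square kernel, batch size 1)."""
    c_in: int
    c_out: int
    h: int
    w: int
    k: int
    stride: int = 1
    padding: int = 0

    @property
    def h_out(self) -> int:
        return (self.h + 2 * self.padding - self.k) // self.stride + 1

    @property
    def w_out(self) -> int:
        return (self.w + 2 * self.padding - self.k) // self.stride + 1


def im2col_workspace_bytes(s: ConvShape, dtype_bytes: int = 4) -> int:
    # im2col unfolds the input into a (c_in * k * k) x (h_out * w_out) matrix.
    return s.c_in * s.k * s.k * s.h_out * s.w_out * dtype_bytes


def direct_workspace_bytes(s: ConvShape, dtype_bytes: int = 4) -> int:
    # A direct (nested-loop) convolution needs no extra buffer.
    return 0


# Assumption: candidates are listed from fastest to slowest.
# A real system would measure or model this per layer and per device.
CANDIDATES = [
    ("im2col_gemm", im2col_workspace_bytes),
    ("direct", direct_workspace_bytes),
]


def select_operator(shape: ConvShape, free_bytes: int) -> str:
    """Return the first (assumed fastest) operator whose workspace fits."""
    for name, workspace_fn in CANDIDATES:
        if workspace_fn(shape) <= free_bytes:
            return name
    raise MemoryError("no candidate operator fits in the available memory")


layer = ConvShape(c_in=64, c_out=64, h=56, w=56, k=3, padding=1)
print(select_operator(layer, free_bytes=8_000_000))
print(select_operator(layer, free_bytes=1_000_000))
```

For this layer the im2col workspace is 64 × 9 × 56 × 56 × 4 = 7,225,344 bytes, so the sketch selects the im2col operator when about 8 MB is free and falls back to the direct operator when only 1 MB is free. This mirrors the trade-off the abstract describes: faster operators when memory is sufficient, lower-memory operators when it is tight.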