High Throughput CNN Inference and Training with In-Cache Computation

Xiaowei Wang, Li Zhao, Pengcheng Li
DOI: https://doi.org/10.1109/iccd50377.2020.00084
2020-01-01
Abstract: We present an architecture for CNN training and batched inference with in-cache computing. Targeting the high throughput requirements of training and inference workloads, we propose novel resource partitioning and work scheduling strategies that balance in-place computing against data storage requirements in the last-level cache. We further propose a compression mechanism to reduce data movement between the cache and main memory. For ResNet-50, the proposed architecture achieves 2.2× higher throughput than a state-of-the-art CNN inference engine with in-cache computing [1] with a baseline scheduling policy, and a training throughput of 65.4 images per second.