Optimizing CNN Accelerator With Improved Roofline Model

Shaoxia Fang, Shulin Zeng, Yu Wang
DOI: https://doi.org/10.1109/SOCC49529.2020.9524754
2020-01-01
Abstract: External memory I/O bandwidth is the most common performance bottleneck for Convolutional Neural Network (CNN) inference accelerators. However, performance is also affected by many other factors, such as on-chip memory size and data scheduling strategies, making it difficult to identify the root cause of performance degradation. This paper proposes an improved roofline model specif...