PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis

Siyu Wu, Ruihao Gong, Hailong Yang, Yi Liu, Depei Qian, Xin You, Zhongzhi Luan
DOI: https://doi.org/10.1145/3673038.3673116
2024-01-01
Abstract: The increasing diversity of deep neural network (DNN) models and hardware platforms necessitates effective model profiling for high-performance inference deployment. Current DNN profiling tools suffer from either limited optimization insights, due to the missing correlation between high-level DNN layer design and low-level hardware performance metrics, or prohibitive profiling overhead, due to the large number of performance measurements taken through hardware performance counters. Meanwhile, the roofline model has been widely used in the high-performance computing (HPC) domain for identifying performance bottlenecks and guiding optimizations. However, it lacks hierarchical (e.g., kernel/operator/layer), fine-grained, multi-platform support for profiling DNN models. To overcome the above limitations, we propose PRoof, a versatile DNN profiling framework that can effectively attribute the hardware performance metrics back to the model design. In addition, PRoof does not require massive hardware profiling and thus mitigates the large profiling overhead. Specifically, our approach correlates the profiled result of each layer to its conceptual layer design by effectively handling layer fusion. Our approach also provides an analytical model to predict the floating-point operations (FLOP) and memory accesses of DNN models without massive profiling. We demonstrate the effectiveness of PRoof with representative DNN models across a wide range of hardware platforms. Derived from PRoof’s profiling results, we obtain several insights to provide useful guidance for model design and hardware tuning.
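The roofline idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not PRoof's analytical model, which the abstract does not spell out. It is the textbook roofline bound applied to a single convolution layer, under the simplifying assumptions that each tensor is moved to or from memory exactly once and that one multiply-accumulate counts as 2 FLOP. The names `Conv2d`, `conv_flop`, `conv_bytes`, and `attainable_flops`, as well as the peak numbers in the usage lines, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conv2d:
    """Hypothetical description of a stride-1 convolution layer (batch size 1)."""
    c_in: int
    c_out: int
    k: int            # square kernel size
    h_in: int
    w_in: int
    h_out: int
    w_out: int
    bytes_per_elem: int = 4  # assumption: FP32 tensors


def conv_flop(layer: Conv2d) -> int:
    # Each output element needs c_in * k * k multiply-accumulates; 1 MAC = 2 FLOP.
    macs = layer.c_in * layer.k * layer.k * layer.c_out * layer.h_out * layer.w_out
    return 2 * macs


def conv_bytes(layer: Conv2d) -> int:
    # Assumption: input, weights, and output each cross the memory bus once.
    inputs = layer.c_in * layer.h_in * layer.w_in
    weights = layer.c_out * layer.c_in * layer.k * layer.k
    outputs = layer.c_out * layer.h_out * layer.w_out
    return (inputs + weights + outputs) * layer.bytes_per_elem


def attainable_flops(intensity: float, peak_flops: float, peak_bw: float) -> float:
    # Roofline bound: performance is capped by compute or by memory bandwidth.
    return min(peak_flops, intensity * peak_bw)


layer = Conv2d(c_in=64, c_out=64, k=3, h_in=56, w_in=56, h_out=56, w_out=56)
intensity = conv_flop(layer) / conv_bytes(layer)  # FLOP per byte
# Hypothetical device: 10 TFLOP/s peak compute, 1 TB/s peak memory bandwidth.
bound = attainable_flops(intensity, peak_flops=10e12, peak_bw=1e12)
print(f"intensity = {intensity:.1f} FLOP/byte, bound = {bound:.3e} FLOP/s")
```

A layer whose arithmetic intensity falls below the ratio of peak compute to peak bandwidth is memory-bound under this bound, and otherwise compute-bound. Real kernels rarely reach the idealized traffic assumed here, so measured intensity is usually lower.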