<i>EvalDNN</i>: A Toolbox for Evaluating Deep Neural Network Models

Yongqiang Tian, Zhihua Zeng, Ming Wen, Yepang Liu, Tzu-yang Kuo, Shing-Chi Cheung
DOI: https://doi.org/10.1145/3377812.3382133
2020-01-01
Abstract: Recent studies have shown that the performance of deep learning models should be evaluated using various important metrics such as robustness and neuron coverage, in addition to the widely used prediction accuracy metric. However, major deep learning frameworks currently provide APIs only for evaluating a model's accuracy. To comprehensively assess a deep learning model, framework users and researchers often need to implement new metrics themselves, which is tedious. Worse, because of the large number of hyper-parameters and inadequate documentation, the evaluation results of some deep learning models are hard to reproduce, especially when the models and metrics are both new. To ease model evaluation in deep learning systems, we have developed EvalDNN, a user-friendly and extensible toolbox supporting multiple frameworks and metrics with a set of carefully designed APIs. Using EvalDNN, a pre-trained model can be evaluated with respect to different metrics in a few lines of code. We have evaluated EvalDNN on 79 models from TensorFlow, Keras, GluonCV, and PyTorch. As a result of our effort to reproduce the evaluation results of existing work, we release a performance benchmark of popular models, which can serve as a useful reference to facilitate future research. The tool and benchmark are available at https://github.com/yqtianust/EvalDNN and https://yqtianust.github.io/EvalDNN-benchmark/, respectively. A demo video of EvalDNN is available at: https://youtu.be/v69bNJN2bJc.
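
To make the distinction between the metrics named in the abstract concrete, the following is a minimal sketch in plain Python of top-1 accuracy and of neuron coverage in its commonly used form (the fraction of neurons whose activation exceeds a threshold on at least one test input). This is not EvalDNN's API: the function names, the list-of-lists activation layout, and the default threshold of 0.5 are all assumptions chosen for illustration, and the activations are assumed to be already scaled to a comparable range.

```python
from typing import Sequence


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of inputs whose predicted class equals the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def neuron_coverage(
    activations: Sequence[Sequence[float]], threshold: float = 0.5
) -> float:
    """Fraction of neurons activated above `threshold` by at least one input.

    Hypothetical layout: activations[i][j] is the output of neuron j
    for test input i, already scaled so one threshold applies to all neurons.
    """
    if not activations or not activations[0]:
        return 0.0
    num_neurons = len(activations[0])
    covered = [False] * num_neurons
    for row in activations:
        if len(row) != num_neurons:
            raise ValueError("every input must report the same number of neurons")
        for j, value in enumerate(row):
            if value > threshold:
                covered[j] = True
    return sum(covered) / num_neurons


if __name__ == "__main__":
    # Two test inputs, four neurons: only neurons 0 and 1 ever exceed 0.5.
    acts = [[0.9, 0.1, 0.2, 0.0], [0.3, 0.7, 0.1, 0.0]]
    print(neuron_coverage(acts, threshold=0.5))
    print(top1_accuracy([1, 0, 2, 2], [1, 0, 1, 2]))
```

The two functions take different inputs (predicted labels versus internal activations), which illustrates why a framework's accuracy API alone does not cover such metrics.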