Abstract:The development of deep learning programs, as a new programming paradigm, is observed to suffer from various defects. Emerging research works have been proposed to detect, debug, and repair deep learning bugs, which drive the need to construct the bug benchmarks. In this work, we present gDefects4DL, a dataset for general bugs of deep learning programs. Comparing to existing datasets, gDefects4DL collects bugs where the root causes and fix solutions can be well generalized to other projects. Our general bugs include deep learning program bugs such as (1) violation of deep learning API usage pattern (e.g., the standard to implement cross entropy function y · log(y), y → 0, without NaN error), (2) shape-mismatch of tensor calculation, (3) numeric bugs, (4) type-mismatch (e.g., confusing similar types among numpy, pytorch, and tensorflow), (5) violation of model architecture design convention, and (6) performance bugs.

Gdefects4dl