gDefects4DL: A Dataset of General Real-World Deep Learning Program Defects
Yunkai Liang, Yun Lin, Xuezhi Song, Jun Sun, Zhiyong Feng, Jin Song Dong
DOI: https://doi.org/10.1145/3510454.3516826
2022-01-01
Abstract: The development of deep learning programs, as a new programming paradigm, is observed to suffer from various defects. Emerging research aims to detect, debug, and repair deep learning bugs, which drives the need for bug benchmarks. In this work, we present gDefects4DL, a dataset of general bugs in deep learning programs. Compared to existing datasets, gDefects4DL collects bugs whose root causes and fix solutions generalize well to other projects. Our general bugs include deep learning program bugs such as (1) violation of deep learning API usage patterns (e.g., the standard way to implement the cross-entropy term y·log(y) as y → 0 without a NaN error), (2) shape mismatch in tensor calculation, (3) numeric bugs, (4) type mismatch (e.g., confusing similar types among NumPy, PyTorch, and TensorFlow), (5) violation of model architecture design conventions, and (6) performance bugs. For each bug in gDefects4DL, we describe why it is general, and we group bugs with similar root causes and fix solutions for reference. Moreover, for each bug, gDefects4DL maintains (1) its buggy and fixed versions and the isolated fix change, (2) an isolated environment to replicate the defect, and (3) the whole code evolution history from the buggy version to the fixed version. We design gDefects4DL with extensible interfaces to evaluate software engineering methodologies and tools, and we have integrated tools such as ShapeFlow, DEBAR, and GRIST. gDefects4DL contains 64 bugs falling into 6 categories (i.e., API Misuse, Shape Mismatch, Number Error, Type Mismatch, Violation of Architecture Convention, and Performance Bug). gDefects4DL is available at https://github.com/llmhyy/defects4dl, its online web demonstration is at http://47.93.14.147:9000/bugList, and the demo video is at https://youtu.be/0XtaEt4Fhm4.
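As an illustration of the first category, the cross-entropy NaN problem mentioned in the abstract can be reproduced in a few lines. The sketch below is not taken from the dataset: the function names `naive_cross_entropy` and `safe_cross_entropy`, the use of NumPy, and the clipping constant `eps` are all assumptions chosen for the example, and clipping is only one common way to avoid the error.

```python
import numpy as np


def naive_cross_entropy(p, q):
    """Buggy pattern: log(0) is -inf, and 0 * -inf evaluates to NaN."""
    # Silence the floating-point warnings so the NaN propagates quietly,
    # which is how this kind of bug typically goes unnoticed during training.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.sum(p * np.log(q))


def safe_cross_entropy(p, q, eps=1e-12):
    """One common fix (an assumption here): clip q away from 0 before the log."""
    q = np.clip(q, eps, 1.0)
    return -np.sum(p * np.log(q))


# Hypothetical inputs: a one-hot target and a prediction saturated to 0 and 1.
p = np.array([0.0, 1.0])
q = np.array([0.0, 1.0])

naive = naive_cross_entropy(p, q)  # NaN, caused by the 0 * log(0) term
safe = safe_cross_entropy(p, q)    # finite, because log never sees an exact 0

print("naive:", naive)
print("safe: ", safe)
```

The same root cause and fix apply regardless of the project in which the loss is written, which is the sense in which the abstract calls such bugs general.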