JLeaks: A Featured Resource Leak Repository Collected from Hundreds of Open-Source Java Projects
Tianyang Liu,Weixing Ji,Xiaohui Dong,Wuhuang Yao,Yizhuo Wang,Hui Liu,Haiyang Peng,Yuxuan Wang
DOI: https://doi.org/10.1145/3597503.3639162
2024-01-01
Abstract:High-quality defect repositories are vital in defect detection, local-ization, and repair. However, existing repositories collected from open-source projects are either small-scale or inadequately labeled and packed. This paper systematically summarizes the program-ming APIs of system resources (i.e., file, socket, and thread) in Java. Additionally, this paper demonstrates the exceptions that may cause resource leaks in the chained and nested streaming operations. A semi-automatic toolchain is built to improve the efficiency of de-fect extraction, including automatic building for large legacy Java projects. Accordingly, 1,094 resource leaks were collected from 321 open-source projects on GitHub. This repository, named JLeaks, was built by round-by-round filtering and cross-validation, involving the review of approximately 3,185 commits from hundreds of projects. JLeaks is currently the largest resource leak repository, and each defect in JLeaks is well-labeled and packed, including causes, locations, patches, source files, and compiled bytecode files for 254 defects. We have conducted a detailed analysis of JLeaks for defect distribution, root causes, and fix approaches. We compare JLeaks with two well-known resource leak repositories, and the results show that JLeaks is more informative and complete, with high availability, uniqueness, and consistency. Additionally, we show the usability of JLeaks in two application scenarios. Future studies can leverage our repository to encourage better design and implementation of defect-related algorithms and tools.