Poster: Diagnosing and Treating Code-Duplication Problems in Bioinformatics Libraries.

Mohammad Shabbir Hasan, Saima Sultana Tithi, Eli Tilevich, Liqing Zhang
DOI: https://doi.org/10.1109/iccabs.2016.7802784
2016-01-01
Abstract: As computing is an enabling tool of bioinformatics, software quality can influence not only the efficiency of the research process, but also the degree of confidence in scientific findings. As we discovered, popular bioinformatics C++ libraries suffer from problems that make their code hard to maintain, fine-tune, and extend. In particular, code duplication caused by the ubiquitous copy-and-paste development practice substantially complicates software maintenance and evolution. The presence of multiple clones of the same code snippet multiplies the amount of effort required to modify or extend it. In this paper, we present the results of a systematic study we have conducted to understand the code quality of popular bioinformatics libraries. Based on the results of our study, we developed an automated tool that systematically identifies and consolidates duplicated code blocks. Here we describe our tool, ReBio, and the results of applying it to improve the quality of several commonly used C++ libraries, including SeqAn, BEDtools, and NCBI C++ Toolkit. Our results reveal that these libraries indeed suffer from poor maintainability, and that our automated tool can effectively improve their quality.