WARDER: Towards Effective Spreadsheet Defect Detection by Validity-Based Cell Cluster Refinements

Yicheng Huang, Chang Xu, Yanyan Jiang, Huiyan Wang, Da Li
DOI: https://doi.org/10.1016/j.jss.2020.110615
IF: 3.5
2020-01-01
Journal of Systems and Software
Abstract: Spreadsheets are very popular and widely used. However, they are prone to various defects, which can cause severe consequences when end users maintain them poorly. The research community has proposed various techniques for automated detection of spreadsheet defects, but these commonly fall short in effectiveness, either because of their limited scope or because they rely on strict patterns. In this article, we discuss and improve one state-of-the-art technique, CUSTODES, which exploits spreadsheet cell clustering and defect detection to extend its scope and make its detection patterns adaptive to varying spreadsheet styles. Still, CUSTODES can be prone to problematic clustering when it accidentally involves irrelevant cells, leading to largely reduced detection precision. To address this, we present WARDER, which refines CUSTODES's spreadsheet cell clustering based on three extensible validity-based properties. Experimental results show that WARDER could improve the precision by 19.1% on spreadsheet cell clustering, which contributed to a precision improvement of 23.3% to 24.3% for spreadsheet defect detection, as compared to CUSTODES (F-measure increased from 0.71 to between 0.79 and 0.82). WARDER also exhibited satisfactory results on another practical large-scale spreadsheet corpus, VEnron2, improving the defect detection precision by 10.7% to 21.2% over CUSTODES. © 2020 Elsevier Inc. All rights reserved.