WARDER: Refining Cell Clustering for Effective Spreadsheet Defect Detection via Validity Properties

Da Li, Huiyan Wang, Chang Xu, Fengmin Shi, Xiaoxing Ma, Jian Lu
DOI: https://doi.org/10.1109/QRS.2019.00030
2019-01-01
Abstract: Spreadsheets are widely used, but are subject to various defects and severe consequences due to poor maintenance by end users. Existing spreadsheet defect detection techniques fall short in effectiveness, due either to limited scope or to reliance on rigid patterns. In this paper, we discuss and improve one state-of-the-art technique, CUSTODES, which uses cell clustering and anomaly detection to extend its scope and make its patterns adaptive to varying spreadsheet styles, but is prone to fragile clustering when irrelevant cells are involved, leading to largely reduced detection precision. We present WARDER, which refines CUSTODES's cell clustering based on validity properties. Experimental results show that WARDER improves cell clustering precision by 20.7% on average, or reaches 100% for 79.8% of worksheets, which contributes to a precision improvement of 23.1% for defect detection. WARDER also exhibits satisfactory results when compared against other spreadsheet defect detection techniques and when applied to another large-scale spreadsheet corpus, VEnron2.