The Utility Challenge of Privacy-Preserving Data-Sharing in Cross-Company Defect Prediction: An Empirical Study of the CLIFF&MORPH Algorithm

Yi Fan, Chenxi Lv, Xu Zhang, Guoqiang Zhou, Yuming Zhou
DOI: https://doi.org/10.1109/icsme.2017.57
2017-01-01
Abstract: In practice, the data owners of source projects may need to share data without disclosing sensitive information. Privacy-preserving data-sharing has therefore become an important topic in cross-company defect prediction (CCDP). The challenge is to achieve a high level of privacy preservation while retaining the utility of the shared, privatized data for CCDP. CLIFF&MORPH is a recently proposed state-of-the-art privacy-preserving data-sharing algorithm for CCDP, and the CLIFF&MORPH CCDP model has been reported to achieve promising defect prediction performance. However, we find that ManualDown, a simple unsupervised module size model built on the target projects, achieves comparable or even better defect prediction performance. Because ManualDown requires no source project data to build the model, it is free of the privacy-preserving data-sharing challenges of CCDP. For practitioners, this means that applying privacy-preserving data-sharing algorithms to CCDP cannot be well justified unless the utility challenge is addressed. We analyze the implications of our findings and outline directions for future research. In particular, we strongly suggest that future studies use at least ManualDown as a baseline model for comparison, to help develop practical privacy-preserving data-sharing algorithms for CCDP.
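To make the baseline concrete, the following is a minimal sketch of a size-based unsupervised model in the spirit of ManualDown. The abstract states only that ManualDown is a simple unsupervised module size model built on the target projects; the function name `manual_down`, the `top_fraction` parameter, and the default cutoff of 50% are illustrative assumptions, not details taken from the abstract.

```python
from typing import Dict, List


def manual_down(module_sizes: Dict[str, int], top_fraction: float = 0.5) -> List[str]:
    """Return the modules flagged as defect-prone by a size-only ranking.

    Assumption (not stated in the abstract): modules are ranked by size,
    largest first, and the top `top_fraction` of them are flagged.
    No source project data or defect labels are used.
    """
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must be in [0, 1]")
    # Sort by size in descending order; break ties by name for determinism.
    ranked = sorted(module_sizes, key=lambda name: (-module_sizes[name], name))
    # Number of modules to flag, rounded down.
    cutoff = int(len(ranked) * top_fraction)
    return ranked[:cutoff]


# Hypothetical target-project modules with their sizes in lines of code.
sizes = {"a.c": 120, "b.c": 900, "c.c": 45, "d.c": 300}
flagged = manual_down(sizes)
print(flagged)
```

Because the ranking uses only the size of each target-project module, nothing has to be shared by the source project owners, which is why such a baseline sidesteps the privacy question entirely.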