An Empirical Study on Independence-Driven Data Selection for Improving Capture-Recapture Estimation

Qiuping Zhang, Guoping Rong, He Zhang
DOI: https://doi.org/10.1145/2915970.2915991
2016-01-01
Abstract: Background: The capture-recapture (CRC) method has been adopted for post-inspection defect estimation in software inspection. One outstanding advantage of the CRC method is that it produces objective estimates without relying on historical data. However, the CRC method is commonly perceived to estimate poorly with small inspection teams. Involving more inspectors seems helpful, yet no conclusive results exist on what team size is needed to obtain acceptable CRC estimates. Objective: While a number of factors affecting the accuracy of CRC estimation have been identified, this study aims to investigate new, reliable, and practical factors and thereby provide a new method for improving the accuracy of CRC estimation. Method: By examining and verifying the fundamental assumption of the CRC method (i.e., independence among inspectors), we establish an understanding of the root cause of CRC estimation accuracy, on which we base a strategy to mitigate the impact of violations of the independence assumption and improve the accuracy of CRC estimation. Results: By applying our strategy to remove a small portion of inspectors from the original inspection team, we decreased the dependence among inspectors and improved estimation accuracy for the three most popular CRC estimators (i.e., Mt-CH, Mh-JK, and Mh-CH, cf. Section 2.1). Conclusions: Our study implies that 'more inspectors, higher accuracy' is not always valid for CRC estimation in software inspection, and that an independence-driven strategy for selecting suitable defect data may produce better estimates.
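To make the kind of estimate discussed in the abstract concrete, the following is a minimal sketch of two textbook CRC estimators computed from per-inspector defect lists. It assumes that Mh-JK and Mh-CH refer to the jackknife and Chao estimators for the heterogeneity model; the paper's own definitions (Section 2.1) are not reproduced here, and only the simplest first-order jackknife is shown. The function names, the fallback used when no defect is found by exactly two inspectors, and the sample data are all illustrative assumptions, not taken from the study.

```python
from collections import Counter


def capture_frequencies(detections):
    """Return (distinct defect count, frequency table).

    detections: list of sets, one per inspector, holding the ids of the
    defects that inspector found (hypothetical input format).
    freq[j] is the number of defects found by exactly j inspectors.
    """
    per_defect = Counter()
    for found in detections:
        per_defect.update(found)
    freq = Counter(per_defect.values())
    return len(per_defect), freq


def jackknife_first_order(detections):
    """First-order jackknife estimate of the total number of defects."""
    k = len(detections)
    distinct, freq = capture_frequencies(detections)
    return distinct + (k - 1) / k * freq[1]


def chao_mh(detections):
    """Chao's lower-bound estimate of the total number of defects."""
    distinct, freq = capture_frequencies(detections)
    f1, f2 = freq[1], freq[2]
    if f2 == 0:
        # Assumption: fall back to the bias-corrected form to avoid
        # dividing by zero when no defect was found exactly twice.
        return distinct + f1 * (f1 - 1) / 2
    return distinct + f1 * f1 / (2 * f2)


# Made-up inspection data: three inspectors, seven distinct defects.
team = [{1, 2, 3, 4}, {2, 3, 5}, {1, 3, 6, 7}]
jk_estimate = jackknife_first_order(team)
ch_estimate = chao_mh(team)
print(jk_estimate, ch_estimate)
```

Dropping an inspector from `team` and recomputing shows how the estimates respond to team composition, which is the quantity the study's selection strategy manipulates; how inspectors are actually chosen for removal is defined in the paper and is not modeled here.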