Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining

Eijun Shen,Yanhui Li,Lin Chen,Yuanlei Han,Yuming Zhou,Baowen Xu
DOI: https://doi.org/10.1145/3324884.3416621
2020-01-01
Abstract:With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data to construct the models and the testing data to evaluate them. To bridge the gap, testers aim to collect an effective subset of inputs from the testing contexts, with limited labeling effort, for retraining DL models. To assist the subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique to cluster test samples into the boundary areas of multiple boundaries for DL models and specify the priority to select samples evenly from all boundary areas, to make sure enough useful samples for each boundary reconstruction. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experiment results show that, compared with state-of-the-art baseline methods, on effectiveness, our approach MCP has a significantly better performance by evaluating the improved quality of retrained DL models; on efficiency, MCP also has the advantages in time costs.
What problem does this paper attempt to address?