ATOM: Automated Black-Box Testing of Multi-Label Image Classification Systems.

Shengyou Hu,Huayao Wu,Peng Wang,Jing Chang,Yongjun Tu,Xiu Jiang,Xintao Niu,Changhai Nie
DOI: https://doi.org/10.1109/ase56229.2023.00156
2024-01-01
Abstract:Multi-label Image Classification Systems (MICSs) developed based on Deep Neural Networks (DNNs) are extensively used in people's daily life. Currently, although there are a variety of approaches to test DNN-based systems, they typically rely on the internals of DNNs to design test cases, and do not take the core specification of MICS (i.e., correctly recognizing multiple objects in a given image) into account. In this paper, we propose ATOM, an automated and systematic black-box testing framework for testing MICS. Specifically, ATOM exploits the label combination as the testing adequacy criteria, hoping to systematically examine the impact of correlations between a fixed number of labels on the classification ability of MICS. Then, ATOM leverages image search engine and natural language processing to find test images that are not only common to the real-world, but also relevant to target label combinations. Finally, ATOM combines metamorphic testing and label information to realize test oracle identification, based on which the ability of MICS in classifying different label combinations is evaluated. To evaluate the effectiveness of ATOM, we have performed experiments on two popular datasets of MICS, VOC and COCO (each with five state-of-the-art DNN models), and one real-world photo tagging application from our industrial partner. The experimental results reveal that the performance of current DNN-based MICSs remains less satisfactory even in recognizing correlations between only two labels, as ATOM triggers a total number of 6,049 such label combination related errors for all MICSs studied. In particular, ATOM reports 587 error-revealing images for the industrial MICS, in which 92% of them are confirmed by the developers.
What problem does this paper attempt to address?