Embrace Sustainable AI: Dynamic Data Subset Selection for Image Classification

Zimo Yin, Jian Pu, Ru Wan, Xiangyang Xue
DOI: https://doi.org/10.1016/j.patcog.2024.110392
IF: 8
2024-01-01
Pattern Recognition
Abstract: Data selection is commonly used to reduce costs and energy usage by training on a subset of available data. However, determining the appropriate subset size requires extensive dataset knowledge and experimentation, limiting transferability. Varying the validation set also produces unstable results and wastes computational resources. In this paper, we propose a data selection method that dynamically determines subset ratios based on model performance, using only a training set. The data search space is narrowed through weighted sampling that leverages statistical selection patterns. Parallel analysis of class distributions identifies the most representative samples with high selection potential. Extensive experiments validate our approach and demonstrate improved training efficiency. Our method speeds up training across various subset ratios by up to 2.2x on CIFAR-10, 1.9x on CIFAR-100, 2.0x on TinyImageNet, and 2.1x on ImageNet with negligible accuracy drops.
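The abstract does not spell out how the weighted sampling is carried out, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each training sample carries a count of how often it was chosen in earlier selection rounds and that this count serves as its sampling weight. The function name `weighted_subset`, the `selection_counts` input, and the add-one smoothing are hypothetical choices made for illustration.

```python
import random


def weighted_subset(selection_counts, ratio, seed=0):
    """Draw a subset of sample indices without replacement, favouring
    samples that were selected more often in earlier rounds.

    Illustrative assumption: selection_counts[i] is how many times sample i
    was picked previously. This is not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(selection_counts)
    k = max(1, round(ratio * n))  # subset size implied by the ratio

    # Add-one smoothing so never-selected samples keep a non-zero chance.
    weights = [count + 1 for count in selection_counts]

    # Efraimidis-Spirakis keys: u ** (1 / w) with u uniform in [0, 1).
    # Keeping the k largest keys gives a weighted sample without replacement.
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:k])


if __name__ == "__main__":
    counts = [0, 5, 2, 9, 0, 1, 7, 3, 0, 4]
    print(weighted_subset(counts, ratio=0.3, seed=1))
```

In a dynamic scheme of the kind the abstract describes, `ratio` would be adjusted between rounds according to model performance; here it is simply passed in as a fixed argument.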