Selecting Distinctive-Variant Training Samples Base on Intra-class Similarity

Hang Diao, Zhengchang Liu, Fan Zhang, Jiaqing Huang, Feiyu Zhou, Samee U. Khan
DOI: https://doi.org/10.1007/978-3-031-44201-8_22
2023-01-01
Abstract: Deep learning models often require large amounts of data, and handling that data can be computationally intensive and architecturally complex. Efforts to address the challenge of handling large amounts of data in high-resolution scenarios have led to techniques such as data pruning and data diet approaches. We present a novel approach called Select Base on Intra-Class Similarity (SICS). It measures the similarity of samples within the same class, identifies the most informative samples as those most dissimilar from the others, and introduces the novel concept of a distinctive-variant sample, which is vital for enhancing deep-learning classification tasks. We evaluated our method on several image classification benchmarks and compared it with existing techniques. Our results show that, for high-resolution images and scenarios with many classes, SICS can achieve the same level of accuracy as the full data while using only about 80% of the training data, outperforming the ForgettingScore method by 20% to 90%. Additionally, our method remains robust when switching to different training models. Our source code is publicly available at https://github.com/Gusicun/SICS.
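The selection idea described in the abstract (score each sample by its similarity to other samples of the same class, then keep the most dissimilar ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the scoring rule, so the use of cosine similarity on feature vectors, the mean-similarity score, and the names `cosine`, `select_dissimilar`, and `keep_fraction` are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_dissimilar(features, labels, keep_fraction=0.8):
    """Return sorted indices of the samples kept after per-class pruning.

    Hypothetical scoring rule: a sample's score is its mean cosine
    similarity to the other samples of its class; the lowest-scoring
    (most dissimilar) samples in each class are kept.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for members in by_class.values():
        if len(members) == 1:
            # A lone sample has no peers to compare against; keep it.
            kept.extend(members)
            continue
        scores = {}
        for i in members:
            sims = [cosine(features[i], features[j]) for j in members if j != i]
            scores[i] = sum(sims) / len(sims)
        n_keep = max(1, math.ceil(keep_fraction * len(members)))
        ranked = sorted(members, key=lambda i: scores[i])  # ascending similarity
        kept.extend(ranked[:n_keep])
    return sorted(kept)


# Toy data: class 0 holds two near-duplicates (0, 1) and one outlier (2).
features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]
selected = select_dissimilar(features, labels, keep_fraction=0.5)
```

On this toy input, the sample with the highest mean intra-class similarity (index 1, a near-duplicate of index 0) is the one dropped from class 0. In practice the feature vectors would presumably come from a trained or pretrained network rather than raw pixels, which is again an assumption rather than something the abstract states.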