Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification

Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou
DOI: https://doi.org/10.1145/3664647.3681592
2024-09-02
Abstract: In the medical field, managing high-dimensional massive medical imaging data and performing reliable medical analysis from it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices. This necessitates effective dataset compression techniques to reduce storage, transmission, and computational cost. However, existing coreset selection methods are primarily designed for natural image datasets, and exhibit doubtful effectiveness when applied to medical image datasets due to challenges such as intra-class variation and inter-class similarity. In this paper, we propose a novel coreset selection strategy termed as Evolution-aware VAriance (EVA), which captures the evolutionary process of model training through a dual-window approach and reflects the fluctuation of sample importance more precisely through variance measurement. Extensive experiments on medical image datasets demonstrate the effectiveness of our strategy over previous SOTA methods, especially at high compression rates. EVA achieves 98.27% accuracy with only 10% training data, compared to 97.20% for the full training set. None of the compared baseline methods can exceed Random at 5% selection rate, while EVA outperforms Random by 5.61%, showcasing its potential for efficient medical image analysis.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is how to compress datasets for medical image classification efficiently in resource-limited environments (such as remote medical facilities and mobile devices), reducing storage, transmission, and computing costs while maintaining the accuracy and reliability of model training. Existing coreset selection methods are designed mainly for natural image datasets and are of doubtful effectiveness on medical image datasets, which are characterized by large intra-class variation and high inter-class similarity. The paper therefore proposes a new coreset selection strategy, Evolution-aware VAriance (EVA), to address these challenges.

### Summary of main problems:

1. **Management of high-dimensional, large-scale medical imaging data**: Medical image data is typically high-dimensional and large in volume, so storing and transmitting it requires substantial resources.
2. **Reliable medical analysis**: Conducting reliable medical analysis in resource-limited environments (such as remote medical facilities and mobile devices) is a key challenge.
3. **Limitations of existing coreset selection methods**:
   - Existing coreset selection methods are designed mainly for natural image datasets, and their effectiveness on medical image datasets is doubtful.
   - The characteristics of medical image datasets (such as intra-class variation and inter-class similarity) make traditional selection methods difficult to apply effectively.

### Proposed solutions:

- **Evolution-aware VAriance (EVA)**: EVA uses a dual-window approach to capture how the model evolves during training, and uses variance measurement to reflect fluctuations in sample importance more precisely.
- **Dual-window approach**: One window covers the early stage of training and the other covers the later stage, so that sample importance is evaluated across the entire training process.
- **Variance measurement**: Within each window, the variance of each sample's error vector is calculated to give a finer-grained estimate of the sample's contribution to model training.

### Experimental results:

- Experiments on the OrganAMNIST and OrganSMNIST datasets show that EVA maintains high accuracy at high compression rates.
- For example, with only 10% of the training data, EVA achieves 98.27% accuracy, compared to 97.20% for the full training set.
- At a 5% selection rate, EVA outperforms random selection by 5.61%, demonstrating its effectiveness at extremely low selection rates.

With these improvements, EVA supports efficient medical image classification in resource-limited environments and improves the efficiency of both data compression and model training.
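The dual-window variance idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact formula, so the code assumes that the "error vector" is the predicted probability vector minus the one-hot label, that its L2 norm is tracked per epoch, that the two window variances are simply summed, and that higher-scoring samples are kept. The function names (`error_norm`, `eva_like_scores`, `select_coreset`), the window boundaries, and the toy numbers are all hypothetical.

```python
import math
from statistics import pvariance


def error_norm(probs, label):
    """L2 norm of (predicted probabilities - one-hot label) for one sample at one epoch."""
    return math.sqrt(sum((p - (1.0 if c == label else 0.0)) ** 2
                         for c, p in enumerate(probs)))


def eva_like_scores(prob_history, labels, early_window, late_window):
    """Score each sample by the variance of its error norm in two epoch windows.

    prob_history[e][i] is the probability vector for sample i at epoch e.
    early_window and late_window are (start, stop) epoch ranges, stop exclusive.
    Summing the two variances is an assumption made for this sketch.
    """
    scores = []
    for i, label in enumerate(labels):
        norms = [error_norm(epoch[i], label) for epoch in prob_history]
        early = norms[early_window[0]:early_window[1]]
        late = norms[late_window[0]:late_window[1]]
        scores.append(pvariance(early) + pvariance(late))
    return scores


def select_coreset(scores, rate):
    """Return sorted indices of the top `rate` fraction of samples by score (at least one)."""
    k = max(1, round(len(scores) * rate))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy history with made-up numbers: 4 epochs, 3 samples, 2 classes.
labels = [0, 1, 0]
prob_history = [
    [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]],
    [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]],
    [[0.6, 0.4], [0.3, 0.7], [0.9, 0.1]],
    [[0.95, 0.05], [0.2, 0.8], [0.9, 0.1]],
]

# Epochs 0-1 form the early window, epochs 2-3 the late window.
scores = eva_like_scores(prob_history, labels, (0, 2), (2, 4))
coreset = select_coreset(scores, rate=1 / 3)
print(scores, coreset)
```

In this toy run, sample 0 has an error that swings between epochs, sample 1 improves steadily, and sample 2 never changes, so sample 0 receives the highest score and sample 2 scores zero. In practice the probability history would be recorded while training a model on the full dataset, and the window boundaries would be hyperparameters.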