Navigating Data Scarcity using Foundation Models: A Benchmark of Few-Shot and Zero-Shot Learning Approaches in Medical Imaging

Stefano Woerner, Christian F. Baumgartner
2024-08-15
Abstract: Data scarcity is a major limiting factor for applying modern machine learning techniques to clinical tasks. Although sufficient data exists for some well-studied medical tasks, there remains a long tail of clinically relevant tasks with poor data availability. Recently, numerous foundation models have demonstrated high suitability for few-shot learning (FSL) and zero-shot learning (ZSL), potentially making them more accessible to practitioners. However, it remains unclear which foundation model performs best on FSL medical image analysis tasks and what the optimal methods are for learning from limited data. We conducted a comprehensive benchmark study of ZSL and FSL using 16 pretrained foundation models on 19 diverse medical imaging datasets. Our results indicate that BiomedCLIP, a model pretrained exclusively on medical data, performs best on average for very small training set sizes, while very large CLIP models pretrained on LAION-2B perform best with slightly more training samples. However, simply fine-tuning a ResNet-18 pretrained on ImageNet performs similarly with more than five training examples per class. Our findings also highlight the need for further research on foundation models specifically tailored for medical applications and the collection of more datasets to train these models.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper asks how pretrained foundation models can be used effectively for few-shot learning (FSL) and zero-shot learning (ZSL) when medical imaging data is scarce. Specifically, it compares the FSL and ZSL performance of different pretrained models across a range of medical imaging tasks to determine which models perform best with very limited data, and it studies which learning strategies work best in that setting.

### Background and Motivation

1. **Data Scarcity Problem**: Data scarcity severely restricts the application of modern machine learning techniques to clinical tasks. Although sufficient data exists for some well-studied medical tasks, many clinically relevant tasks have poor data availability.
2. **Application Potential of Foundation Models**: In recent years, many foundation models have proven well suited to FSL and ZSL, which may make them more accessible to practitioners. However, it is still unclear which foundation model performs best on FSL tasks in medical image analysis and what the best methods are for learning from limited data.

### Research Objectives

1. **Benchmarking**: Evaluate the FSL and ZSL performance of 16 pretrained foundation models on 19 diverse medical imaging datasets.
2. **Performance Comparison**: Determine which foundation models perform well at different numbers of training samples.
3. **Method Exploration**: Explore the effects of two adaptation strategies, linear probing and fine-tuning, on different models.

### Main Findings

1. **Performance of BiomedCLIP**: For very small training sets (fewer than five samples per class), BiomedCLIP, a model pretrained exclusively on medical data, performs best on average.
2. **Advantages of CLIP Models**: With slightly more training samples, very large CLIP models (such as CLIP-ViT-H) pretrained on LAION-2B perform best.
3. **Practicality of ResNet-18**: With more than five training samples per class, simply fine-tuning a ResNet-18 pretrained on ImageNet achieves similar results.
4. **Model Size and Pretraining Data Scale**: Model size and the scale of the pretraining dataset are both positively correlated with FSL performance.
5. **Limitations of ZSL**: ZSL methods perform far worse than FSL methods on medical imaging tasks.

### Conclusions

1. **Optimal Strategies in the Case of Scarce Data**: With very little data, linear probing of BiomedCLIP is the best choice; as the amount of data increases, linear probing of CLIP-ViT-H performs better.
2. **Model Selection and Adaptation Strategies**: Although fine-tuning a ResNet-18 performs well when more data is available, linear probing of a large foundation model is still the better choice in most cases.
3. **Future Research Directions**: Further research is needed on foundation models designed specifically for medical applications, along with the collection of more datasets to train them.

Taken together, these results give researchers in medical imaging practical guidance for applying modern machine learning techniques when data is scarce.
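Linear probing, one of the two adaptation strategies named above, trains only a linear classifier on top of features from a frozen backbone. The following is a minimal sketch of k-shot linear probing, not the paper's implementation: the features are synthetic stand-ins generated by a hypothetical `make_synthetic_features` helper (in practice they would come from a frozen image encoder), and the helper names, softmax-regression optimizer, and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np


def sample_k_shot(labels, k, rng):
    """Return indices of k randomly chosen training examples per class."""
    idx = []
    for c in np.unique(labels):
        candidates = np.flatnonzero(labels == c)
        idx.extend(rng.choice(candidates, size=k, replace=False))
    return np.array(idx)


def fit_linear_probe(feats, labels, n_classes, lr=0.1, steps=200, weight_decay=1e-3):
    """Fit a softmax classifier on frozen features with plain gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (feats.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(feats, W, b):
    """Predict the class with the highest linear score."""
    return (feats @ W + b).argmax(axis=1)


def make_synthetic_features(n_per_class, n_classes, dim, rng):
    """Hypothetical stand-in for embeddings produced by a frozen backbone."""
    centers = rng.normal(size=(n_classes, dim)) * 3.0
    feats = np.concatenate(
        [centers[c] + rng.normal(size=(n_per_class, dim)) for c in range(n_classes)]
    )
    labels = np.repeat(np.arange(n_classes), n_per_class)
    return feats, labels


rng = np.random.default_rng(0)
feats, labels = make_synthetic_features(n_per_class=100, n_classes=3, dim=16, rng=rng)

# Draw a 5-shot training set; everything else is held out for evaluation.
train_idx = sample_k_shot(labels, k=5, rng=rng)
test_mask = np.ones(len(labels), dtype=bool)
test_mask[train_idx] = False

# Only W and b are learned; the (simulated) backbone features never change.
W, b = fit_linear_probe(feats[train_idx], labels[train_idx], n_classes=3)
accuracy = (predict(feats[test_mask], W, b) == labels[test_mask]).mean()
print(f"5-shot linear-probe accuracy on held-out features: {accuracy:.3f}")
```

Fine-tuning differs in that the backbone weights are updated as well, which is why it tends to need more labeled examples before it becomes competitive, in line with the ResNet-18 finding above.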