Rethinking Model Prototyping through the MedMNIST+ Dataset Collection

Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, Christian Ledig

2024-05-08

Abstract: The integration of deep learning based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code is available at

Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper addresses obstacles in developing deep learning models for the medical domain by proposing a new evaluation benchmark and a series of experimental analyses. The aim is to inform model prototyping, training strategies, and the choice of input resolution. Specifically, the paper targets the following key problems:

1. **Limited and heterogeneous medical datasets**: The use of deep learning systems in clinical practice is often held back by the small size and heterogeneous sources of available datasets, which challenges the generalization ability of supervised learning algorithms.
2. **Overemphasis on marginal performance improvements on benchmarks**: Researchers tend to fine-tune existing methods to reach state-of-the-art results on a few narrowly scoped benchmarks while neglecting clinical applicability. This trend slows meaningful algorithmic progress.
3. **Choice of models and training schemes**: The paper re-evaluates the performance of common convolutional neural networks (CNNs) and Transformer-based architectures on medical image classification tasks and explores the effects of different training schemes (such as end-to-end training, linear probing, etc.).
4. **Impact of input resolution**: The paper also examines how input resolution affects model performance, and in particular how choosing a lower resolution during the prototyping phase can speed up processing.

Through these analyses, the paper's main objectives are:

- Providing a comprehensive benchmarking framework that covers various medical datasets, training methods, and input resolutions to promote a deeper understanding of the strengths and limitations of commonly used models.
- Re-examining common assumptions regarding model design, training strategies, and input resolution requirements.
- Recommending best practices to be considered during model development and deployment to enhance transparency, reproducibility, and comparability.

In summary, the paper aims to provide guidance and support for the development of deep learning models in the medical field by presenting a standardized benchmark for the MedMNIST+ dataset collection together with detailed experimental results.
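To make the contrast between end-to-end training and linear probing more concrete, the following is a minimal sketch of linear probing on toy data. It is not the authors' implementation: `frozen_features` is a hypothetical stand-in for a frozen pretrained backbone, the two-blob dataset is synthetic, and the hyperparameters are arbitrary. The point it illustrates is that only a small linear head is trained while the feature extractor is left untouched.

```python
import math
import random


def frozen_features(x):
    """Hypothetical stand-in for a frozen pretrained backbone (never updated)."""
    return [x[0], x[1], x[0] * x[1]]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(samples, labels, epochs=50, lr=0.1):
    """Train only a logistic-regression head with SGD; the backbone stays fixed."""
    feats = [frozen_features(x) for x in samples]  # extracted once, then reused
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # derivative of the log loss with respect to the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Classify one raw input using the frozen features and the trained head."""
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Toy two-class data: two well-separated Gaussian blobs (illustrative only).
rng = random.Random(0)
samples, labels = [], []
for label, center in ((0, -1.0), (1, 1.0)):
    for _ in range(20):
        samples.append((rng.gauss(center, 0.3), rng.gauss(center, 0.3)))
        labels.append(label)

w, b = train_linear_probe(samples, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(samples, labels)) / len(labels)
print(f"trainable parameters: {len(w) + 1}, training accuracy: {accuracy:.2f}")
```

In end-to-end training, the parameters inside the backbone would also receive gradient updates, which is what makes it more expensive. Because the features here are computed once and reused across epochs, the cost per epoch depends only on the size of the head.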

MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions

Deep neural models for automated multi-task diagnostic scan management—quality enhancement, view classification and report generation

MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation

MNet-10: A robust shallow convolutional neural network model performing ablation study on medical images assessing the effectiveness of applying optimal data augmentation technique

Prototype-based Interpretable Breast Cancer Prediction Models: Analysis and Challenges

MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

Leveraging Pretrained Models for Multimodal Medical Image Interpretation: An Exhaustive Experimental Analysis

Augment like there's no tomorrow: Consistently performing neural networks for medical imaging

AI support for colonoscopy quality control using CNN and transformer architectures

Implementing vision transformer for classifying 2D biomedical images

MedMNIST v2 -- A large-scale lightweight benchmark for 2D and 3D biomedical image classification

A Unified Approach Addressing Class Imbalance in Magnetic Resonance Image for Deep Learning Models

Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval

Efficient human-in-loop deep learning model training with iterative refinement and statistical result validation

Improving the repeatability of deep learning models with Monte Carlo dropout

Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines

nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation

Robust-Deep: A Method for Increasing Brain Imaging Datasets to Improve Deep Learning Models' Performance and Robustness

Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters

Enhancing Radiology Diagnosis through Convolutional Neural Networks for Computer Vision in Healthcare