Abstract: Purpose: To perform an in-depth evaluation of current state-of-the-art techniques for training neural networks, in order to identify appropriate approaches for small datasets. Method: In total, 112,120 frontal-view X-ray images from the NIH ChestXray14 dataset were used in our analysis. Two tasks were studied: unbalanced multi-label classification of 14 diseases, and binary classification of pneumonia vs non-pneumonia. All datasets were randomly split into training, validation, and testing sets (70%, 10%, and 20%). Two popular convolutional neural networks (CNNs), DenseNet121 and ResNet50, were trained using PyTorch. We performed several experiments to test: (a) whether transfer learning using networks pretrained on ImageNet is of value to medical imaging/physics tasks (e.g., predicting toxicity from radiographic images after training on images from the internet), (b) whether using networks pretrained on problems similar to the target task helps transfer learning (e.g., using X-ray pretrained networks for X-ray target tasks), (c) whether freezing deep layers or updating all weights provides the better transfer learning strategy, (d) the best strategy for the learning rate policy, and (e) what quantity of data is needed in order to appropriately deploy these various strategies (N = 50 to N = 77 880). Results: In the multi-label problem, DenseNet121 needed at least 1600 patients to be comparable to, and 10 000 to outperform, radiomics-based logistic regression. In classifying pneumonia vs non-pneumonia, both CNN and radiomics-based methods performed poorly when N < 2000. For small datasets (N < 2000), however, a significant boost in performance (>15% increase in AUC) comes from a good selection of the transfer learning dataset, dropout, a cycling learning rate, and freezing and unfreezing of deep layers as training progresses. In contrast, if sufficient data are available (N > 35 000), little or no tweaking is needed to obtain impressive performance.
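To make the "cycling learning rate" mentioned above concrete, the following is a minimal sketch of one common cyclical policy, the triangular schedule. The abstract does not state which cyclical policy or which bounds were used, so the function name `triangular_lr` and the values of `base_lr`, `max_lr`, and `half_cycle` are illustrative assumptions, not the study's settings.

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, half_cycle=100):
    """Learning rate at a training step under a triangular cyclical policy.

    The rate climbs linearly from base_lr to max_lr over half_cycle steps,
    falls back to base_lr over the next half_cycle steps, and then repeats.
    All default values are assumptions chosen for illustration only.
    """
    if half_cycle <= 0:
        raise ValueError("half_cycle must be positive")
    position = step % (2 * half_cycle)      # position inside the current cycle
    distance = abs(position - half_cycle)   # number of steps away from the peak
    fraction = 1.0 - distance / half_cycle  # 0 at the cycle ends, 1 at the peak
    return base_lr + (max_lr - base_lr) * fraction


# Sample the schedule at a few steps across two cycles.
schedule = [triangular_lr(s) for s in (0, 50, 100, 150, 200, 300)]
```

In a PyTorch training loop, the built-in `torch.optim.lr_scheduler.CyclicLR` scheduler offers a comparable triangular mode; the hand-written version here only shows the shape of the schedule.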
While transfer learning using X-ray images from other anatomical sites improves performance, we observed a similar boost from networks pretrained on ImageNet. Having source images from the same anatomical site, however, outperforms every other methodology, by up to 15%. In this case, DL models can be trained with as few as N = 50. Conclusions: While training DL models on small datasets (N < 2000) is challenging, no tweaking is necessary for bigger datasets (N > 35 000). Using transfer learning with images from the same anatomical site can yield remarkable performance in new tasks with as few as N = 50. Surprisingly, we did not find any advantage in using images from other anatomical sites over networks that have been trained using ImageNet. This indicates that the learned features may not be as general as currently believed, and that performance decays rapidly even when only the anatomical site of the images changes.
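The random 70%/10%/20% split described in the Method can be sketched as follows. This is a minimal illustration that assumes the split is made over item indices with a fixed seed. The abstract does not say whether items were images or patients, and the helper name `split_indices` and its parameters are hypothetical.

```python
import random


def split_indices(n_items, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition range(n_items) into train, validation, and test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # seeded shuffle, so the split is reproducible
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # the remainder, so no item is dropped
    return train, val, test


# Example: split 1000 hypothetical item indices.
train, val, test = split_indices(1000)
```

When several images belong to the same patient, applying the same function to patient identifiers instead of image indices keeps each patient within a single subset.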

Targeted transfer learning to improve performance in small medical physics datasets