Abstract:Purpose: To perform an in-depth evaluation of current state of the art techniques in training neural networks to identify appropriate approaches in small datasets. Method: In total, 112,120 frontal-view X-ray images from the NIH ChestXray14 dataset were used in our analysis. Two tasks were studied: unbalanced multi-label classification of 14 diseases, and binary classification of pneumonia vs non-pneumonia. All datasets were randomly split into training, validation, and testing (70%, 10%, and 20%). Two popular convolution neural networks (CNNs), DensNet121 and ResNet50, were trained using PyTorch. We performed several experiments to test: (a) whether transfer learning using pretrained networks on ImageNet are of value to medical imaging/physics tasks (e.g., predicting toxicity from radiographic images after training on images from the internet), (b) whether using pretrained networks trained on problems that are similar to the target task helps transfer learning (e.g., using X-ray pretrained networks for X-ray target tasks), (c) whether freeze deep layers or change all weights provides an optimal transfer learning strategy, (d) the best strategy for the learning rate policy, and (e) what quantity of data is needed in order to appropriately deploy these various strategies (N = 50 to N = 77 880). Results: In the multi-label problem, DensNet121 needed at least 1600 patients to be comparable to, and 10 000 to outperform, radiomics-based logistic regression. In classifying pneumonia vs non-pneumonia, both CNN and radiomics-based methods performed poorly when N < 2000. For small datasets ( < 2000), however, a significant boost in performance (>15% increase on AUC) comes from a good selection of the transfer learning dataset, dropout, cycling learning rate, and freezing and unfreezing of deep layers as training progresses. In contrast, if sufficient data are available (>35 000), little or no tweaking is needed to obtain impressive performance. While transfer learning using X-ray images from other anatomical sites improves performance, we also observed a similar boost by using pretrained networks from ImageNet. Having source images from the same anatomical site, however, outperforms every other methodology, by up to 15%. In this case, DL models can be trained with as little as N = 50. Conclusions: While training DL models in small datasets (N < 2000) is challenging, no tweaking is necessary for bigger datasets (N > 35 000). Using transfer learning with images from the same anatomical site can yield remarkable performance in new tasks with as few as N = 50. Surprisingly, we did not find any advantage for using images from other anatomical sites over networks that have been trained using ImageNet. This indicates that features learned may not be as general as currently believed, and performance decays rapidly even by just changing the anatomical site of the images.

Evaluation of the impact of deep learning architectural components selection and dataset size on a medical imaging task

Toward understanding deep learning classification of anatomic sites: lessons from the development of a CBCT projection classifier

Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities

Deep neural models for automated multi-task diagnostic scan management—quality enhancement, view classification and report generation

Targeted transfer learning to improve performance in small medical physics datasets

Comprehensive Comparison of Deep Learning Models for Lung and COVID-19 Lesion Segmentation in CT scans

Challenges of building medical image datasets for development of deep learning software in stroke

Effect of Training Data Volume on Performance of Convolutional Neural Network Pneumothorax Classifiers

Performance of a Deep Neural Network Algorithm Based on a Small Medical Image Dataset: Incremental Impact of 3D-to-2D Reformation Combined with Novel Data Augmentation, Photometric Conversion, or Transfer Learning

The Impact of Scanner Domain Shift on Deep Learning Performance in Medical Imaging: an Experimental Study

Robust application of new deep learning tools: an experimental study in medical imaging

Impact of Image Resolution on Deep Learning Performance in Endoscopy Image Classification: An Experimental Study Using a Large Dataset of Endoscopic Images

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Using deep learning techniques in medical imaging: a systematic review of applications on CT and PET

Trends in the application of deep learning networks in medical image analysis: Evolution between 2012 and 2020

Medical image analysis using deep learning algorithms

Robust-Deep: A Method for Increasing Brain Imaging Datasets to Improve Deep Learning Models' Performance and Robustness

Are Deep Learning Classification Results Obtained on CT Scans Fair and Interpretable?

Deep Learning Body Region Classification of MRI and CT examinations

Deep Learning-Based Natural Language Processing in Radiology: The Impact of Report Complexity, Disease Prevalence, Dataset Size, and Algorithm Type on Model Performance