ImageNot: A contrast with ImageNet preserves model rankings

Olawale Salaudeen, Moritz Hardt
2024-04-03
Abstract: We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other aspects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot to how they rank on ImageNet. This is true when training models from scratch or fine-tuning them. Moreover, the relative improvements of each model over earlier models strongly correlate in both datasets. We further give evidence that ImageNot has a similar utility as ImageNet for transfer learning purposes. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers that typically drop sharply even under small changes to a dataset.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks whether the relative performance of image classification models holds up when the dataset changes, that is, whether model comparisons have external validity. To test this, the authors build a new dataset, ImageNot, which matches the scale of the well-known ImageNet dataset but differs significantly in other respects, such as category selection and image sources. The main contributions can be summarized as follows:

1. **Creation of the ImageNot Dataset**: ImageNot contains 1,000 categories with 1,000 images per category, matching the scale of ImageNet's ILSVRC-2012 benchmark. However, its categories are randomly selected from WordNet, are unrelated to the ImageNet categories, and are not human-annotated. Images for each category are selected from the LAION-5B dataset based on image captions and category names.

2. **Stability of Model Rankings**: Key model architectures (such as AlexNet, VGG, ResNet, DenseNet, EfficientNet, and ConvNeXt) rank the same when trained and evaluated on ImageNot as they do on ImageNet. In addition, the improvement of each model over earlier models is strongly correlated across the two datasets.

3. **Pre-training and Transfer Learning**: The paper also examines ImageNot's utility for pre-training and transfer learning and finds it similar to ImageNet's. Specifically, models pre-trained on ImageNot reach fine-tuning performance on CIFAR-10 comparable to that of models pre-trained on ImageNet.

In summary, despite the significant differences between ImageNot and ImageNet, model rankings and relative improvements on ImageNot closely track those on ImageNet. This suggests that the choice of a specific dataset may not be a decisive factor in the development of machine learning models, and that the external validity of relative model performance may be higher than previously thought.
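The two comparisons described above, identical rankings and correlated relative improvements, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the model names and accuracy values are made-up placeholders (not results from the paper), and defining "relative improvement" as the fractional accuracy gain over the chronologically preceding model is one plausible reading, not necessarily the authors' exact definition.

```python
from math import sqrt

# Hypothetical placeholder accuracies, NOT results from the paper.
# "dataset_a" and "dataset_b" stand in for two datasets such as ImageNet and ImageNot.
acc_a = {"model_1": 0.55, "model_2": 0.70, "model_3": 0.76, "model_4": 0.80}
acc_b = {"model_1": 0.40, "model_2": 0.52, "model_3": 0.60, "model_4": 0.63}

# Assumed chronological order in which the (hypothetical) models were introduced.
chronological = ["model_1", "model_2", "model_3", "model_4"]


def ranking(acc):
    """Return model names sorted from highest to lowest accuracy."""
    return sorted(acc, key=acc.get, reverse=True)


def relative_improvements(acc, order):
    """Fractional accuracy gain of each model over its predecessor in `order`."""
    return [(acc[cur] - acc[prev]) / acc[prev] for prev, cur in zip(order, order[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Check 1: do the models rank identically on both datasets?
same_ranking = ranking(acc_a) == ranking(acc_b)

# Check 2: do the step-by-step relative improvements correlate across datasets?
gains_a = relative_improvements(acc_a, chronological)
gains_b = relative_improvements(acc_b, chronological)
r = pearson(gains_a, gains_b)

print("identical ranking:", same_ranking)
print("correlation of relative improvements:", round(r, 3))
```

Note that absolute accuracies in this sketch are lower on the second dataset while the ordering is unchanged, which mirrors the paper's distinction between absolute accuracy numbers and relative model performance.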