Abstract: We introduce ImageNot, a dataset designed to match the scale of ImageNet while differing drastically in other aspects. We show that key model architectures developed for ImageNet over the years rank identically when trained and evaluated on ImageNot to how they rank on ImageNet. This is true when training models from scratch or fine-tuning them. Moreover, the relative improvements of each model over earlier models strongly correlate in both datasets. We further give evidence that ImageNot has a similar utility as ImageNet for transfer learning purposes. Our work demonstrates a surprising degree of external validity in the relative performance of image classification models. This stands in contrast with absolute accuracy numbers that typically drop sharply even under small changes to a dataset.

What problem does this paper attempt to address?

This paper examines whether the relative performance of image classification models holds up across datasets, that is, the external validity of model comparisons. To test this, the authors build a new dataset, ImageNot, which matches the scale of the well-known ImageNet dataset but differs substantially in other respects, such as category selection and image sources. The main contributions are:

1. **Creation of the ImageNot Dataset**: ImageNot contains 1,000 categories with 1,000 images per category, mirroring the scale of ImageNet's ILSVRC-2012. The categories are randomly selected from WordNet and are unrelated to the ImageNet categories, and the labels are not human-annotated. Instead, images are drawn from the LAION-5B dataset and assigned to categories based on their captions and the category names.
2. **Stability of Model Rankings**: Key model architectures (such as AlexNet, VGG, ResNet, DenseNet, EfficientNet, and ConvNeXt) rank the same when trained and evaluated on ImageNot as they do on ImageNet. In addition, the improvement of each model over earlier models correlates strongly across the two datasets.
3. **Pre-training and Transfer Learning**: The paper also examines ImageNot as a source for pre-training and transfer learning and finds it about as useful as ImageNet. Specifically, models pre-trained on ImageNot reach fine-tuning performance on CIFAR-10 comparable to that of models pre-trained on ImageNet.

In summary, although ImageNot and ImageNet differ substantially, model rankings and relative improvements on ImageNot closely track those on ImageNet. This suggests that the choice of a specific dataset may not be decisive for model development, and that the external validity of relative model performance may be higher than previously thought.
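The two comparisons described above, identical rankings and correlated relative improvements, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code. The accuracy values are invented placeholders and are not results from the paper. The helper names (`ranking`, `relative_improvements`, `pearson`), the chronological ordering, and the choice to measure each model's gain over its immediate predecessor are all assumptions made for this example.

```python
import math

# Hypothetical top-1 accuracies, invented purely for illustration.
# These are NOT results reported in the paper.
acc_a = {"AlexNet": 0.56, "VGG": 0.71, "ResNet": 0.76, "DenseNet": 0.77, "ConvNeXt": 0.82}
acc_b = {"AlexNet": 0.40, "VGG": 0.52, "ResNet": 0.60, "DenseNet": 0.62, "ConvNeXt": 0.70}

def ranking(acc):
    """Return model names sorted from highest to lowest accuracy."""
    return sorted(acc, key=acc.get, reverse=True)

def relative_improvements(acc, order):
    """Accuracy gain of each model over its predecessor in `order` (an assumed definition)."""
    return [acc[later] - acc[earlier] for earlier, later in zip(order, order[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Assumed chronological order of the architectures.
chronological = ["AlexNet", "VGG", "ResNet", "DenseNet", "ConvNeXt"]

# Check 1: do both datasets order the models identically?
same_ranking = ranking(acc_a) == ranking(acc_b)

# Check 2: do the per-step gains correlate across the two datasets?
gains_a = relative_improvements(acc_a, chronological)
gains_b = relative_improvements(acc_b, chronological)
gain_correlation = pearson(gains_a, gains_b)

print("identical ranking:", same_ranking)
print("correlation of relative improvements:", round(gain_correlation, 3))
```

With real measurements, `acc_a` and `acc_b` would hold each architecture's accuracy on ImageNet and ImageNot respectively. A rank statistic such as Spearman's or Kendall's coefficient could replace the exact-equality check when rankings are close but not identical.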

ImageNot: A contrast with ImageNet preserves model rankings

An Analysis of Scale Invariance in Object Detection - SNIP

What Makes ImageNet Look Unlike LAION

When hard negative sampling meets supervised contrastive learning

Do better ImageNet classifiers assess perceptual similarity better?

Is it enough to optimize CNN architectures on ImageNet?

Does progress on ImageNet transfer to real-world datasets?

Generalized Global Ranking-Aware Neural Architecture Ranker for Efficient Image Classifier Search

Do Better ImageNet Models Transfer Better... for Image Recommendation?

A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

What Should Not Be Contrastive in Contrastive Learning

Image Classification with Small Datasets: Overview and Benchmark

Diverse Imagenet Models Transfer Better

Establishing a stronger baseline for lightweight contrastive models

Manifestation of Image Contrast in Deep Networks

Are we done with ImageNet?

CoNe: Contrast Your Neighbours for Supervised Image Classification

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

Invertible ResNets for Inverse Imaging Problems: Competitive Performance with Provable Regularization Properties