Learning from One and Only One Shot

Haizi Yu, Igor Mineyev, Lav R. Varshney, James A. Evans
2024-05-21
Abstract: Humans can generalize from only a few examples and from little pretraining on similar tasks. Yet, machine learning (ML) typically requires large data to learn or pre-learn to transfer. Motivated by nativism and artificial general intelligence, we directly model human-innate priors in abstract visual tasks such as character and doodle recognition. This yields a white-box model that learns general-appearance similarity by mimicking how humans naturally "distort" an object at first sight. Using just nearest-neighbor classification on this cognitively-inspired similarity space, we achieve human-level recognition with only $1$--$10$ examples per class and no pretraining. This differs from few-shot learning that uses massive pretraining. In the tiny-data regime of MNIST, EMNIST, Omniglot, and QuickDraw benchmarks, we outperform both modern neural networks and classical ML. For unsupervised learning, by learning the non-Euclidean, general-appearance similarity space in a $k$-means style, we achieve multifarious visual realizations of abstract concepts by generating human-intuitive archetypes as cluster centroids.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how machine learning can learn and generalize from very few examples, in particular without any pretraining. Inspired by the human ability to learn from a handful of examples and adapt quickly to new tasks, the authors propose a white-box model based on a "distorted canvas". The model imitates how humans naturally "distort" an object at first sight, and from this it learns a general-appearance similarity. Using only nearest-neighbor classification in this cognitively inspired similarity space, the model reaches human-level recognition with just 1-10 examples per class and no pretraining, in contrast to few-shot learning methods that rely on large-scale pretraining. In the tiny-data regime of the MNIST, EMNIST, Omniglot, and QuickDraw benchmarks, it outperforms both modern neural networks and classical machine learning methods. For unsupervised learning, the model learns the non-Euclidean, general-appearance similarity space in a k-means style and generates human-intuitive archetypes of abstract concepts as cluster centroids, yielding diverse visual realizations of each concept. The challenges identified in the paper include parameterizing all transformations, simulating the human capacity for hierarchical abstraction, and avoiding local minima during optimization. These are addressed by a method called Abstract Multi-Level Gradient Descent (AMGD), which keeps the entire optimization process interpretable. Overall, the model performs well on abstract visual tasks such as character and doodle recognition, approaching human-level performance when only one or a few examples are available.
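The classification step described above, nearest-neighbor lookup with a single labeled example per class, can be illustrated with a minimal sketch. The distance used here is only a stand-in: `shift_tolerant_distance` is a hypothetical helper that tolerates small translations, not the paper's distorted-canvas similarity, and the 5x5 toy images and the name `one_shot_classify` are likewise assumptions made for illustration.

```python
import numpy as np


def shift_tolerant_distance(a, b, max_shift=1):
    """Smallest squared pixel distance between `a` and any small shift of `b`.

    Hypothetical stand-in for a learned appearance similarity.
    Note: np.roll wraps pixels around the image border.
    """
    best = np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(b, (dy, dx), axis=(0, 1))
            best = min(best, float(np.sum((a - shifted) ** 2)))
    return best


def one_shot_classify(query, exemplars, distance=shift_tolerant_distance):
    """Return the label whose single exemplar is nearest to `query`.

    `exemplars` maps each label to exactly one image (one shot per class).
    """
    return min(exemplars, key=lambda label: distance(query, exemplars[label]))


# Toy data: one exemplar per class.
vertical = np.zeros((5, 5))
vertical[:, 2] = 1.0
horizontal = np.zeros((5, 5))
horizontal[2, :] = 1.0
exemplars = {"vertical": vertical, "horizontal": horizontal}

# Query: the vertical bar moved one pixel to the right.
query = np.roll(vertical, 1, axis=1)
predicted = one_shot_classify(query, exemplars)
print(predicted)
```

Because the classifier takes the distance as a parameter, a richer similarity (for example, one that searches over smooth deformations rather than translations) could be swapped in without changing the nearest-neighbor logic.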