Abstract: Humans can generalize from only a few examples and from little pretraining on similar tasks. Yet, machine learning (ML) typically requires large data to learn or pre-learn to transfer. Motivated by nativism and artificial general intelligence, we directly model human-innate priors in abstract visual tasks such as character and doodle recognition. This yields a white-box model that learns general-appearance similarity by mimicking how humans naturally "distort" an object at first sight. Using just nearest-neighbor classification on this cognitively-inspired similarity space, we achieve human-level recognition with only 1-10 examples per class and no pretraining. This differs from few-shot learning that uses massive pretraining. In the tiny-data regime of MNIST, EMNIST, Omniglot, and QuickDraw benchmarks, we outperform both modern neural networks and classical ML. For unsupervised learning, by learning the non-Euclidean, general-appearance similarity space in a k-means style, we achieve multifarious visual realizations of abstract concepts by generating human-intuitive archetypes as cluster centroids.

What problem does this paper attempt to address?

This paper addresses how machine learning can learn and generalize from very few samples, without any pretraining. Inspired by the human ability to learn from a handful of examples and adapt quickly to new tasks, the authors propose a white-box model based on a "distorted canvas". The model imitates how humans naturally "distort" an object when judging similarity, and in doing so learns a general-appearance similarity. Using only nearest-neighbor classification in this cognitively inspired similarity space, the model reaches human-level recognition with 1-10 samples per class and no pretraining, in contrast to few-shot learning methods that rely on large-scale pretraining. In the tiny-data regime of the MNIST, EMNIST, Omniglot, and QuickDraw benchmarks, it outperforms both modern neural networks and classical machine learning methods. In the unsupervised setting, the model learns the non-Euclidean, general-appearance similarity space in a k-means style and generates human-intuitive archetypes of abstract concepts as cluster centroids, yielding diverse visual realizations of each concept.

The challenges identified in the paper include parameterizing all transformations, simulating the human capacity for hierarchical abstraction, and avoiding local minima during optimization. These are addressed by a method called Abstract Multi-Level Gradient Descent (AMGD), which keeps the entire optimization process interpretable. Overall, the model performs well on abstract visual tasks such as character and doodle recognition, approaching human-level performance particularly when data are extremely scarce or only a single sample per class is available.
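The nearest-neighbor step mentioned above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `euclidean` function is only a stand-in for the learned, distortion-based similarity, and the names `Image`, `nearest_neighbor_label`, and `support_set` are hypothetical, introduced here for illustration. The point is that classification reduces to picking the labeled example with the smallest dissimilarity, so the quality of the result depends on the similarity measure that is plugged in.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: an image is a flattened sequence of grayscale pixel values.
Image = Sequence[float]


def euclidean(a: Image, b: Image) -> float:
    """Placeholder dissimilarity (plain Euclidean distance).

    Assumption: the paper's distortion-based similarity would replace this.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_neighbor_label(
    query: Image,
    support: List[Tuple[Image, str]],
    dissimilarity: Callable[[Image, Image], float] = euclidean,
) -> str:
    """Return the label of the support example closest to the query."""
    if not support:
        raise ValueError("support set must contain at least one labeled example")
    _, label = min(support, key=lambda pair: dissimilarity(query, pair[0]))
    return label


# One labeled example per class (one-shot), using toy 2x2 "images".
support_set = [
    ([1.0, 0.0, 0.0, 1.0], "diagonal"),
    ([1.0, 1.0, 0.0, 0.0], "top_row"),
]
prediction = nearest_neighbor_label([0.9, 0.1, 0.0, 0.8], support_set)
print(prediction)
```

Passing a different callable as `dissimilarity` changes the similarity space without touching the classifier, which is the design property the summary attributes to the approach.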

Learning from One and Only One Shot

Leveraging Prior Concept Learning Improves Generalization From Few Examples in Computational Models of Human Object Recognition

A Theory of Human-Like Few-Shot Learning

Sample-Efficient Learning of Novel Visual Concepts

Abstracted Gaussian Prototypes for One-Shot Concept Learning

Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks

Few-shot Learning for Multi-Modality Tasks

Bayesian Inverse Graphics for Few-Shot Concept Learning

One-Shot Face Recognition Based on Multiple Classifiers Training

Human-level concept learning through probabilistic program induction

Unsupervised One-shot Learning of Both Specific Instances and Generalised Classes with a Hippocampal Architecture

N-Omniglot, a large-scale neuromorphic dataset for spatio-temporal sparse few-shot learning

Multi-Level Semantic Feature Augmentation for One-Shot Learning

Semantic-Based Few-Shot Learning by Interactive Psychometric Testing

A single fast Hebbian-like process enabling one-shot class addition in deep neural networks without backbone modification

One-Shot Visual Imitation Learning via Meta-Learning

Synthetic Examples Improve Generalization for Rare Classes

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

Meta Learning for Few-Shot One-class Classification

Less is More: A Closer Look at Semantic-based Few-Shot Learning