Abstract: With the tremendous advances made by Convolutional Neural Networks (ConvNets) on object recognition, we can now easily obtain adequately reliable machine-labeled annotations from the predictions of off-the-shelf ConvNets. In this work, we present an "abstraction memory" based framework for few-shot learning that builds upon machine-labeled image annotations. Our method takes a large-scale machine-annotated dataset (e.g., OpenImages) as an external memory bank. In the external memory bank, information is stored in memory slots as key-value pairs, in which the image feature serves as the key and the label embedding serves as the value. When queried by the few-shot examples, our model selects visually similar data from the external memory bank and writes the useful information obtained from the related external data into a second memory bank, the abstraction memory. Long Short-Term Memory (LSTM) controllers and attention mechanisms are used to ensure that the data written to the abstraction memory correlates with the query example. The abstraction memory concentrates information from the external memory bank, which makes few-shot recognition effective. In the experiments, we first confirm that our model can learn to conduct few-shot object recognition on clean human-labeled data from the ImageNet dataset. We then demonstrate that, with our model, machine-labeled image annotations are an effective and abundant resource for object recognition on novel categories. Experimental results show that our proposed model performs well with machine-labeled annotations, with only a 1% difference in accuracy between the machine-labeled annotations and the human-labeled annotations.
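The key-value read described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the LSTM controllers and the write into the abstraction memory, and it assumes cosine similarity for slot selection and a softmax over the selected slots for attention. The names `cosine`, `read_memory`, and `top_k`, as well as the toy keys and values, are hypothetical and chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def read_memory(query, keys, values, top_k=2):
    """Select the top_k slots whose keys are most similar to the query,
    then return an attention-weighted sum of their values.

    Assumption: similarity is cosine and attention is a softmax over the
    selected slots; the abstract does not specify either choice.
    """
    sims = [cosine(query, k) for k in keys]
    # Indices of the top_k most similar slots, most similar first.
    top = sorted(range(len(keys)), key=lambda i: sims[i], reverse=True)[:top_k]
    # Numerically stable softmax over the selected similarities.
    m = max(sims[i] for i in top)
    exps = [math.exp(sims[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected value vectors (label embeddings).
    dim = len(values[0])
    read = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return read, top, weights

# Toy memory: keys stand in for image features, values for label embeddings.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.05]

read, top, weights = read_memory(query, keys, values)
print(top, weights, read)
```

In this toy setup the third slot is visually dissimilar to the query, so it is never selected and contributes nothing to the read vector.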

Label Independent Memory for Semi-Supervised Few-shot Video Classification

Compound Memory Networks for Few-Shot Video Classification

Few-Shot Ensemble Learning for Video Classification with SlowFast Memory Networks

Few-Shot Incremental Learning for Label-to-Image Translation

Few-Shot Object Recognition from Machine-Labeled Web Images

Few-shot Learning for Multi-label Intent Detection

Learning Implicit Temporal Alignment for Few-shot Video Classification

Memory transformation networks for weakly supervised visual classification

Memory-augmented Dense Predictive Coding for Video Representation Learning

Multimodal few-shot classification without attribute embedding

Few-shot activity recognition with cross-modal memory network

Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition

Semi-supervised multi-instance multi-label learning for video annotation task

Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation

Weakly Supervised Multiclass Video Segmentation

Few-shot Class-Incremental Semantic Segmentation via Pseudo-Labeling and Knowledge Distillation

Short-Form Video Classification Based on Gate Shift Module and Semantic Embedding

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Learning with Memory for Few-Shot Semantic Segmentation

Semantic-Based Few-Shot Learning by Interactive Psychometric Testing

Few-Shot Object Detection with Memory Contrastive Proposal Based on Semantic Priors