Abstract: In this paper, a structurally enhanced incremental neural learning technique is proposed to learn a discriminative codebook representation of images for effective image classification. To incorporate the relationships among visual words, such as their structures and distributions, into the codebook learning process, we develop an online codebook graph learning method based on a novel structurally enhanced incremental learning technique called the "visualization-induced self-organized incremental neural network" (ViSOINN). The hidden structural information in the images is embedded into a graph representation that evolves dynamically through an adaptive, competitive learning mechanism. Image features are then coded by a sub-graph extraction process based on the learned codebook graph, and a classifier is subsequently used to complete the image classification task. Compared with other codebook learning algorithms that originate from the classical Bag-of-Features (BoF) model, ViSOINN offers the following advantages: (1) it learns the codebook efficiently and effectively from a small training set; (2) it models the relationships among visual words in a metric-scaling fashion, thereby preserving high discriminative power; (3) it learns the codebook automatically, without a fixed, pre-defined size; and (4) it better enhances and preserves the structure of the data. These characteristics help improve image classification performance and make ViSOINN more suitable for large-scale image classification tasks. Experimental results on the widely used Caltech-101 and Caltech-256 benchmark datasets demonstrate that ViSOINN achieves markedly improved performance while considerably reducing the computational cost.
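
The abstract describes the codebook graph only at a high level. The sketch below illustrates the general family of techniques it builds on: an incremental, SOINN-style graph in which nodes act as visual words, edges record neighborhood relations learned by competitive Hebbian learning, and a new node is inserted whenever an input is too far from the existing ones, so the codebook size is not fixed in advance. It is a minimal sketch under stated assumptions, not ViSOINN itself: the visualization-induced metric-scaling step is omitted, node denoising is skipped, and the names (`IncrementalCodebookGraph`, `encode`, `image_vector`), the edge-age limit `age_max`, the 1/(100·M) neighbor learning rate, the Gaussian soft assignment over the winner's subgraph with width `beta`, and max pooling are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


class IncrementalCodebookGraph:
    """Simplified SOINN-style codebook graph (an illustrative sketch, not ViSOINN).

    Nodes are visual words; an edge links two words that were jointly the
    nearest pair for some input (competitive Hebbian learning). The number of
    words is not fixed in advance: novel inputs create new nodes.
    """

    def __init__(self, age_max=50):
        self.age_max = age_max  # assumed limit on edge age before pruning
        self.nodes = []  # visual-word prototypes (float arrays)
        self.wins = []  # number of times each node has been the winner
        self.edges = {}  # (i, j) with i < j -> edge age

    def _neighbors(self, i):
        return [b if a == i else a for (a, b) in self.edges if i in (a, b)]

    def _threshold(self, i):
        # Similarity threshold: distance to the farthest graph neighbor,
        # or to the nearest other node if i has no neighbors yet.
        nbrs = self._neighbors(i)
        others = nbrs or [j for j in range(len(self.nodes)) if j != i]
        dists = [np.linalg.norm(self.nodes[i] - self.nodes[j]) for j in others]
        return max(dists) if nbrs else min(dists)

    def learn(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.nodes) < 2:
            self.nodes.append(x.copy())
            self.wins.append(1)
            return
        d = np.linalg.norm(np.array(self.nodes) - x, axis=1)
        s1, s2 = (int(k) for k in np.argsort(d)[:2])
        # Input far from both winners: add it as a new visual word.
        if d[s1] > self._threshold(s1) or d[s2] > self._threshold(s2):
            self.nodes.append(x.copy())
            self.wins.append(1)
            return
        # Age the winner's edges, then create/refresh the winner-runner-up edge.
        for e in self.edges:
            if s1 in e:
                self.edges[e] += 1
        self.edges[(min(s1, s2), max(s1, s2))] = 0
        # Competitive learning: pull the winner, and weakly its neighbors, toward x.
        self.wins[s1] += 1
        self.nodes[s1] += (x - self.nodes[s1]) / self.wins[s1]
        for j in self._neighbors(s1):
            self.nodes[j] += (x - self.nodes[j]) / (100 * self.wins[s1])
        # Drop stale relations between visual words.
        self.edges = {e: a for e, a in self.edges.items() if a <= self.age_max}

    def encode(self, x, beta=1.0):
        """Hypothetical subgraph coding: soft-assign x over its nearest word and
        that word's graph neighbors; every other code entry stays zero."""
        x = np.asarray(x, dtype=float)
        d = np.linalg.norm(np.array(self.nodes) - x, axis=1)
        s1 = int(np.argmin(d))
        sub = [s1] + self._neighbors(s1)
        sq = d[sub] ** 2
        w = np.exp(-beta * (sq - sq.min()))  # shift for numerical stability
        code = np.zeros(len(self.nodes))
        code[sub] = w / w.sum()
        return code


def image_vector(graph, descriptors):
    """Max-pool the subgraph codes of one image's local descriptors."""
    return np.max([graph.encode(f) for f in descriptors], axis=0)


# Toy usage: two well-separated clusters stand in for local image descriptors.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (200, 2)), rng.normal(5.0, 0.1, (200, 2))])
graph = IncrementalCodebookGraph()
for x in rng.permutation(data):
    graph.learn(x)
v = image_vector(graph, data[:10])
print(len(graph.nodes), "visual words; image vector shape", v.shape)
```

Because a node is added only when an input fails the similarity test, the number of visual words adapts to the data rather than being chosen up front; the pooled image vectors would then be passed to a classifier, whose type the abstract does not specify.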

Structurally Enhanced Incremental Neural Learning for Image Classification with Subgraph Extraction