Abstract: The idea of representing images using a bag of visual words is currently popular in object category recognition. Since this representation is typically constructed using unsupervised clustering, the resulting visual words may not capture the desired information. Recent work has explored the construction of discriminative visual codebooks that explicitly consider object category information. However, since the codebook generation process is still disconnected from that of classifier training, the set of resulting visual words, while individually discriminative, may not be those best suited for the classifier. This paper proposes a novel optimization framework that unifies codebook generation with classifier training. In our approach, each image feature is encoded by a sequence of "visual bits" optimized for each category. An image, which can contain objects from multiple categories, is represented using aggregates of visual bits for each category. Classifiers associated with different categories determine how well a given image corresponds to each category. Based on the performance of these classifiers on the training data, we augment the visual words by generating additional bits. The classifiers are then updated to incorporate the new representation. These two phases are repeated until the desired performance is achieved. Experiments compare our approach with standard clustering-based methods and with state-of-the-art discriminative visual codebook generation. The significant improvements over previous techniques clearly demonstrate the value of unifying representation and classification into a single optimization framework.
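
The alternating procedure described in the abstract (encode each local feature with per-category visual bits, aggregate the bits per image, score images with a per-category classifier, add bits based on training performance, retrain, and repeat until a target is reached) can be illustrated with a minimal Python sketch. This is not the authors' optimization: the bit form (a threshold on one descriptor dimension), the mean-based aggregation, the perceptron classifier, and the exhaustive search over candidate thresholds are all hypothetical simplifications, and every function and type name below is illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical types for this sketch (not taken from the paper).
Feature = Tuple[float, ...]        # one local descriptor
Image = List[Feature]              # an image is a bag of local descriptors
Bit = Tuple[int, float]            # assumed bit form: (dimension, threshold)
Model = Tuple[List[float], float]  # linear classifier: (weights, bias)


def fires(bit: Bit, feature: Feature) -> int:
    """Assumed visual bit: 1 if the chosen descriptor dimension exceeds the threshold."""
    dim, threshold = bit
    return 1 if feature[dim] > threshold else 0


def aggregate(bits: Sequence[Bit], image: Image) -> List[float]:
    """Per-category image vector: fraction of the image's features that fire each bit."""
    return [sum(fires(b, f) for f in image) / len(image) for b in bits]


def score(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(X: List[List[float]], y: List[int], epochs: int = 100) -> Model:
    """Classic perceptron (a stand-in classifier); stops after an epoch with no mistakes."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if (score((weights, bias), xi) > 0) != bool(yi):
                sign = 1.0 if yi else -1.0
                weights = [w + sign * v for w, v in zip(weights, xi)]
                bias += sign
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    return sum((score(model, xi) > 0) == bool(yi) for xi, yi in zip(X, y)) / len(y)


def candidate_bits(images: List[Image]) -> List[Bit]:
    """Hypothetical candidates: thresholds halfway between consecutive distinct values."""
    feats = [f for img in images for f in img]
    cands: List[Bit] = []
    for d in range(len(feats[0])):
        vals = sorted({f[d] for f in feats})
        cands += [(d, (lo + hi) / 2) for lo, hi in zip(vals, vals[1:])]
    return cands


def learn_category(images: List[Image], labels: List[int],
                   rounds: int = 5, target: float = 1.0
                   ) -> Tuple[List[Bit], Optional[Model], float]:
    """Alternate between adding the bit that best helps the classifier and retraining it."""
    bits: List[Bit] = []
    model: Optional[Model] = None
    acc = 0.0
    cands = candidate_bits(images)
    for _ in range(rounds):
        best = None
        for cand in cands:
            X = [aggregate(bits + [cand], img) for img in images]
            m = train_perceptron(X, labels)  # classifier updated for the new representation
            a = accuracy(m, X, labels)       # judged by training-set performance
            if best is None or a > best[0]:
                best = (a, cand, m)
        acc, chosen, model = best
        bits.append(chosen)
        if acc >= target:                    # stop once the desired performance is reached
            break
    return bits, model, acc


def learn_all(images: List[Image], image_labels: List[Set[str]],
              categories: Sequence[str]) -> Dict[str, Tuple[List[Bit], Optional[Model], float]]:
    """One bit sequence and one classifier per category; an image may carry several labels."""
    return {c: learn_category(images, [int(c in ls) for ls in image_labels])
            for c in categories}
```

Because every candidate bit is scored by retraining the classifier on the augmented representation, the search targets classification accuracy directly rather than cluster quality, which mirrors the abstract's argument for unifying codebook generation with classifier training. A practical system would replace the exhaustive threshold search with a proper optimizer and evaluate on held-out data.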

PartBook for Image Parsing

Discriminative Hierarchical Part-Based Models for Human Parsing and Action Recognition

Image Classification Method by Combining Multi-features and Sparse Coding

Sparse Codebook Model of Local Structures for Retrieval of Focal Liver Lesions Using Multiphase Medical Images

Object Recognition Based on the Region of Interest and Optimal Bag of Words Model

ObjectBook Construction for Large-Scale Semantic-Aware Image Retrieval

Image Parsing: Unifying Segmentation, Detection, and Recognition

A Fast Algorithm For Creating A Compact And Discriminative Visual Codebook

Automatic Discovery and Optimization of Parts for Image Classification

Unifying Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition

Parsing Objects at a Finer Granularity: A Survey

Multi-Class Part Parsing with Joint Boundary-Semantic Awareness

Selective Parts For Fine-Grained Recognition

Visual Word Coding Based on Difference Maximization

Unified Perceptual Parsing for Scene Understanding

Codebook Enhancement of VLAD Representation for Visual Recognition

A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities

OV-PARTS: Towards Open-Vocabulary Part Segmentation

Semantics-Preserving Bag-of-Words Models and Applications

Going Denser with Open-Vocabulary Part Segmentation

Image Classification by Codebook Updating via Joint i-Pat Topic Model Feedback