Abstract:Fine-grained visual categorization is to recognize hundreds of subcategories belonging to the same basic-level category, which is a highly challenging task due to the quite subtle and local visual distinctions among similar subcategories. Most existing methods generally learn part detectors to discover discriminative regions for better categorization performance. However, not all parts are beneficial and indispensable for visual categorization, and the setting of part detector number heavily relies on prior knowledge as well as experimental validation. As is known to all, when we describe the object of an image via textual descriptions, we mainly focus on the pivotal characteristics, and rarely pay attention to common characteristics as well as the background areas. This is an involuntary transfer from human visual attention to textual attention, which leads to the fact that textual attention tells us how many and which parts are discriminative and significant to categorization. So textual attention could help us to discover visual attention in image. Inspired by this, we propose a fine-grained visual-textual representation learning (VTRL) approach, and its main contributions are: (1) Fine-grained visual-textual pattern mining devotes to discovering discriminative visual-textual pairwise information for boosting categorization performance through jointly modeling vision and text with generative adversarial networks (GANs), which automatically and adaptively discovers discriminative parts. (2) Visual-textual representation learning jointly combines visual and textual information, which preserves the intra-modality and intermodality information to generate complementary fine-grained representation, as well as further improves categorization performance. Comprehensive experimental results on the widely-used CUB-200-2011 and Oxford Flowers-102 datasets demonstrate the effectiveness of our VTRL approach, which achieves the best categorization accuracy compared with state-of-the-art methods.

Exploiting Textual and Visual Features for Image Categorization

Fine-Grained Visual Categorization With Fine-Tuned Segmentation

A Pca Based Automatic Image Categorization Approach Using Dominant Color Features

Text Categorization Based on Domain Ontology

Aggressive Dimensionality Reduction With Reinforcement Local Feature Selection For Text Categorization

Fast text categorization based on collaborative work in the semantic and class spaces

Feature selection for image categorization

Enhancing Robust Text Classification via Category Description

Dimensionality Reduction With Category Information Fusion And Non-Negative Matrix Factorization For Text Categorization

Exploiting Multi-Context Analysis in Semantic Image Classification

Weakly-Supervised Semantic Segmentation via Sub-category Exploration

Webly-Supervised Fine-Grained Visual Categorization Via Deep Domain Adaptation.

Collaborative Work with Linear Classifier and Extreme Learning Machine for Fast Text Categorization

Non-Negative Sparse Semantic Coding for Text Categorization

Small variance among different subcategories Artic Tern Caspian Tern Large variance in the same subcategory Common Tern Salty Backed Gull Salty Backed Gull Salty Backed Gull

Salient Feature Selection for Visual Concept Learning

Learning Effective Features for Chinese Text Categorization

Exploiting Web Images for Fine-Grained Visual Recognition by Eliminating Open-Set Noise and Utilizing Hard Examples

Multiple-instance Learning for Text Categorization Based on Semantic Representation

Refining local descriptors by embedding semantic information for visual categorization.

Weakly Supervised Fine-Grained Categorization With Part-Based Image Representation