Fused One-vs-All Features with Semantic Alignments for Fine-Grained Visual Categorization

Xiaopeng Zhang, Hongkai Xiong, Wengang Zhou, Qi Tian
DOI: https://doi.org/10.1109/tip.2015.2509425
IF: 10.6
2016-01-01
IEEE Transactions on Image Processing
Abstract: Fine-grained visual categorization is an emerging research area that has been attracting growing attention. Because of large inter-class similarity and intra-class variance, recognizing objects in fine-grained domains is extremely challenging. A traditional spatial pyramid matching model can obtain desirable results for basic-level category classification through weak alignment, but may easily fail in fine-grained domains, since the discriminative features are extremely localized. This paper proposes a new framework for fine-grained visual categorization. First, an efficient part localization method incorporates a semantic prior into geometric alignment. It detects the less deformable parts, such as the head of a bird, with a template-based model, and localizes other highly deformable parts with simple geometric alignment. Second, we learn one-vs-all features, which are simple and transplantable. The learned mid-level features are dimension friendly and more robust to outlier instances. Furthermore, since some subcategories are too similar to tell apart easily, we fuse the subcategories iteratively according to their similarities and learn fused one-vs-all features. Experimental results show the superior performance of our algorithms over existing methods.