Learning Mutually Exclusive Part Representations for Fine-Grained Image Classification

Chuanming Wang, Huiyuan Fu, Huadong Ma
DOI: https://doi.org/10.1109/tmm.2023.3307235
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Fine-grained image classification (FGIC) aims to separate different subcategories within one general superclass, which requires the classification model to extract distinctive representations from subtle yet discriminative regions of the objects. Learning multiple part representations gives a detailed description of the object from different perspectives and boosts classification performance. However, it remains challenging to locate diverse parts and extract their features effectively without the assistance of part annotations. In this paper, we present a novel method that achieves accurate fine-grained image classification by learning a set of diverse and discriminative part representations without requiring additional supervision. First, our method uses a simple attention interaction module to lead the learned spatial attentions to focus on different parts, resulting in mutually exclusive part representations. Then, to reduce the impairment caused by channel coupling among part representations, a part-wise channel weighting module adaptively adjusts the amplitudes of the different representations, making them diverse along the channel dimension. Moreover, to ensure comprehensive and sufficient part representations, our method introduces multi-granularity feature learning, which extracts part representations from different semantic and content levels and captures fine-grained details effectively. To evaluate our method, extensive experiments are conducted on various benchmark fine-grained image datasets, and the results show that our method achieves outstanding performance for FGIC, demonstrating its effectiveness.
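The abstract does not specify how the attention interaction module enforces mutual exclusivity, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes K spatial attention maps normalized by a softmax over locations, a hypothetical overlap penalty (the sum of inner products between different maps), and attention-weighted pooling to obtain one feature vector per part. All function names here are invented for illustration.

```python
import numpy as np

def spatial_softmax(logits):
    """Normalize each of K attention maps over its H*W locations.

    logits: array of shape (K, H, W). Returns shape (K, H*W), rows sum to 1.
    """
    k = logits.shape[0]
    flat = logits.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(flat)
    return weights / weights.sum(axis=1, keepdims=True)

def overlap_penalty(attn):
    """Hypothetical exclusivity term: sum of inner products between
    different attention maps. It is zero when no two maps share a location.

    attn: array of shape (K, H*W) with non-negative entries.
    """
    gram = attn @ attn.T  # (K, K) pairwise overlaps
    return float(gram.sum() - np.trace(gram))  # drop self-overlap terms

def pool_parts(features, attn):
    """Attention-weighted pooling: (C, H, W) features -> (K, C) part vectors."""
    c = features.shape[0]
    return attn @ features.reshape(c, -1).T
```

Under these assumptions, two maps that attend to disjoint locations incur zero penalty, while two identical maps incur a positive one, so minimizing the term during training would push the parts apart. The paper's actual interaction module and its channel weighting module may differ substantially from this sketch.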