A Multi-Group Multi-Stream attribute Attention network for fine-grained zero-shot learning

Lingyun Song, Xuequn Shang, Ruizhi Zhou, Jun Liu, Jie Ma, Zhanhuai Li, Mingxuan Sun
DOI: https://doi.org/10.1016/j.neunet.2024.106558
Abstract: Fine-grained visual categorization in the zero-shot setting is a challenging problem in computer vision. It requires algorithms to accurately identify fine-grained categories that do not appear during training and that are highly similar to one another visually. Existing methods usually address this problem by using attribute information as intermediate knowledge, which provides sufficient fine-grained characteristics of categories and can be transferred from seen categories to unseen categories. However, learning attribute visual features is not trivial, for two reasons: (i) the visual information about attributes of different types may interfere with each other's visual feature learning; (ii) the visual characteristics of the same attribute may vary across categories. To solve these issues, we propose a Multi-Group Multi-Stream attribute Attention network (MGMSA), which not only separates the feature learning of attributes of different types, but also isolates the learning of attribute visual features for categories with large differences in attribute appearance. This avoids interference between uncorrelated attributes and helps to learn category-specific attribute-related visual features, which is beneficial for distinguishing fine-grained categories with subtle visual differences. Extensive experiments on benchmark datasets show that MGMSA achieves state-of-the-art performance on attribute prediction and fine-grained zero-shot learning.
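The general idea of using attributes as intermediate knowledge that transfers from seen to unseen categories can be illustrated with a minimal sketch. This is not the MGMSA architecture; it only shows the generic final step of attribute-based zero-shot classification, assuming some model has already produced per-attribute scores for an image. The attribute names, class names, signature values, and the helper functions `cosine` and `predict_unseen_class` are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_unseen_class(attribute_scores, class_signatures):
    """Return the class whose attribute signature best matches the predicted scores."""
    return max(
        class_signatures,
        key=lambda name: cosine(attribute_scores, class_signatures[name]),
    )


# Hypothetical attribute order: [red_head, long_beak, striped_wing, white_belly].
# These signatures describe classes that were never seen during training;
# only their attribute descriptions are known.
unseen_class_signatures = {
    "class_a": [1.0, 0.0, 1.0, 0.0],
    "class_b": [1.0, 1.0, 0.0, 0.0],
    "class_c": [0.0, 1.0, 0.0, 1.0],
}

# Assumed output of an attribute predictor for one test image.
image_attribute_scores = [0.9, 0.8, 0.1, 0.2]

predicted = predict_unseen_class(image_attribute_scores, unseen_class_signatures)
print(predicted)
```

Because classification depends only on attribute signatures, errors in the predicted attribute scores propagate directly to the class decision, which is why the quality of the learned attribute visual features matters so much for fine-grained categories.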