Few-Shot Fine-Grained Image Classification via Multi-Frequency Neighborhood and Double-Cross Modulation
Hegui Zhu, Zhan Gao, Jiayi Wang, Yange Zhou, Chengqing Li
DOI: https://doi.org/10.1109/tmm.2024.3405713
IF: 7.3
2024-10-19
IEEE Transactions on Multimedia
Abstract: Traditional fine-grained image classification typically relies on large-scale training samples with annotated ground truth. However, some fine-grained categories in the real world have few available images, and existing few-shot models have difficulty distinguishing the subtle differences among them. Moreover, the intra-class distances of some fine-grained categories may be very large while the inter-class distances are small, so the distinguishing features of each category differ across tasks. To address these challenges, we propose a novel network (FicNet) using multi-frequency neighborhood (MFN) and double-cross modulation (DCM). MFN captures a multi-frequency structure representation that is irrelevant to the background by integrating spatial and frequency domain information, thereby reducing the intra-class distance. DCM modulates the representation using global context and inter-class relationships, which enables both support and query features to cover complete targets and respond to the same parts, so that subtle inter-class differences can be accurately identified. Comprehensive experiments on three fine-grained benchmark datasets for two few-shot tasks verify that FicNet performs excellently compared with state-of-the-art methods. Notably, it obtains classification accuracies of 93.17% and 95.36% on the "Caltech-UCSD Birds" and "Stanford Cars" datasets, respectively, surpassing the benchmarks set by general fine-grained image classification methods.
computer science, information systems, telecommunications, software engineering