A Noise-robust Locality Transformer for Fine-grained Food Image Retrieval.

Jiajun Song, Weiqing Min, Yuxin Liu, Zhuo Li, Shuqiang Jiang, Yong Rui
DOI: https://doi.org/10.1109/mipr54900.2022.00068
2022-01-01
Abstract: Food image retrieval has wide applications in the multimedia community, but it faces two main challenges. First, food images are often disturbed by food-irrelevant information such as plates and side dishes. Second, the fine-grained nature of food images makes the visual representations of different categories similar. To address these challenges, we propose the Noise-robust Locality Transformer (NoLoTransformer) for food image retrieval under a metric learning-based retrieval framework. Specifically, we propose two novel modules for Transformer-based feature extraction: the Patch Attention Module (PAM) and the Local Perception Unit (LPU). PAM weakens the negative impact of noise in the food image by reweighting patches, adaptively assigning low weights to noisy ones. LPU extracts local features by introducing convolution and then captures the fine-grained information in those local features. Extensive evaluation on three datasets demonstrates the effectiveness of the proposed method. Code is available at https://github.com/jiajun-ISIA/NoLoTransformer
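The two ideas named in the abstract, reweighting patches and adding a convolution over the patch grid, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the repository linked above for that). The sigmoid gate, the residual 3x3 convolution, and the names `patch_gates`, `reweight_patches`, `local_perception`, and `score_vec` are all assumptions chosen for illustration.

```python
import math


def patch_gates(tokens, score_vec):
    """Return one weight in (0, 1) per patch token.

    Assumption: the weight is a sigmoid of the dot product between the
    token and a (normally learned) scoring vector.
    """
    gates = []
    for tok in tokens:
        score = sum(t * w for t, w in zip(tok, score_vec))
        gates.append(1.0 / (1.0 + math.exp(-score)))
    return gates


def reweight_patches(tokens, score_vec):
    """Scale each patch token by its gate, so low-scoring patches shrink."""
    gates = patch_gates(tokens, score_vec)
    return [[g * t for t in tok] for g, tok in zip(gates, tokens)]


def local_perception(grid, kernel):
    """Residual 3x3 convolution over one channel of an H x W patch grid.

    Assumption: out = grid + conv3x3(grid), with zero padding at the borders.
    """
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di + 1][dj + 1] * grid[ii][jj]
            out[i][j] = grid[i][j] + acc
    return out


# Hypothetical data: two patch tokens and a hand-picked scoring vector.
tokens = [[2.0, 0.0], [-2.0, 0.0]]
score_vec = [1.0, 0.0]
gates = patch_gates(tokens, score_vec)
weighted = reweight_patches(tokens, score_vec)

# Hypothetical 2 x 2 patch grid (one channel) and an all-ones kernel.
grid = [[1.0, 2.0], [3.0, 4.0]]
ones_kernel = [[1.0] * 3 for _ in range(3)]
mixed = local_perception(grid, ones_kernel)
```

In this toy setup the second token receives a gate below 0.5 and is shrunk relative to the first, while the convolution mixes each patch with its neighbours before adding it back to the original value. In a trained model the scoring vector and kernel would be learned rather than fixed by hand.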