Food Image Recognition via Multi-scale Jigsaw and Reconstruction Network

LIU Yu-Xin, MIN Wei-Qing, JIANG Shu-Qiang, RUI Yong
DOI: https://doi.org/10.13328/j.cnki.jos.006325
2022-01-01
Journal of Software
Foundation item: National Natural Science Foundation of China (61972378, U1936203, U19B2040)
Abstract: Food image recognition has recently attracted increasing attention because of its wide applications in healthy diet management, smart restaurants, and other areas. Unlike general object recognition, food recognition is a fine-grained task with high intra-class variability and high inter-class similarity. Moreover, food images lack fixed semantic patterns and a specific spatial layout. Together, these properties make food recognition particularly challenging. In this paper, we propose a Multi-scale Jigsaw and Reconstruction Network (MJR-Net) for food recognition. MJR-Net consists of three parts. The jigsaw and reconstruction module uses Destruction and Reconstruction Learning (DCL) to destroy and reconstruct the original image so that the network learns to extract local discriminative details. The feature pyramid module fuses mid-level features of different sizes to capture multi-scale local discriminative features. The channel-wise attention module models the importance of different feature channels, strengthening discriminative visual patterns and suppressing noisy ones. The network is optimized jointly with A-softmax loss, which increases inter-class variability, and focal loss, which reweights samples. We evaluate MJR-Net on three food datasets, ETH Food-101, Vireo Food-172, and ISIA Food-500, where it achieves accuracies of 90.82%, 91.37%, and 64.95%, respectively. The experimental results show that MJR-Net is highly competitive with other food recognition methods and achieves state-of-the-art recognition performance on Vireo Food-172 and ISIA Food-500. Comprehensive ablation studies and visual analysis further demonstrate the effectiveness of the method.
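The destruction (jigsaw) step can be illustrated with a minimal sketch of a DCL-style region shuffle. It assumes a single-channel image stored as nested lists whose height and width are divisible by a grid size `n`, and it follows the region confusion idea from the original DCL work: each row, then each column, of the n x n patch grid is permuted so that no patch moves more than `2k` positions along that axis. The function names (`jitter_permutation`, `shuffle_layout`, `destruct`) and parameters are hypothetical, and this is not the authors' implementation. The returned `layout` records where each patch came from, which is the kind of target a reconstruction branch could be trained to predict.

```python
import random

def jitter_permutation(n, k, rng):
    """Return a permutation of 0..n-1 in which each index moves at most 2k places (integer k)."""
    # Sort positions by i + r, with r drawn uniformly from [-k, k] (DCL-style local jitter).
    keys = [i + rng.uniform(-k, k) for i in range(n)]
    return sorted(range(n), key=lambda i: keys[i])

def shuffle_layout(n, k, rng):
    """Shuffle an n x n grid of patch coordinates, first within rows, then within columns."""
    layout = [[(r, c) for c in range(n)] for r in range(n)]
    for r in range(n):
        perm = jitter_permutation(n, k, rng)
        layout[r] = [layout[r][p] for p in perm]
    for c in range(n):
        perm = jitter_permutation(n, k, rng)
        column = [layout[r][c] for r in range(n)]
        for r in range(n):
            layout[r][c] = column[perm[r]]
    return layout

def destruct(image, n, k, seed=None):
    """Cut an H x W image (nested lists) into n x n patches and reassemble them shuffled.

    Assumes H and W are divisible by n. Returns the shuffled image and the layout,
    where layout[r][c] is the original (row, col) of the patch now placed at (r, c).
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    ph, pw = h // n, w // n
    layout = shuffle_layout(n, k, rng)
    out = [[None] * w for _ in range(h)]
    for r in range(n):
        for c in range(n):
            sr, sc = layout[r][c]
            for y in range(ph):
                for x in range(pw):
                    out[r * ph + y][c * pw + x] = image[sr * ph + y][sc * pw + x]
    return out, layout
```

For example, `destruct(image, n=4, k=1, seed=0)` on an 8 x 8 image returns a locally scrambled copy plus its 4 x 4 source layout, and `k=0` leaves the image unchanged. Calling `destruct` with several grid sizes (say `n = 2, 4, 8`) is one way to produce multi-scale jigsaw inputs; the abstract does not state which granularities MJR-Net actually uses.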
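The abstract says only that the feature pyramid module fuses mid-level features of different sizes. One widely used way to do this is FPN-style top-down fusion, where each coarser map is upsampled and added to the next finer one. The sketch below assumes that scheme, uses single-channel maps as nested lists, nearest-neighbour 2x upsampling, and omits the 1x1 lateral and 3x3 smoothing convolutions a real FPN would use; the helper names are hypothetical and it should not be read as MJR-Net's exact design.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W map stored as nested lists."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two output rows are independent lists
    return out

def top_down_fuse(levels):
    """FPN-style top-down fusion of single-channel maps.

    levels are ordered fine -> coarse, each level half the height and width of the
    previous one. Each fused level = its own map + the upsampled fused coarser level.
    """
    fused = [levels[-1]]  # the coarsest level is passed through unchanged
    for fmap in reversed(levels[:-1]):
        up = upsample2x(fused[0])
        fused.insert(0, [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fmap, up)])
    return fused
```

The design choice here is that coarse, semantically strong maps are propagated down to finer resolutions, so every fused level carries both local detail and wider context before it is used for classification.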
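The abstract does not describe the internals of the channel-wise attention module. A common design for per-channel reweighting is a Squeeze-and-Excitation (SE) block: global average pooling per channel, a bottleneck of two fully connected layers, and a sigmoid that yields one weight in (0, 1) per channel. The sketch below assumes that design, takes a C x H x W feature map as nested lists with hand-supplied weight matrices `w1` and `w2` (biases omitted), and is illustrative only.

```python
import math

def channel_attention(feature, w1, w2):
    """SE-style channel attention on a C x H x W feature map (nested lists).

    w1 has shape (C // r) x C and w2 has shape C x (C // r); biases are omitted.
    Returns the reweighted feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> weights in (0, 1)
    hidden = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z))) for wi in w1]
    s = [1.0 / (1.0 + math.exp(-sum(wij * hj for wij, hj in zip(wi, hidden)))) for wi in w2]
    # Scale: multiply every spatial position of channel c by s[c]
    scaled = [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(feature)]
    return scaled, s
```

Channels whose pooled response drives the bottleneck toward a positive output receive weights near 1, while the rest are damped toward 0, which is how such a module can strengthen discriminative patterns and weaken noisy ones.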
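A-softmax (introduced in SphereFace) normalizes the class weights, drops the bias, and replaces the target-class logit ||x|| cos(theta_y) with ||x|| psi(theta_y), where psi(theta) = (-1)^k cos(m * theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]. Because psi(theta) <= cos(theta), the true class must win by an angular margin, which pushes classes apart. The single-sample sketch below assumes a non-zero feature vector, uses `m = 4` only as an illustrative default (the abstract does not give MJR-Net's margin), and omits the loss-annealing schedule SphereFace uses during training.

```python
import math

def psi(theta, m):
    """SphereFace's monotonic extension of cos(m * theta) to [0, pi]."""
    # k is the index of the interval [k*pi/m, (k+1)*pi/m] containing theta.
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(x, weights, target, m=4):
    """A-softmax loss for one feature vector x (assumed non-zero).

    weights holds one vector per class; they are L2-normalized here and no bias is used.
    """
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for j, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding error
        if j == target:
            logits.append(x_norm * psi(math.acos(cos_t), m))  # margin only on the true class
        else:
            logits.append(x_norm * cos_t)
    # Numerically stable cross-entropy: log-sum-exp minus the target logit
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]
```

With `m = 1`, psi reduces to cos and the loss becomes ordinary softmax cross-entropy over normalized-weight logits; larger `m` raises the loss for the same input, demanding a wider angular gap.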
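Focal loss reweights samples by scaling the cross-entropy of the true class by (1 - p_t)^gamma, so FL = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is a plain multi-class version for a single sample; `gamma = 2` and `alpha = 1` are illustrative defaults, not values reported for MJR-Net.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Multi-class focal loss for one sample with integer class index `target`."""
    p_t = softmax(logits)[target]
    # (1 - p_t) ** gamma shrinks the loss of confidently correct (easy) samples
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma = 0` and `alpha = 1` this reduces to ordinary cross-entropy. For `gamma > 0`, a confidently classified sample such as logits `[5, 0, 0]` with target 0 is down-weighted far more than a misclassified one such as `[0, 5, 0]` with target 0, which shifts training effort toward hard examples.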