Food Recognition with Visual Transformers

M. Buzzelli, Paolo Napoletano, Flavio Piccoli, Simone Bianco, Gaetano Chiriaco
DOI: https://doi.org/10.1109/ICCE-Berlin58801.2023.10375660
2023-09-03
Abstract: Food recognition is a major challenge in the field of computer vision, requiring models that can effectively handle the wide variability and complexity of food images. In this paper, we explore the use of vision transformers, a category of models based on self-attention mechanisms, to address the task of food recognition. We focus on training and fine-tuning different vision transformer architectures on Food2K, a large-scale dataset of food images with 2,000 categories. We compare the performance of vision transformers with convolutional neural networks (CNNs) on Food2K and Food101. In addition, we use state-of-the-art explainability techniques to highlight the regions of interest that vision transformers take into account when making a prediction. Our results show that vision transformers can achieve competitive results on food recognition tasks, with the added benefit that pre-training on Food2K improves their generalization capabilities and interpretability. This study highlights the potential of vision transformers in food computing, paving the way for future research in this field.
Computer Science, Agricultural and Food Sciences