Abstract: The precise recognition of food categories plays a pivotal role in intelligent health management and has attracted significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that have driven research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate from food appearances in daily life. This discrepancy makes it difficult to transfer classifiers trained on these canteen datasets to the broader daily-life scenarios that people actually encounter. Toward this end, we present two new benchmarks, namely DailyFood-172 and DailyFood-16, specifically designed to curate food images from everyday meals. These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain. In addition, we propose a simple yet effective baseline method named Multi-Cluster Reference Learning (MCRL) to tackle the aforementioned domain gap. MCRL is motivated by the observation that food images in daily-life scenarios exhibit greater intra-class appearance variance than those in well-curated benchmarks. Notably, MCRL can be seamlessly coupled with existing approaches, yielding non-trivial performance enhancements. We hope our new benchmarks can inspire the community to explore the transferability of food recognition models trained on well-curated datasets toward practical real-life applications.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses a key issue in food recognition: how to effectively apply food classifiers trained on meticulously curated datasets (such as Food-101 and VIREO Food-172) to food images from everyday life. Existing datasets are primarily collected from cafeteria scenes, where dishes are professionally cooked and photographed, so their images differ significantly from food photos taken in daily life. This discrepancy creates a domain gap that causes existing food recognition models to perform poorly in real-world applications.

To tackle this challenge, the authors propose two new benchmark datasets, DailyFood-172 and DailyFood-16, specifically designed to collect food images from daily life. They also introduce a simple yet effective baseline method, Multi-Cluster Reference Learning (MCRL), to address the domain gap. MCRL improves the model's generalization in real-life scenarios by dynamically learning the distribution differences between target samples and multiple source clusters.

### Main Contributions

1. **Introduction of New Benchmark Datasets**: The authors constructed two high-quality daily food recognition benchmark datasets, DailyFood-172 and DailyFood-16, to support transferring food recognition models trained on curated cafeteria food datasets to everyday-life scenarios.
2. **Multi-Cluster Reference Learning (MCRL) Method**: MCRL learns the domain gap by dynamically referencing multiple source clusters, which addresses the "class ambiguity" problem and mitigates the negative impact of inaccurate pseudo-label predictions.
3. **Extensive Experimental Evaluation**: The authors evaluated the transferability of existing methods on DailyFood-172 and DailyFood-16 and combined MCRL with several state-of-the-art unsupervised domain adaptation (UDA) methods to demonstrate its effectiveness.

### Background and Motivation

Food recognition plays a crucial role in intelligent health management and has attracted significant research attention in recent years. Existing benchmark datasets like Food-101 and VIREO Food-172 provide rich food image resources, but these datasets are primarily collected from cafeteria scenes and differ significantly from food images in daily life. This discrepancy leads to a domain gap, making it challenging for existing food recognition models to perform well in real-world applications.

### Method Overview

#### Dataset Construction

1. **DailyFood-172**: This dataset contains 172 categories of daily food images, crawled from the "Xiachufang" website, with each category containing 47 to 329 images, totaling 42,312 images.
2. **DailyFood-16**: This dataset contains 16 categories of daily food images, collected from the weight loss platform "Qiezi Health," with each category containing 20 to 413 images, totaling 1,695 images.

#### Multi-Cluster Reference Learning (MCRL)

1. **Hard Selection**: For each target sample, select the top K categories with the highest probability as pseudo-labels, then minimize the distribution differences between the target sample and the source clusters corresponding to these categories.
2. **Soft Selection**: Consider the similarity between the target sample and the source categories, dynamically assigning a weight to each target sample through a scoring mechanism to control the extent of domain transfer learning.

### Experimental Results

The authors conducted extensive experiments on the DailyFood-172 and DailyFood-16 datasets, showing that the MCRL method significantly improves the performance of existing UDA methods, especially in addressing the "class ambiguity" problem.

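The hard and soft selection steps described under Multi-Cluster Reference Learning can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each source cluster is summarized by a single class centroid, uses squared Euclidean distance as a stand-in for the distribution discrepancy, and uses a clipped cosine similarity as a hypothetical scoring rule for the soft weight. All names (`top_k_classes`, `mcrl_sample_loss`, `k`) are illustrative and do not come from the paper.

```python
import math


def top_k_classes(probs, k):
    """Hard selection: indices of the k most probable classes for one target sample."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed stand-in for a distribution discrepancy)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mcrl_sample_loss(feature, probs, centroids, k=2):
    """Per-sample loss referencing multiple source clusters.

    feature   -- target sample feature vector
    probs     -- predicted class probabilities for the target sample
    centroids -- one source centroid per class (assumed cluster summary)
    """
    refs = top_k_classes(probs, k)
    # Average discrepancy between the target sample and its k reference clusters.
    discrepancy = sum(sq_dist(feature, centroids[c]) for c in refs) / len(refs)
    # Soft selection (hypothetical rule): weight the sample by its best
    # similarity to a reference cluster, clipped to be non-negative.
    score = max(0.0, max(cosine(feature, centroids[c]) for c in refs))
    return score * discrepancy, refs, score


# Toy example: three classes in a 2-D feature space.
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss, refs, score = mcrl_sample_loss([1.0, 0.0], [0.5, 0.3, 0.2], centroids, k=2)
print(refs, score, loss)
```

In this toy setup, referencing the two most probable clusters rather than only the single top prediction reflects the idea that one pseudo-label may be wrong, while the soft weight reduces the influence of target samples that resemble none of their reference clusters.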
### Conclusion

By introducing new benchmark datasets and proposing the MCRL method, the authors address the challenge of applying food recognition models trained on curated cafeteria food datasets to everyday-life scenarios and provide a basis for further research in food recognition.

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

