Abstract: Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As global dietary needs and preferences grow more complex, food intelligence is needed to enable food perception and reasoning for tasks ranging from recipe generation and dietary recommendation to the discovery and understanding of diet-disease correlations. Toward this goal, and building on the powerful capabilities of Large Language Models (LLMs) across domains and tasks, we introduce FoodSky, a food-oriented LLM that comprehends food data through perception and reasoning. Given the complexity and typicality of Chinese cuisine, we first construct a comprehensive Chinese food corpus, FoodEarth, from various authoritative sources; FoodSky leverages it to achieve a deep understanding of food-related data. We then propose the Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism, which enhance FoodSky's ability to capture fine-grained food semantics and to generate context-aware, food-relevant text, respectively. Extensive evaluations show that FoodSky significantly outperforms general-purpose LLMs on both chef and dietetic examinations, with accuracies of 67.2% and 66.4% on the Chinese National Chef Exam and the National Dietetic Exam, respectively. FoodSky not only promises to enhance culinary creativity and promote healthier eating patterns, but also sets a new standard for domain-specific LLMs that address complex real-world issues in the food domain. An online demonstration of FoodSky is available at http://222.92.101.211:8200.
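The abstract names HTRAG but does not spell out how its hierarchical topic retrieval works. As a rough illustration of the general idea, the sketch below retrieves in two stages: it first picks the topic whose centroid best matches the query, then ranks passages only within that topic. The bag-of-words cosine similarity, the `Passage` type, the `hierarchical_retrieve` function, and the toy corpus are all hypothetical assumptions for illustration, not FoodSky's actual retriever.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class Passage:
    topic: str  # hypothetical coarse topic label, e.g. "nutrition" or "cooking"
    text: str


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hierarchical_retrieve(query: str, corpus: list[Passage], k: int = 2) -> tuple[str, list[str]]:
    """Stage 1: choose the best-matching topic. Stage 2: rank passages inside it."""
    q = bag_of_words(query)
    # One centroid per topic: the summed bag-of-words of that topic's passages.
    centroids: dict[str, Counter] = {}
    for p in corpus:
        centroids.setdefault(p.topic, Counter()).update(bag_of_words(p.text))
    best_topic = max(centroids, key=lambda t: cosine(q, centroids[t]))
    # Only passages under the chosen topic compete in the fine-grained ranking.
    candidates = [p for p in corpus if p.topic == best_topic]
    candidates.sort(key=lambda p: cosine(q, bag_of_words(p.text)), reverse=True)
    return best_topic, [p.text for p in candidates[:k]]


# Usage with a hypothetical toy corpus.
corpus = [
    Passage("nutrition", "tofu is rich in protein and calcium"),
    Passage("nutrition", "spinach provides iron and folate"),
    Passage("cooking", "mapo tofu is braised with chili bean paste"),
    Passage("cooking", "stir fry spinach with garlic over high heat"),
]
topic, hits = hierarchical_retrieve("how much protein is in tofu", corpus, k=1)
```

Restricting the second stage to a single topic keeps the fine-grained ranking small and on-topic; the trade-off is that a wrong first-stage choice cannot be recovered, so practical systems often keep the top few topics rather than one.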

What problem does this paper attempt to address?

The paper attempts to address the following issues:

1. **Limitations of existing large language models (LLMs) in the food domain**
   - **Insufficient understanding**: Existing food-domain LLMs are mostly built on general-purpose pre-trained language models and cannot fully capture the fine-grained features of food information, which leads to inaccurate recognition and analysis.
   - **Lack of cultural diversity**: Current LLMs show biases when handling food queries from different cultural backgrounds, in particular a bias toward Western food knowledge, which can produce incorrect or insensitive responses to queries from other cultures.
   - **Limited knowledge coverage**: Existing food LLMs do not comprehensively cover the variety of dietary habits and culinary traditions and perform poorly in applications across different regions.
2. **Building a high-quality food corpus**
   - **Data scarcity and dispersion**: Compared with domains such as news and media, food data is relatively scarce and scattered across sources such as cooking websites, recipe databases, and food blogs. Its quality is uneven (spelling and grammatical errors, duplicate or invalid entries, irrelevant information), which makes data cleaning complex.
   - **Diverse food topics**: The food domain spans a wide range of topics, including ingredients, cuisines, dietary habits, and nutritional information, which the model must understand and process.
   - **Cross-cultural food knowledge processing**: Regions and cultures differ in dietary habits, taste preferences, and culinary traditions, which makes food queries from different backgrounds harder for LLMs to handle.

To address these issues, the paper introduces FoodSky, the first Chinese large language model built specifically for the food domain. FoodSky tackles these challenges with the following methods:

1. **Building a large-scale food corpus, FoodEarth**: Data is collected and processed from authoritative sources, including e-books, academic journals, and expert-recognized websites. Multi-stage data filtering and annotation yield a high-quality dataset of 811,491 question-answer pairs.
2. **Developing the Topic-based Selective State Space Model (TS3M)**: TS3M captures fine-grained food semantics, helping the model understand and handle tasks on different topics.
3. **Proposing the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism**: HTRAG improves the model's generalization through knowledge enhancement, allowing it to handle food-related information from different cultural backgrounds.

With these methods, FoodSky achieves 67.2% and 66.4% accuracy under zero-shot conditions on the chef and dietetic (nutritionist) examinations, respectively, significantly outperforming existing general-purpose LLMs. FoodSky not only enhances culinary creativity and promotes healthy eating but also sets a new standard for LLMs in the food domain.
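The summary mentions multi-stage data filtering but gives no concrete rules. The sketch below shows one plausible shape for such a pipeline over question-answer pairs, assuming three hypothetical stages: whitespace normalization, case-insensitive exact deduplication on the normalized pair, and a simple length and keyword relevance check. The stage order, the `MIN_ANSWER_CHARS` threshold, the `FOOD_KEYWORDS` set, and the `QAPair`/`filter_qa_pairs` names are illustrative assumptions, not the rules used to build FoodEarth (a Chinese corpus would also need Chinese keywords).

```python
from dataclasses import dataclass

# Hypothetical threshold and vocabulary, chosen only for illustration.
MIN_ANSWER_CHARS = 10
FOOD_KEYWORDS = {"recipe", "ingredient", "nutrition", "protein", "cook", "dish", "vitamin"}


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def normalize(text: str) -> str:
    """Stage 1: collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def is_relevant(pair: QAPair) -> bool:
    """Stage 3: keep pairs with a non-trivial answer that mention a food keyword."""
    if len(pair.answer) < MIN_ANSWER_CHARS:
        return False
    text = f"{pair.question} {pair.answer}".lower()
    return any(word in text for word in FOOD_KEYWORDS)


def filter_qa_pairs(raw: list[QAPair]) -> list[QAPair]:
    seen: set[tuple[str, str]] = set()
    kept: list[QAPair] = []
    for pair in raw:
        cleaned = QAPair(normalize(pair.question), normalize(pair.answer))
        # Stage 2: drop exact duplicates (case-insensitive) after normalization.
        key = (cleaned.question.lower(), cleaned.answer.lower())
        if key in seen:
            continue
        seen.add(key)
        if is_relevant(cleaned):
            kept.append(cleaned)
    return kept


# Usage with hypothetical raw pairs.
raw_pairs = [
    QAPair("Which nutrient is tofu rich in?", "Tofu is a good source of protein."),
    QAPair("Which  nutrient is tofu rich in? ", "Tofu is a good source of protein."),
    QAPair("What is the capital of France?", "Paris is the capital of France."),
    QAPair("How do I cook rice?", "Boil it."),
]
clean_pairs = filter_qa_pairs(raw_pairs)
```

Normalizing before deduplicating is what lets the second tofu question, which differs only in whitespace, be caught as a duplicate. Substring keyword matching is crude; a real pipeline could swap in a trained relevance classifier and near-duplicate detection while keeping the same stage structure.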

FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination
