Abstract: Incorporating cooking logic into ingredient recognition from food images benefits food cognition. Compared with food categorization, ingredient recognition gives a better understanding of food by providing crucial information on its composition. However, there are situations in which different foods are made of different ingredients, so it is necessary to incorporate cooking logic into ingredient recognition to achieve better food cognition. Based on this observation, our paper proposes a sequential learning method that guides a neural network based (NN-based) model to produce ingredients following the corresponding cooking logic in recipes. First, to make maximum use of the visual features in images, a double-flow feature fusion (DFFF) module is proposed to obtain features from two image-based visual tasks (food name proposal and multi-label ingredient proposal). The fused features from DFFF, together with the original image features, are then fed into a bidirectional long short-term memory (Bi-LSTM) based ingredient generator to produce sequential ingredients. To guide the sequential ingredient generation process, reinforcement learning is employed with a hybrid loss, related to both the common and personality traits of ingredients, that optimizes the model's ability to associate images with sequential ingredients. In addition, sequential ingredients are used in a backward flow to reconstruct food images, so that sequential ingredient generation can be further optimized in a complementary manner. Experimental results demonstrate the superiority of our method in driving the model to allocate more attention to the correlation between images and sequential ingredients, and the produced ingredients are comprehensive and logical.
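The abstract does not spell out the hybrid loss, so the following is only an illustrative sketch of how a sequence-level reward for reinforcement learning could mix an order-free term with an order-aware term. The function names, the F1 and longest-common-subsequence scores, and the weight `alpha` are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def set_f1(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Order-free term (assumed): F1 overlap between the two ingredient sets."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_score(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Order-aware term (assumed): LCS length normalized by the longer sequence."""
    if not predicted or not reference:
        return 0.0
    return lcs_length(predicted, reference) / max(len(predicted), len(reference))


def hybrid_reward(predicted: Sequence[str], reference: Sequence[str],
                  alpha: float = 0.5) -> float:
    """Hypothetical hybrid reward: weighted mix of the two terms above."""
    return alpha * set_f1(predicted, reference) + (1 - alpha) * order_score(predicted, reference)


reference = ["flour", "egg", "milk", "butter"]
print(hybrid_reward(reference, reference))                  # identical sequence
print(hybrid_reward(list(reversed(reference)), reference))  # same set, wrong order
```

In a policy-gradient setup, a score of this kind could serve as the reward for a sampled ingredient sequence: a prediction that names the right ingredients in the wrong order scores lower than one that also follows the recipe order.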

Ingredient-enriched Recipe Generation from Cooking Videos

Recipe Generation from Unsegmented Cooking Videos

Video-based Recipe Retrieval

What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

Sequential learning for ingredient recognition from images

A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos

MCEN: Bridging Cross-Modal Gap Between Cooking Recipes and Dish Images with Latent Variable Model

Deep Understanding Of Cooking Procedure For Cross-Modal Recipe Retrieval

Retrieval Augmented Recipe Generation

Cook-Gen: Robust Generative Modeling of Cooking Actions from Recipes

Deep-based Ingredient Recognition for Cooking Recipe Retrieval

Cross-modal recipe retrieval based on unified text encoder with fine-grained contrastive learning

Deep Image-to-Recipe Translation

CookGAN: Meal Image Synthesis from Ingredients

Cross-Modal Recipe Retrieval: How to Cook This Dish?

Learning Program Representations for Food Images and Cooking Recipes

Efficient Pre-training for Localized Instruction Generation of Videos

Cross-modal Recipe Retrieval with Rich Food Attributes

Inverse Cooking: Recipe Generation from Food Images

The Art of Food: Meal Image Synthesis from Ingredients