Towards Automated Recipe Genre Classification using Semi-Supervised Learning

Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, Hasan Mahmud

DOI: https://doi.org/10.48550/arXiv.2310.15693

2023-10-24

Computation and Language

Abstract: Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in their respective categories, with extended named entities extracted from recipe descriptions. The dataset includes features such as title, NER, directions, and extended NER, as well as nine labels representing genres: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline, named 3A2M+, extends the size of the Named Entity Recognition (NER) list to address missing named entities such as heat, time, or process from the recipe directions using two NER extraction tools. The 3A2M+ dataset provides a comprehensive resource for various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we have applied traditional machine learning, deep learning, and pre-trained language models to classify the recipes into their corresponding genres and achieved an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role in classifying the genre.
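To make the idea of an "extended NER" list concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not name the two NER extraction tools, so simple regular expressions and a small keyword set stand in for them. The function name extend_ner, the patterns, and the sample recipe are all hypothetical.

```python
import re

# Hypothetical stand-ins for the paper's (unnamed) NER extraction tools.
TIME_RE = re.compile(
    r"\b\d+(?:\s*(?:-|to)\s*\d+)?\s*(?:seconds?|minutes?|hours?)\b",
    re.IGNORECASE,
)
HEAT_RE = re.compile(
    r"\b\d+\s*(?:°\s*[FC]\b|degrees?(?:\s+[FC]\b)?)|\b(?:low|medium|high)\s+heat\b",
    re.IGNORECASE,
)
# Small illustrative vocabulary of cooking processes (an assumption).
PROCESS_WORDS = {"bake", "boil", "fry", "simmer", "stir", "whisk",
                 "chop", "mix", "preheat", "roast"}


def extend_ner(ner, directions):
    """Return the NER list extended with time, heat, and process mentions
    found in the recipe directions, without duplicates."""
    text = " ".join(directions)
    extras = [m.group(0) for m in TIME_RE.finditer(text)]
    extras += [m.group(0) for m in HEAT_RE.finditer(text)]
    extras += [t for t in re.findall(r"[a-z]+", text.lower())
               if t in PROCESS_WORDS]

    extended = list(ner)
    seen = {entity.lower() for entity in ner}
    for entity in extras:
        if entity.lower() not in seen:  # keep first occurrence only
            seen.add(entity.lower())
            extended.append(entity)
    return extended


# Made-up recipe record, not taken from the dataset.
sample_ner = ["eggs", "sugar"]
sample_directions = [
    "Preheat oven to 350 degrees F.",
    "Whisk eggs and sugar, then bake for 25 minutes.",
]
extended = extend_ner(sample_ner, sample_directions)
```

The original ingredient entities are kept first, and the recovered time, heat, and process mentions are appended after them, mirroring the separate "NER" and "extended NER" features the abstract describes.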

What problem does this paper attempt to address?

The paper addresses the automatic classification of online cooking recipes: how to assign the large number of raw recipes found on the internet to appropriate food genres. The task is challenging because sufficient labeled data is lacking. To tackle this, the authors created the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset," which contains 2 million cooking recipes annotated with genre labels and extended named entities. The authors also proposed a semi-supervised learning-based pipeline that extends the named entity recognition (NER) list, recovering missing named entities related to temperature, time, and methods in the cooking process. Beyond recipe classification, the dataset is a resource for other recipe-related tasks, such as recommendation systems, ingredient substitution, dietary analysis, and recipe summarization. The study reports an overall accuracy of 98.6% on the recipe classification task, demonstrating the effectiveness of pre-trained language models (such as DistilBERT and RoBERTa) for this problem.
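As a rough illustration of genre classification from the title feature alone, the sketch below trains a tiny multinomial Naive Bayes classifier in plain Python. This is an assumption-laden stand-in for the "traditional machine learning" baselines, not the authors' models or their reported setup; the class name TitleNaiveBayes and the six training titles are invented for the example, and only three of the nine genres are used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a recipe title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


class TitleNaiveBayes:
    """Multinomial Naive Bayes over title words with add-one smoothing."""

    def fit(self, titles, genres):
        self.genre_counts = Counter(genres)
        self.word_counts = defaultdict(Counter)
        for title, genre in zip(titles, genres):
            self.word_counts[genre].update(tokenize(title))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, title):
        total = sum(self.genre_counts.values())
        best_genre, best_score = None, -math.inf
        for genre, n in self.genre_counts.items():
            counts = self.word_counts[genre]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in tokenize(title):
                if word in self.vocab:  # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Invented toy titles; the real dataset has two million labeled recipes.
train_titles = [
    "Chocolate Chip Cookies", "Banana Bread",
    "Iced Lemon Tea", "Strawberry Smoothie",
    "Grilled Chicken Wings", "Beef Stew",
]
train_genres = ["bakery", "bakery", "drinks", "drinks", "non-veg", "non-veg"]

model = TitleNaiveBayes().fit(train_titles, train_genres)
prediction = model.predict("Lemon Smoothie")
```

With so little data the model only picks up word overlap between titles, but it shows why a short title can carry strong genre signal: words like "smoothie" or "stew" occur almost exclusively in one genre.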
