Abstract: Food touches our lives through various endeavors, including flavor, nourishment, health, and sustainability. Recipes are cultural capsules transmitted across generations via unstructured text. Automated protocols for recognizing named entities, the building blocks of recipe text, are of immense value for various applications ranging from information extraction to novel recipe generation. Named entity recognition is a technique for extracting information from unstructured or semi-structured data with known labels. Starting with manually-annotated data of 6,611 ingredient phrases, we created an augmented dataset of 26,445 phrases cumulatively. Simultaneously, we systematically cleaned and analyzed ingredient phrases from RecipeDB, the gold-standard recipe data repository, and annotated them using the Stanford NER. Based on the analysis, we sampled a subset of 88,526 phrases using a clustering-based approach while preserving the diversity to create the machine-annotated dataset. A thorough investigation of NER approaches on these three datasets, involving statistical methods, fine-tuning of deep learning-based language models, and few-shot prompting on large language models (LLMs), provides deep insights. We conclude that few-shot prompting on LLMs has abysmal performance, whereas the fine-tuned spaCy-transformer emerges as the best model with macro-F1 scores of 95.9%, 96.04%, and 95.71% for the manually-annotated, augmented, and machine-annotated datasets, respectively.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses Named Entity Recognition (NER) in recipe texts. Specifically, it focuses on extracting named entities from unstructured or semi-structured recipe text, such as ingredient names, quantities, units, states (e.g., fresh/dry), sizes, and temperatures.

### Background and Motivation

1. **Importance and Diversity of Food**: Food plays a crucial role in our lives, providing nutrition and flavor while also bearing on health and sustainability.
2. **Unstructured Nature of Recipe Texts**: Recipes are usually presented as unstructured text containing a large number of named entities. Automated NER is essential for extracting valuable information from these texts.
3. **Wide Range of Applications**: NER has broad application value in information extraction, new recipe generation, dietary safety detection, restaurant operation optimization, food safety tracking, and cost and sustainability analysis.

### Research Objectives

1. **Creating Annotated Datasets**: The paper creates three datasets through manual annotation, data augmentation, and machine annotation: a manually annotated dataset, an augmented dataset, and a machine-annotated dataset.
2. **Evaluating the Performance of Different Models**: The study compares statistical methods, fine-tuned deep learning-based language models, and few-shot prompting on large language models (LLMs).
3. **Identifying the Best Model**: Through experimental validation, the best-performing model for NER in recipe texts is determined.

### Main Contributions

1. **Creation of Datasets**:
   - Manually Annotated Dataset: Contains 6,611 ingredient phrases.
   - Augmented Dataset: Expanded to 26,445 ingredient phrases through techniques such as label replacement, synonym replacement, and intra-segment shuffling.
   - Machine-Annotated Dataset: Extracted 349,762 unique ingredient phrases from the RecipeDB dataset and selected 88,526 phrases for annotation using the Stratified Entity Frequency Sampling (SEFS) method.
2. **Model Evaluation**:
   - Evaluated different models using macro F1 score, precision, and recall.
   - The fine-tuned spaCy-transformer model performs best on all three datasets, achieving macro F1 scores of 95.9%, 96.04%, and 95.71% on the manually annotated, augmented, and machine-annotated datasets, respectively.
3. **Evaluation of Few-Shot Prompting**:
   - Few-shot prompting on LLMs performed poorly, indicating that these models lack domain-specific knowledge and need further fine-tuning on domain-specific data.

### Conclusion

By creating high-quality datasets and evaluating a range of models, the paper addresses the problem of NER in recipe texts. The results indicate that deep learning-based models, particularly the fine-tuned spaCy-transformer model, perform very well on this task. The study also highlights the limitations of few-shot prompting in specialized domains, pointing out directions for future research.
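The macro F1 scores quoted above average the per-label F1 scores with equal weight, so a rare label such as a temperature counts as much as a frequent one such as an ingredient name. The following is a minimal sketch of that computation, assuming token-level labels; the label names and the example phrase are hypothetical, and the paper's own evaluation may score at the entity level instead.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over aligned token labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for the phrase "2 cups fresh basil".
gold = ["QUANTITY", "UNIT", "STATE", "NAME"]
pred = ["QUANTITY", "UNIT", "NAME", "NAME"]  # "fresh" mislabelled as NAME
score = macro_f1(gold, pred)
```

In this example a single mislabelled token lowers the score for two labels at once: `STATE` loses recall and `NAME` loses precision, which is why macro F1 is sensitive to errors on infrequent labels.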
