Abstract:
Background: Food categorization and nutrient profiling are labor-intensive, time-consuming, and costly tasks, given the number of products and labels in large food composition databases and the dynamic food supply.
Objectives: This study used a pretrained language model and supervised machine learning to automate food category classification and nutrition quality score prediction based on manually coded and validated data, and compared the results with models that used bag-of-words and structured nutrition facts as inputs.
Methods: Food product information from the University of Toronto Food Label Information and Price Database 2017 (n = 17,448) and the University of Toronto Food Label Information and Price Database 2020 (n = 74,445) was used. Health Canada's Table of Reference Amounts (TRA) (24 categories and 172 subcategories) was used for food categorization, and the Food Standards Australia New Zealand (FSANZ) nutrient profiling system was used for nutrition quality score evaluation. TRA categories and FSANZ scores were manually coded and validated by trained nutrition researchers. A modified pretrained sentence-Bidirectional Encoder Representations from Transformers (sentence-BERT) model was used to encode unstructured text from food labels into lower-dimensional vector representations, followed by supervised machine learning algorithms (i.e., elastic net, k-Nearest Neighbors, and XGBoost) for multiclass classification and regression tasks.
Results: Pretrained language model representations used by the XGBoost multiclass classification algorithm reached overall accuracy scores of 0.98 and 0.96 in predicting TRA major categories and subcategories, respectively, outperforming bag-of-words methods. For FSANZ score prediction, the proposed method reached prediction accuracy (R²: 0.87; MSE: 14.4) similar to that of bag-of-words methods (R²: 0.72-0.84; MSE: 30.3-17.6), whereas the machine learning model using structured nutrition facts performed best (R²: 0.98; MSE: 2.5). The pretrained language model generalized better to the external test datasets than bag-of-words methods.
Conclusions: The automated approach achieved high accuracy in classifying food categories and predicting nutrition quality scores using text information found on food labels. This approach is effective and generalizable in a dynamic food environment, where large amounts of food label data can be obtained from websites.
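The text-to-prediction workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to keep it runnable with the Python standard library alone, the sentence-BERT encoder is swapped for bag-of-words counts and XGBoost is swapped for k-Nearest Neighbors (both appear in the abstract as comparison methods). The product texts, category names, scores, and all function names below are hypothetical and are not taken from the Food Label Information and Price databases.

```python
import math
from collections import Counter

# Hypothetical toy data: (label text, category, nutrition score).
# Illustrative only; these are not real TRA categories or FSANZ scores.
TRAIN = [
    ("whole wheat bread sliced", "Bakery", 4.0),
    ("white sandwich bread", "Bakery", 3.0),
    ("multigrain bread loaf", "Bakery", 4.5),
    ("strawberry yogurt low fat", "Dairy", 3.5),
    ("plain greek yogurt", "Dairy", 4.5),
    ("cheddar cheese block", "Dairy", 2.5),
    ("cola soft drink", "Beverages", 1.0),
    ("orange juice from concentrate", "Beverages", 2.5),
    ("sparkling water lemon", "Beverages", 5.0),
]


def bag_of_words(text):
    """Encode label text as token counts (stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest(text, train, k=3):
    """Return the k training rows most similar to the query text."""
    query = bag_of_words(text)
    ranked = sorted(
        train, key=lambda row: cosine(query, bag_of_words(row[0])), reverse=True
    )
    return ranked[:k]


def predict_category(text, train, k=3):
    """Multiclass classification: majority vote among the k neighbors."""
    votes = Counter(category for _, category, _ in nearest(text, train, k))
    return votes.most_common(1)[0][0]


def predict_score(text, train, k=3):
    """Regression: mean score of the k neighbors."""
    neighbors = nearest(text, train, k)
    return sum(score for _, _, score in neighbors) / len(neighbors)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical held-out products used to report the same metrics as the abstract.
    held_out = [
        ("whole grain bread", "Bakery", 4.0),
        ("vanilla yogurt", "Dairy", 3.5),
        ("lemon soft drink", "Beverages", 1.5),
    ]
    texts = [row[0] for row in held_out]
    print("accuracy:", accuracy([r[1] for r in held_out],
                                [predict_category(t, TRAIN) for t in texts]))
    scores_pred = [predict_score(t, TRAIN) for t in texts]
    print("MSE:", mse([r[2] for r in held_out], scores_pred))
    print("R2:", r_squared([r[2] for r in held_out], scores_pred))
```

In a setup closer to the one described, `bag_of_words` would be replaced by a pretrained sentence encoder and the neighbor vote by a gradient-boosted model, while the evaluation functions (accuracy for categories, MSE and R² for scores) would stay the same.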

Integrating Vision-Language Models for Accelerated High-Throughput Nutrition Screening

UMDFood: Vision-language models boost food composition compilation

Nutritional composition analysis in food images: an innovative Swin Transformer approach

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound

Enhancing Community Vision Screening -- AI Driven Retinal Photography for Early Disease Detection and Patient Trust

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Multi-sensor integration approach based on hyperspectral imaging and electronic nose for quantitation of fat and peroxide value of pork meat

An Intelligent Vision-Based Nutritional Assessment Method for Handheld Food Items

A Novel Strategy for Rapidly and Accurately Screening Biomarkers Based on Ultraperformance Liquid Chromatography-Mass Spectrometry Metabolomics Data

Beyond the Hype: A dispassionate look at vision-language models in medical scenario

Using VIS-NIR hyperspectral imaging and deep learning for non-destructive high-throughput quantification and visualization of nutrients in wheat grains

Monocular Visual Pig Weight Estimation Method Based on the EfficientVit-C Model

A Survey of Medical Vision-and-Language Applications and Their Techniques

Translational Algorithms for Technological Dietary Quality Assessment Integrating Nutrimetabolic Data with Machine Learning Methods

Natural language processing and machine learning approaches for food categorization and nutrition quality prediction compared with traditional methods

Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker