Integrating Vision-Language Models for Accelerated High-Throughput Nutrition Screening

Peihua Ma, Yixin Wu, Ning Yu, Xiaoxue Jia, Yiyang He, Yang Zhang, Michael Backes, Qin Wang, Cheng-I Wei
DOI: https://doi.org/10.1002/advs.202403578
Abstract: Addressing the critical need for fast and precise nutritional profiling in healthcare and the food industry, this study pioneers the integration of vision-language models (VLMs) with chemical analysis techniques. A state-of-the-art VLM, developed using the large UMDFood-90k database, substantially improves the speed and accuracy of nutrient estimation. The model achieves a macro-averaged AUROC of 0.921 for lipid quantification, and its estimates deviate by less than 10% from traditional chemical analyses for over 82% of the analyzed food items. In a test with students, the approach accelerated nutritional screening by 36.9%, and it sets a new benchmark for the precision of nutritional data compilation. By combining advanced computational models with chemical validation, this research offers a rapid, high-throughput solution for nutritional analysis and marks a substantial advance in food science.
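The abstract reports two accuracy figures without defining them. The sketch below shows one common way to compute each, under stated assumptions; it is not the authors' evaluation code. It assumes that lipid content is binned into discrete levels so that a macro-averaged AUROC applies (one-vs-rest AUROC per level, averaged with equal weight), and that the "less than 10%" figure means the absolute relative deviation of the model's estimate from the chemically measured value. The function names `binary_auroc`, `macro_auroc`, and `share_within_tolerance` are hypothetical, and the sample numbers are invented for illustration only.

```python
from typing import Sequence


def binary_auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(probs: Sequence[Sequence[float]], y_true: Sequence[int]) -> float:
    """One-vs-rest AUROC for each class (e.g. hypothetical lipid-level bins), averaged equally."""
    n_classes = len(probs[0])
    per_class = []
    for c in range(n_classes):
        scores = [row[c] for row in probs]               # predicted probability of class c
        labels = [1 if y == c else 0 for y in y_true]    # class c versus all other classes
        per_class.append(binary_auroc(scores, labels))
    return sum(per_class) / n_classes


def share_within_tolerance(predicted: Sequence[float],
                           measured: Sequence[float],
                           tol: float = 0.10) -> float:
    """Fraction of items whose relative deviation from the measured value is below tol.

    Assumes every chemically measured value is nonzero.
    """
    hits = 0
    for p, m in zip(predicted, measured):
        if m == 0:
            raise ValueError("relative deviation is undefined for a measured value of 0")
        if abs(p - m) / abs(m) < tol:
            hits += 1
    return hits / len(measured)


if __name__ == "__main__":
    # Invented example: 3 lipid-level bins (low, medium, high), 4 food items.
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
    print(macro_auroc(probs, [0, 1, 2, 0]))                      # 1.0 (perfect separation)
    print(binary_auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))     # 0.75
    # Invented lipid values in g/100 g: model estimate vs. chemical analysis.
    print(share_within_tolerance([9.5, 12.0, 5.0], [10.0, 10.0, 5.2]))
```

The pairwise Mann-Whitney form is quadratic in the number of items but makes the meaning of AUROC explicit; for large evaluation sets, a rank-based or library implementation would be the practical choice.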
What problem does this paper attempt to address?