FlavorMiner: A Machine Learning Platform for Extracting Molecular Flavor Profiles from Structural Data

Mehdi D. Davari, Fabio Herrera-Rocha, Miguel Fernández-Niño, Jorge Duitama, Mónica P. Cala, María José Chica, Ludger A. Wessjohann, Andrés Fernando González Barrios
DOI: https://doi.org/10.26434/chemrxiv-2024-821xm
2024-06-18
Abstract: Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up this process. Nonetheless, the optimal approach to predict flavor features of molecules remains elusive. In this work, we present FlavorMiner, an ML-based multilabel flavor predictor. FlavorMiner integrates different combinations of algorithms and mathematical representations, augmented with class balance strategies to address the inherent class imbalance of the input dataset. Notably, Random Forest and K-Nearest Neighbors combined with Extended Connectivity Fingerprint and RDKit molecular descriptors outperform other combinations in most cases. Resampling strategies surpass weight balance methods in mitigating bias associated with class imbalance. FlavorMiner achieves an average ROC AUC score of 0.88. The algorithm was used to analyze cocoa metabolomics data, showing its potential to help extract valuable insights from complex food metabolomics data. FlavorMiner can be used for flavor mining in any food product, drawing from a diverse training dataset that spans over 934 distinct food products.
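As a rough illustration of the "average ROC AUC" figure quoted in the abstract, the sketch below computes a per-label ROC AUC and then takes an unweighted mean across labels. The abstract does not say how the average is formed, so the unweighted (macro) mean is an assumption, and the function names, label names, and scores are hypothetical toy values, not data or code from the paper.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_roc_auc(
    y_true_by_label: Dict[str, Sequence[int]],
    scores_by_label: Dict[str, Sequence[float]],
) -> float:
    """Unweighted mean of the per-label ROC AUC values (assumed averaging)."""
    aucs = [
        roc_auc(y_true_by_label[label], scores_by_label[label])
        for label in y_true_by_label
    ]
    return sum(aucs) / len(aucs)


# Hypothetical toy data: four molecules, two flavor labels.
y_true = {"sweet": [1, 0, 1, 0], "bitter": [0, 0, 1, 1]}
scores = {"sweet": [0.9, 0.2, 0.6, 0.4], "bitter": [0.1, 0.7, 0.8, 0.3]}

print(macro_roc_auc(y_true, scores))
```

In this toy case the "sweet" label is ranked perfectly and "bitter" gets three of four pairs right, so the mean reflects both; a support-weighted mean would be an equally plausible reading of "average".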
Chemistry
What problem does this paper attempt to address?
This paper introduces FlavorMiner, a machine learning-based multi-label flavor prediction platform that extracts molecular flavor features from structural data. Food flavor is crucial for consumer acceptance, but tracing its biochemistry is difficult because food composition is complex, and current methods are both expensive and time-consuming. Machine learning (ML) models are emerging as an alternative, but the best approach to predicting flavor features remains unclear.
FlavorMiner combines different algorithms and mathematical representations with class balance strategies to address the inherent class imbalance of the input dataset. The study found that Random Forest and K-Nearest Neighbors, combined with Extended Connectivity Fingerprints and RDKit molecular descriptors, perform best in most cases, and that resampling strategies mitigate bias from class imbalance more effectively than weight-balancing methods. FlavorMiner achieves an average ROC AUC score of 0.88 and shows potential for analyzing cocoa metabolomics data, helping to extract useful information from complex food metabolomics data.
The paper also discusses existing flavor prediction methods, such as experimental methods that require compound isolation or synthesis and alternative methods that infer flavor from correlations with sensory results. Binary classifiers exist for sweetness and bitterness, but multi-label prediction tools for other flavors such as floral, fruity, and sour remain scarce. FlavorMiner aims to fill this gap by predicting seven key flavor categories, and it is applicable to various food products, with training data covering over 934 different food products.
In conclusion, the paper addresses the challenge of efficiently and accurately predicting food flavor features from molecular structural data. It proposes a machine learning platform, FlavorMiner, that offers the food industry a faster method of flavor analysis.
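The two class balance approaches contrasted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the source does not name the specific resampling or weighting scheme, so random oversampling of the minority class and the common "balanced" weight heuristic n_samples / (n_classes * class_count) are stand-ins, each flavor label is treated as a separate binary problem, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def random_oversample(
    X: Sequence[Sequence[int]], y: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[int]], List[int]]:
    """Resampling: duplicate randomly chosen rows of the smaller classes
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(cls)
    return X_out, y_out


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight balancing: keep the data as-is and give rarer classes a
    larger weight, using n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical toy data for one flavor label: one positive molecule and
# five negatives, each described by a two-bit fingerprint-like vector.
X = [[1, 0], [0, 1], [0, 1], [1, 1], [0, 0], [0, 1]]
y = [1, 0, 0, 0, 0, 0]

X_res, y_res = random_oversample(X, y)
weights = balanced_class_weights(y)

print(Counter(y_res))
print(weights)
```

Resampling changes the training set itself, while weighting leaves the data untouched and changes how much each error costs during training; which one works better is an empirical question, and the paper reports that resampling did better on its dataset.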