Machine-Learning-Based Olfactometry: Odor Descriptor Clustering Analysis for Olfactory Perception Prediction of Odorant Molecules

L. Shang, Chuanjun Liu, Fengzhen Tang, Bin Chen, Lianqing Liu, Kenshi Hayashi
2022-01-01
Abstract: Although gas chromatography/olfactometry (GC/O) has been employed as a powerful analytical tool in odor measurement, its application is limited by the variability, subjectivity, and high cost of the trained panelists who serve as detectors in the system. Advances in data-driven science have made it possible to predict structure-odor relationships (SOR) and thus to develop machine-learning-based olfactometry (ML-GCO), in which human panelists may be replaced by machine learning models to obtain the sensory information of GC-separated chemical compounds. However, one remaining challenge in ML-GCO is that too many odor descriptors (ODs) are used to describe the sensory characteristics of odorants, and it is impractical to build a separate model for each OD. To solve this issue, we propose an SOR prediction approach based on odor descriptor clustering. First, 256 representative ODs are classified into 20 categories using a co-occurrence Bayesian embedding model. The resulting categorization is interpreted in terms of semantic relationships using a pre-trained Word2Vec model. Various molecular structure features, including molecular parameters, molecular fingerprints, and molecular 2D graphic features extracted by convolutional neural networks, are then employed to predict these odor categories. The high prediction accuracy (area under the ROC curve of 0.800 ± 0.004) supports the soundness of the proposed clustering scheme and molecular feature extraction. This study brings ML-GCO models much closer to practical application, since they could serve as either an auxiliary system or a complete replacement for human panelists in olfactory evaluation.
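The idea of collapsing many odor descriptors into a few categories before training can be illustrated with a minimal sketch. It is not the authors' method: the toy annotations are invented, and connected components of the descriptor co-occurrence graph are used as a crude stand-in for the co-occurrence Bayesian embedding model named in the abstract. All function and variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy annotations: odorant name -> set of odor descriptors.
odorant_descriptors = {
    "mol_a": {"fruity", "sweet", "apple"},
    "mol_b": {"sweet", "vanilla"},
    "mol_c": {"sulfurous", "garlic"},
    "mol_d": {"fruity", "apple"},
    "mol_e": {"garlic", "sulfurous", "onion"},
}

def cooccurrence_matrix(annotations):
    """Count how often each pair of descriptors labels the same odorant."""
    vocab = sorted(set().union(*annotations.values()))
    index = {d: i for i, d in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)), dtype=int)
    for descs in annotations.values():
        for a in descs:
            for b in descs:
                if a != b:
                    C[index[a], index[b]] += 1
    return vocab, C

def group_descriptors(vocab, C):
    """Assign a category id to each descriptor via connected components.

    Assumption: this simple graph grouping only stands in for a learned
    embedding followed by clustering.
    """
    labels = [-1] * len(vocab)
    current = 0
    for start in range(len(vocab)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero(C[i])[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(int(j))
        current += 1
    return {vocab[i]: labels[i] for i in range(len(vocab))}

def category_targets(annotations, descriptor_to_category):
    """Build a multi-label target matrix over categories instead of descriptors."""
    n_cat = max(descriptor_to_category.values()) + 1
    names = sorted(annotations)
    Y = np.zeros((len(names), n_cat), dtype=int)
    for row, name in enumerate(names):
        for d in annotations[name]:
            Y[row, descriptor_to_category[d]] = 1
    return names, Y

vocab, C = cooccurrence_matrix(odorant_descriptors)
mapping = group_descriptors(vocab, C)
names, Y = category_targets(odorant_descriptors, mapping)
print(mapping)
print(Y)
```

With category-level targets like `Y`, one classifier per category (20 in the paper) replaces one classifier per descriptor (256), which is the practical motivation given in the abstract.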