Mol-PECO: a deep learning model to predict human olfactory perception from molecular structures

Mengji Zhang, Yusuke Hiki, Akira Funahashi, Tetsuya J. Kobayashi
2023-05-21
Abstract: While visual and auditory information conveyed by wavelength of light and frequency of sound have been decoded, predicting olfactory information encoded by the combination of odorants remains challenging due to the unknown and potentially discontinuous perceptual space of smells and odorants. Herein, we develop a deep learning model called Mol-PECO (Molecular Representation by Positional Encoding of Coulomb Matrix) to predict olfactory perception from molecular structures. Mol-PECO updates the learned atom embedding by directional graph convolutional networks (GCN), which model the Laplacian eigenfunctions as positional encoding, and Coulomb matrix, which encodes atomic coordinates and charges. With a comprehensive dataset of 8,503 molecules, Mol-PECO directly achieves an area-under-the-receiver-operating-characteristic (AUROC) of 0.813 in 118 odor descriptors, superior to the machine learning of molecular fingerprints (AUROC of 0.761) and GCN of adjacency matrix (AUROC of 0.678). The learned embeddings by Mol-PECO also capture a meaningful odor space with global clustering of descriptors and local retrieval of similar odorants. Our work may promote the understanding and decoding of the olfactory sense and mechanisms.
Machine Learning, Artificial Intelligence, Biomolecules, Neurons and Cognition
What problem does this paper attempt to address?
The paper addresses the problem of predicting human olfactory perception from molecular structure, known as the Quantitative Structure-Odor Relationship (QSOR) problem. The authors develop a deep learning model called Mol-PECO, which combines the Coulomb Matrix (CM) with the Spectral Attention Network (SAN) to predict odor descriptors directly from molecular structures. Mol-PECO improves on conventional Graph Convolutional Network (GCN) methods in two ways:

1. **Molecular Representation**: Mol-PECO uses the Coulomb Matrix as a global representation of the molecule. The adjacency matrix encodes only chemical bonds, whereas the Coulomb Matrix also encodes the three-dimensional coordinates and charges of the atoms. This atomic and 3D information is relevant to binding affinity with olfactory receptors.
2. **Graph Modeling Approach**: Mol-PECO uses the eigenfunctions of the graph Laplacian as positional encoding, which compensates for the inability of standard GCNs to model direction. This positional encoding helps distinguish molecules with different structures and so increases the model's expressive power.

On a dataset of 8,503 molecules and 118 odor descriptors, Mol-PECO reaches an area under the receiver operating characteristic curve (AUROC) of 0.813 and an area under the precision-recall curve (AUPRC) of 0.181. It outperforms conventional machine learning on molecular fingerprints (AUROC of 0.761) and a GCN that uses only the adjacency matrix (AUROC of 0.678). The embeddings learned by Mol-PECO also capture a meaningful odor space, with global clustering of descriptors and local retrieval of similar odorants, which offers a new perspective on the mechanisms of olfactory perception and on odor design.
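The two ingredients described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard Coulomb matrix definition (diagonal 0.5·Z_i^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|) and assumes that the off-diagonal Coulomb entries act as edge weights for a symmetric normalized Laplacian, which is what the model's name suggests; the paper's exact variant may differ. The function names and the approximate water geometry are hypothetical and serve only as an illustration.

```python
import numpy as np


def coulomb_matrix(charges, coords):
    """Standard Coulomb matrix (assumed variant, not confirmed by the source).

    Diagonal: 0.5 * Z_i**2.4; off-diagonal: Z_i * Z_j / |R_i - R_j|.
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all atoms.
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid dividing by zero
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    return cm


def laplacian_positional_encoding(weights, k):
    """Return the k lowest-frequency eigenpairs of the normalized Laplacian.

    `weights` is a symmetric matrix of non-negative edge weights; its
    diagonal is ignored. Assumes every node has a positive weighted degree.
    """
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)
    deg = w.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # L = I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]


# Illustrative input: water (O, H, H) with approximate coordinates.
charges = [8, 1, 1]
coords = [
    [0.0, 0.0, 0.117],
    [0.0, 0.757, -0.469],
    [0.0, -0.757, -0.469],
]

cm = coulomb_matrix(charges, coords)
eigvals, eigvecs = laplacian_positional_encoding(cm, k=2)
print(cm.round(3))
print(eigvals.round(3))
```

Each row of `eigvecs` gives one atom a position-dependent feature vector that could be combined with its learned embedding. Unlike a binary adjacency matrix, the Coulomb-based weights change when the 3D geometry changes, so the encoding does too.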