Insights into Deep Learning Framework for Molecular Property Prediction Based on Different Tokenization Algorithms

Jianlin Yan, Zhenyu Zhang, Miaomiao Meng, Jun Li, Lanyi Sun
DOI: https://doi.org/10.1016/j.ces.2023.119471
IF: 4.7
2024-01-01
Chemical Engineering Science
Abstract: With the rapid development of deep learning, research on quantitative structure-property relationships based on deep learning has received widespread attention. A deep learning architecture combining Bidirectional Encoder Representations from Transformers (BERT) with a feedforward neural network (FNN) is proposed to compare the performance of different tokenization algorithms, and t-distributed stochastic neighbor embedding reveals valuable information about the mechanism of structure-property relationships. Additionally, a deep learning framework, BERT-Convolutional Neural Network (CNN)-FNN, is developed based on the optimal tokenization algorithm to accurately predict the sigma-profile and VCOSMO. The BERT model vectorizes the molecular structures, capturing local and global features of the entire molecule; the CNN model enhances the latent representation associated with molecular properties; and the FNN model establishes the correlation with the target properties. The deep learning frameworks predict the sigma-profile and VCOSMO with R² greater than 0.9703, making them a promising intelligent tool for guiding solvent design and screening.
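To illustrate why the choice of tokenization algorithm can matter, here is a minimal Python sketch contrasting two common schemes. It assumes molecules are written as SMILES strings, which the abstract does not state, and the function names and the simplified regular expression are hypothetical, not taken from the paper. The abstract does not name the algorithms that were actually compared.

```python
import re

# Assumption: molecules are given as SMILES strings (not stated in the abstract).
# Simplified atom-level pattern: bracket atoms, two-letter halogens,
# two-digit ring closures, then any remaining single character.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def char_tokenize(smiles: str) -> list[str]:
    """Character-level tokenization: every character is its own token."""
    return list(smiles)


def atom_tokenize(smiles: str) -> list[str]:
    """Atom-level tokenization: multi-character atoms stay in one token."""
    return ATOM_PATTERN.findall(smiles)


if __name__ == "__main__":
    smiles = "CC(=O)Cl"  # acetyl chloride
    print(char_tokenize(smiles))  # 'Cl' is split into 'C' and 'l'
    print(atom_tokenize(smiles))  # 'Cl' is kept as a single token
```

Under the character-level scheme the chlorine atom is split into a carbon-like "C" and a stray "l", whereas the atom-level scheme keeps it intact. Differences of this kind change the vocabulary that a BERT-style encoder sees.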