CIMG-BERT: Pretraining Bidirectional Transformers with Chemistry Knowledge for Molecular Property Prediction

Xin Meng, Binglan Wu, Yilei Liang, Chenkai Gu, Wenjie Du, Xin Chen
DOI: https://doi.org/10.1109/iceitsa57468.2022.00043
2022-01-01
Abstract: Machine learning methods such as graph neural networks (GNNs) and molecular graph BERT (MG-BERT) have shown great potential for molecular property prediction. However, most existing methods overlook chemical intuition and knowledge. Ignoring such valuable expert knowledge limits the interpretability of the models and impedes further improvement of their performance. In this paper, we propose the Chemistry-Informed Molecular Graph BERT (CIMG-BERT), which fuses relevant chemical characteristics of the molecular graph with a self-attention-based BERT model. In addition, we redesign the self-supervised learning strategy: masked bond-level prediction (i.e., prediction of the hybridization types of electron orbitals) is treated as a pretraining task to mine context information in molecules. Experimental results on six benchmark molecular datasets show that CIMG-BERT consistently outperforms existing methods. Furthermore, we show that CIMG-BERT yields atomic representations that carry key chemical information and that it is generalizable and transferable for molecular property prediction.
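The masked-prediction pretraining task mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each position in a molecule carries one hybridization label, it borrows the 15% masking rate from the original BERT recipe, and the names `MASK_TOKEN` and `mask_hybridization_labels` are hypothetical. The label list at the end is illustrative and does not describe a specific molecule.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_hybridization_labels(labels, mask_rate=0.15, seed=None):
    """Hide a random subset of hybridization labels for masked prediction.

    Returns (masked, targets): `masked` is a copy of `labels` with the chosen
    positions replaced by MASK_TOKEN, and `targets` maps each masked index to
    the original label a model would be trained to recover.
    """
    if not labels:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one position so every molecule yields a training signal.
    n_mask = max(1, round(len(labels) * mask_rate))
    positions = sorted(rng.sample(range(len(labels)), n_mask))
    masked = list(labels)
    targets = {}
    for i in positions:
        targets[i] = labels[i]
        masked[i] = MASK_TOKEN
    return masked, targets


# Illustrative labels only (assumed input format).
example = ["SP3", "SP2", "SP2", "SP2", "SP3", "SP"]
masked, targets = mask_hybridization_labels(example, mask_rate=0.5, seed=0)
```

During pretraining, a model would receive `masked` together with the molecular graph and be scored on how well it recovers the entries of `targets`; the input list itself is left unmodified.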