Geometry-based BERT: an experimentally validated deep learning model for molecular property prediction in drug discovery

Xiang Zhang, Chenliang Qian, Bochao Yang, Hongwei Jin, Song Wu, Jie Xia, Fan Yang, Liangren Zhang
DOI: https://doi.org/10.1101/2024.12.24.630211
2024-12-25
Abstract: Various deep learning-based methods have significantly impacted drug discovery. Developing deep learning methods that can identify novel structural types of active compounds has become an urgent challenge. In this paper, we introduce a self-supervised representation learning framework, GEO-BERT. GEO-BERT takes the atom and chemical-bond information of chemical structures as input and integrates positional information from the molecule's three-dimensional conformation during training. Specifically, GEO-BERT enhances its ability to characterize molecular structures by introducing three different positional relationships: atom-atom, bond-bond, and atom-bond. In a benchmarking study, GEO-BERT demonstrated optimal performance on multiple benchmarks. We also performed a prospective study to validate the GEO-BERT model, using screening for DYRK1A inhibitors as a case study. Two potent and novel DYRK1A inhibitors (IC50: <1 μM) were ultimately discovered at a hit rate of 10%. Taken together, we have developed the Geometry-based BERT model for molecular property prediction and demonstrated its practical utility in early-stage drug discovery.
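The abstract names three positional relationships (atom-atom, bond-bond, and atom-bond) but does not say how they are computed or encoded. The sketch below is only an illustration of what such quantities could look like when derived from a 3D conformation. It assumes, hypothetically, that atom-atom is an interatomic distance, bond-bond is the angle between bond vectors, and atom-bond is the distance from an atom to a bond midpoint. The toy coordinates, the bond list, and all function names are invented for this example and are not taken from GEO-BERT.

```python
import numpy as np

# Hypothetical toy conformation: 3D coordinates (angstroms) for three atoms.
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
])
# Hypothetical bond list: each bond is a pair of atom indices.
bonds = [(0, 1), (1, 2)]


def atom_atom_distances(coords):
    """Pairwise Euclidean distances between atoms, shape (n_atoms, n_atoms)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def bond_bond_angles(coords, bonds):
    """Angles in radians between bond direction vectors, shape (n_bonds, n_bonds)."""
    vecs = np.array([coords[j] - coords[i] for i, j in bonds])
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    # Clip to guard against rounding slightly outside [-1, 1].
    cos = np.clip(unit @ unit.T, -1.0, 1.0)
    return np.arccos(cos)


def atom_bond_distances(coords, bonds):
    """Distance from each atom to each bond midpoint, shape (n_atoms, n_bonds)."""
    mid = np.array([(coords[i] + coords[j]) / 2.0 for i, j in bonds])
    diff = coords[:, None, :] - mid[None, :, :]
    return np.linalg.norm(diff, axis=-1)


d_aa = atom_atom_distances(coords)
a_bb = bond_bond_angles(coords, bonds)
d_ab = atom_bond_distances(coords, bonds)
```

In a transformer-style model, matrices like these are commonly discretized or embedded and added to the attention scores as a bias. Whether GEO-BERT does this is not stated in the abstract.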
Biology