Infinite Physical Monkey: Do Deep Learning Methods Really Perform Better in Conformation Generation?

Haotian Zhang, Jintu Zhang, Huifeng Zhao, Dejun Jiang, Yafeng Deng
2023-03-08
Abstract: Conformation generation is a fundamental problem in drug discovery and cheminformatics, and the generation of organic molecule conformations, particularly in vacuum and in protein pocket environments, is most relevant to drug design. Recently, with the development of geometric neural networks, data-driven schemes have been successfully applied in this field, both for molecular conformation generation (in vacuum) and for binding pose generation (in protein pockets). The former beat the traditional ETKDG method, while the latter achieve accuracy similar to that of widely used molecular docking software. Although these methods have shown promising results, some researchers have recently used a parameter-free method to question whether deep learning (DL) methods really perform better in molecular conformation generation. To our surprise, what they designed is somewhat analogous to the famous infinite monkey theorem, with monkeys that are even equipped with a physics education. To examine the soundness of their proof, we constructed a truly stochastic infinite monkey for molecular conformation generation, showing that even with a more stochastic sampler for geometry generation, the coverage of the benchmark QM-computed conformations is higher than that of most DL-based methods. By extending their physical monkey algorithm to binding pose prediction, we also find that its docking success rate is near the best among existing DL-based docking models. Thus, although their conclusions are right, their proof process deserves closer scrutiny.
Biomolecules, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper primarily explores whether deep learning methods are truly superior to traditional methods in molecular conformation generation and binding pose prediction. Specifically:

1. **Molecular Conformation Generation**:
   - The paper discusses a recent study showing that a carefully designed parameter-free method (e.g., one based on dihedral angle sampling) outperforms existing deep learning methods in molecular conformation generation.
   - The authors constructed an "infinite random monkey" algorithm to demonstrate that even a more random sampler can achieve similar performance under large-scale sampling.
   - They found that, under large-scale sampling, random algorithms can achieve near-optimal results on the coverage (COV) and matching (MAT) metrics, where MAT is the average minimum RMSD between each reference conformation and the generated set.

2. **Binding Pose Prediction**:
   - For binding pose prediction, the authors extended the "infinite physical monkey" algorithm and proposed baseline models based on scoring functions and force field optimization.
   - Experimental results show that large-scale sampling can significantly improve the docking success rate, which calls into question the practice of treating RDKit+clustering as a fair baseline.
   - The authors also point out that deep learning models have the potential to surpass traditional methods in binding pose prediction when the pocket is given.

In summary, the paper aims to systematically analyze the effectiveness of the "infinite physical monkey" algorithm, to reveal the actual performance of deep learning methods in molecular conformation generation and binding pose prediction, and to propose some suggestions for improvement.
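To make the role of sample size in these metrics concrete, here is a minimal sketch of how COV and MAT are commonly computed from a precomputed RMSD matrix. It is not the authors' code: the function name `cov_mat`, the toy RMSD values, and the default threshold of 0.5 Å are assumptions chosen for illustration, and the definitions follow the recall-style convention often used in the conformation generation literature (each reference conformation is matched to its closest generated one).

```python
def cov_mat(rmsd, threshold=0.5):
    """Return (COV, MAT) for one molecule.

    rmsd[i][j] is the RMSD between reference conformer i and
    generated conformer j. The threshold (in angstroms) is an
    assumed illustrative value, not taken from the paper.
    """
    # Closest generated conformer for each reference conformer.
    best = [min(row) for row in rmsd]
    # COV: fraction of references matched within the threshold.
    cov = sum(b <= threshold for b in best) / len(best)
    # MAT: mean of the per-reference minimum RMSD.
    mat = sum(best) / len(best)
    return cov, mat


# Hypothetical toy data: 3 reference conformers, 3 generated conformers.
rmsd = [
    [0.2, 1.0, 0.9],
    [0.8, 0.7, 1.5],
    [1.2, 0.4, 0.6],
]

# Metrics with only the first generated conformer, then with all three.
cov_one, mat_one = cov_mat([row[:1] for row in rmsd])
cov_all, mat_all = cov_mat(rmsd)
```

Because each per-reference minimum is taken over the generated set, adding more samples can never lower COV or raise MAT under these definitions. That monotonicity is why a sufficiently large random sample can score well, and why the number of generated conformations matters when comparing methods.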