Deep-Learning Based Docking Methods: Fair Comparisons to Conventional Docking Workflows

Ajay N. Jain, Ann E. Cleves, W. Patrick Walters
2024-12-04
Abstract: The diffusion learning method, DiffDock, for docking small-molecule ligands into protein binding sites was recently introduced. Results included comparisons to more conventional docking approaches, with DiffDock showing superior performance. Here, we employ a fully automatic workflow using the Surflex-Dock methods to generate a fair baseline for conventional docking approaches. Results were generated for the common and expected situation where a binding site location is known and also for the condition of an unknown binding site. For the known binding site condition, Surflex-Dock success rates at 2.0 Angstroms RMSD far exceeded those for DiffDock (Top-1/Top-5 success rates, respectively, were 68/81% compared with 45/51%). Glide performed with similar success rates (67/73%) to Surflex-Dock for the known binding site condition, and results for AutoDock Vina and Gnina followed this pattern. For the unknown binding site condition, using an automated method to identify multiple binding pockets, Surflex-Dock success rates again exceeded those of DiffDock, but by a somewhat lesser margin. DiffDock made use of roughly 17,000 co-crystal structures for learning (98% of PDBBind version 2020, pre-2019 structures) for a training set in order to predict on 363 test cases (2% of PDBBind 2020) from 2019 forward. DiffDock's performance was inextricably linked with the presence of near-neighbor cases of close to identical protein-ligand complexes in the training set for over half of the test set cases. DiffDock exhibited a 40 percentage point difference on near-neighbor cases (two-thirds of all test cases) compared with cases with no near-neighbor training case. DiffDock has apparently encoded a type of table-lookup during its learning process, rendering meaningful applications beyond its reach. Further, it does not perform even close to competitively with a competently run modern docking workflow.
Artificial Intelligence, Biomolecules
What problem does this paper attempt to address?
The paper evaluates the actual performance of deep-learning-based molecular docking methods, DiffDock in particular, relative to conventional docking methods. The authors focus on four aspects:

1. **Providing a fair baseline comparison**: The paper establishes a reasonable performance baseline for conventional docking methods (Surflex-Dock, Glide, AutoDock Vina, and Gnina) on the DiffDock test set. The authors process the PDB structures with a fully automated workflow and check the quality of those structures, so that data-quality problems do not bias the comparison.
2. **Understanding what drives DiffDock's performance**: The authors examine why DiffDock performs well in some cases. By analyzing the relationship between the training set and the test set, especially the influence of near-neighbor training cases, they find that DiffDock's performance depends largely on whether a test case has a very similar protein-ligand complex in the training set.
3. **Revealing potential problems**: Because DiffDock's success rate is driven largely by near-neighbor training cases, its performance appears better than it actually is. The authors also point out that the original DiffDock report compared conventional docking methods unfairly, for example by running them without a defined binding site, which does not match how these methods are used in practice.
4. **Emphasizing the importance of practical applications**: The authors emphasize that in computer-aided drug design (CADD), predicting the binding pose of new compounds at unknown binding sites is a more challenging and more practical problem than simply redocking known ligands.
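The Top-1/Top-5 success rates quoted in the abstract can be illustrated with a minimal Python sketch. It assumes each test case is represented by the RMSD values (in Angstroms) of its predicted poses, ordered from best to worst score, and that a pose exactly at the threshold counts as a success. The function name `top_n_success_rate` and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence


def top_n_success_rate(
    ranked_rmsds: Sequence[Sequence[float]],
    n: int,
    threshold: float = 2.0,
) -> float:
    """Fraction of cases where any of the first n ranked poses is within threshold.

    ranked_rmsds[i] holds the RMSD (Angstroms) of each pose for case i,
    ordered by the docking method's own ranking (best score first).
    Treating RMSD == threshold as a success is an assumption of this sketch.
    """
    if not ranked_rmsds:
        raise ValueError("need at least one test case")
    hits = sum(
        1 for poses in ranked_rmsds if any(r <= threshold for r in poses[:n])
    )
    return hits / len(ranked_rmsds)


# Toy data (invented for illustration): four cases with ranked pose RMSDs.
toy_cases = [
    [1.2, 3.0],  # top-ranked pose is already correct
    [2.5, 1.8],  # only the second-ranked pose is correct
    [4.0, 5.0],  # no correct pose
    [0.9],       # single correct pose
]

top1 = top_n_success_rate(toy_cases, n=1)
top5 = top_n_success_rate(toy_cases, n=5)
print(f"Top-1: {top1:.0%}, Top-5: {top5:.0%}")
```

The second toy case shows why Top-5 can exceed Top-1: a correct pose exists but is not ranked first.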
### Main conclusions

- **DiffDock's performance depends on near-neighbor training cases**: For about two-thirds of the test cases, DiffDock performs well because those cases have very similar neighbors in its training set. For cases without such neighbors, DiffDock performs significantly worse.
- **Conventional docking methods perform better**: When the binding site is known, conventional docking methods such as Surflex-Dock and Glide perform significantly better than DiffDock. Surflex-Dock also performs better when the binding site is unknown.
- **Misleading benchmark tests**: The benchmarks in the original DiffDock report give a misleading picture of conventional docking methods, because those methods were not run according to their best practices.

In conclusion, the paper uses detailed analysis to expose the limitations of DiffDock's performance and emphasizes the importance of evaluating new methods correctly in the CADD field.
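The near-neighbor comparison described above amounts to splitting the test cases by whether a near-neighbor training case exists and comparing success rates between the two groups. The sketch below is a hedged illustration of that arithmetic only. It assumes a per-case success flag and a per-case near-neighbor flag are already available, and it does not reproduce how the authors define or detect near neighbors. The function name and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def success_gap_by_neighbor(
    successes: Sequence[bool],
    has_neighbor: Sequence[bool],
) -> Tuple[float, float, float]:
    """Return (rate with neighbor, rate without neighbor, gap in percentage points)."""
    if len(successes) != len(has_neighbor):
        raise ValueError("inputs must have the same length")
    with_nb = [s for s, nb in zip(successes, has_neighbor) if nb]
    without_nb = [s for s, nb in zip(successes, has_neighbor) if not nb]
    if not with_nb or not without_nb:
        raise ValueError("both groups must contain at least one case")
    rate_with = 100.0 * sum(with_nb) / len(with_nb)
    rate_without = 100.0 * sum(without_nb) / len(without_nb)
    return rate_with, rate_without, rate_with - rate_without


# Toy data (invented for illustration): eight cases, half with a near neighbor.
toy_success = [True, True, True, False, True, False, False, False]
toy_neighbor = [True, True, True, True, False, False, False, False]

rate_with, rate_without, gap = success_gap_by_neighbor(toy_success, toy_neighbor)
print(f"with neighbor: {rate_with:.0f}%, without: {rate_without:.0f}%, gap: {gap:.0f} points")
```

Reporting the gap in percentage points, rather than as a ratio, matches how the abstract states the 40 percentage point difference.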