Abstract:

OBJECTIVE: The objective of this paper is to highlight state-of-the-art machine learning (ML) techniques in computational docking. The use of smart computational methods in the drug design life cycle is a relatively recent development that has gained much popularity and interest over the last few years. Central to this methodology is computational docking: the process of predicting the best pose (orientation + conformation) of a small molecule (the drug candidate) when bound to a larger target receptor molecule (a protein) to form a stable complex. In computational docking, a large number of binding poses are evaluated and ranked using a scoring function. The scoring function is a mathematical predictive model that produces a score representing the binding free energy, and hence the stability, of the resulting complex. Generally, such a function should produce a set of plausible ligands ranked according to their binding stability, along with their binding poses. In more practical terms, an effective scoring function should produce promising drug candidates, which can then be synthesized and physically screened using a high-throughput screening process. Therefore, the key to computer-aided drug design is the design of an efficient, highly accurate scoring function (using ML techniques).

METHODS: The methods presented in this paper are specifically based on ML techniques. Although many traditional techniques have been proposed, their performance has generally been poor. Only in the last few years has ML technology been applied to the design of scoring functions, and the results have been very promising.

MATERIAL: The ML-based techniques rely on various molecular features extracted from the abundance of protein-ligand information in public molecular databases, e.g., protein data bank bind (PDBbind).

RESULTS: In this paper, we present this paradigm shift, elaborating on the main constituent elements of the ML approach to molecular docking along with the state-of-the-art research in this area. For instance, the best random forest (RF)-based scoring function on PDBbind v2007 achieves a Pearson correlation coefficient of 0.803 between the predicted and experimentally determined binding affinities, while the best conventional scoring function achieves 0.644. In terms of ranking power, the best RF-based scoring function ranks the ligands correctly according to their experimentally determined binding affinities with an accuracy of 62.5% and identifies the top-binding ligand with an accuracy of 78.1%.

CONCLUSIONS: We conclude with open questions and potential future research directions in smart computational docking: using molecular features of different kinds (geometrical, energy terms, pharmacophore), applying advanced ML techniques (e.g., deep learning), and combining more than one ML model.
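The three evaluation measures quoted in the results (Pearson correlation between predicted and measured affinities, the rate of correctly ranked ligands, and the rate at which the top binder is identified) can be illustrated with a minimal sketch. It assumes that the ranking measures are computed per group of ligands sharing a target. The abstract does not state the grouping, and the function names and the affinity values below are hypothetical, invented purely for illustration. They are not taken from PDBbind.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def order(values):
    """Indices of the ligands sorted from weakest to strongest affinity."""
    return sorted(range(len(values)), key=values.__getitem__)


def ranking_rates(groups):
    """Return (fully-ranked rate, top-ligand rate) over ligand groups.

    Each group is a (predicted, measured) pair of affinity lists for
    ligands of one target (assumed grouping; higher value = tighter binding).
    """
    fully_ranked = 0
    top_found = 0
    for predicted, measured in groups:
        pred_order = order(predicted)
        meas_order = order(measured)
        if pred_order == meas_order:
            fully_ranked += 1
        if pred_order[-1] == meas_order[-1]:
            top_found += 1
    return fully_ranked / len(groups), top_found / len(groups)


# Hypothetical (predicted, measured) affinities, e.g. in pKd units.
toy_groups = [
    ([6.1, 7.4, 8.9], [5.8, 7.0, 9.2]),  # order fully reproduced
    ([5.0, 8.2, 7.9], [4.9, 7.1, 8.8]),  # top ligand missed
    ([7.5, 6.0, 9.1], [6.2, 6.9, 9.5]),  # top ligand found, order wrong
    ([4.2, 5.5, 6.3], [4.0, 5.1, 6.6]),  # order fully reproduced
]

all_predicted = [p for pred, _ in toy_groups for p in pred]
all_measured = [m for _, meas in toy_groups for m in meas]

scoring_power = pearson(all_predicted, all_measured)
rank_rate, top_rate = ranking_rates(toy_groups)
print(f"Pearson r = {scoring_power:.3f}")
print(f"fully ranked: {rank_rate:.1%}, top ligand identified: {top_rate:.1%}")
```

On this toy data, two of the four groups are ranked entirely correctly and three of the four have the right top ligand. The sketch therefore shows why the top-ligand rate can exceed the full-ranking rate, as it does in the figures reported above.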

Machine learning assisted ligand binding energy prediction for in silico generated glycosyl hydrolase enzyme combinatorial mutant library

Prediction of the Enantioselectivity of Lipases and Esterases by Molecular Docking Method with Modified Force Field Parameters.

Quantitative prediction of enantioselectivity of Candida antarctica lipase B by combining docking simulations and quantitative structure-activity relationship (QSAR) analysis

Accelerating Molecular Docking using Machine Learning Methods

A Computational Approach to Enzyme Design: Predicting ω-Aminotransferase Catalytic Activity Using Docking and MM-GBSA Scoring

Active Learning Guided Drug Design Lead Optimization Based on Relative Binding Free Energy Modeling

Multiple ligand simultaneous docking: Orchestrated dancing of ligands in binding sites of protein

A Combination of Computational and Experimental Approaches to Investigate the Binding Behavior of B.sub Lipase A Mutants with Substrate Pnpp

Predicting the mutation effects of protein-ligand interactions via end-point binding free energy calculations: strategies and analyses

In silico binding affinity prediction for metabotropic glutamate receptors using both endpoint free energy methods and a machine learning-based scoring function

Computing Ligand Binding Free Energy in a Large Flexible Pocket of a Large Protein

Docking Challenge: Protein Sampling and Molecular Docking Performance

Accurate Prediction of GPCR Ligand Binding Affinity with Free Energy Perturbation

Machine learning in computational docking

Automated relative binding free energy calculations from SMILES to ΔΔG

Automated Docking of Peptides and Proteins by Genetic Algorithm

Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction

Identifying ligand binding sites and poses using GPU-accelerated Hamiltonian replica exchange molecular dynamics

A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking

Synergistic Application of Molecular Docking and Machine Learning for Improved Binding Pose