Machine learning assisted ligand binding energy prediction for in silico generated glycosyl hydrolase enzyme combinatorial mutant library

S. Chundawat, Igor Guranovic, Mohit Kumar, C. K. Bandi
DOI: https://doi.org/10.1101/2022.11.29.518414
2022-12-02
bioRxiv
Abstract: Molecular docking is a computational method used to predict the preferred binding orientation of one molecule to another when the two form an energetically stable complex. This approach has been widely used for early-stage small-molecule drug design as well as for identifying suitable residues in protein-based macromolecules for mutagenesis. Estimating the binding free energy of a docked protein-ligand complex with an appropriate scoring function is often critical for protein mutagenesis studies that aim to improve the activity or alter the specificity of targeted enzymes. However, calculating docking free energy for a large number of protein mutants is computationally challenging and time-consuming. Here, we showcase an end-to-end computational workflow for predicting the binding energy of the pNP-Xylose substrate docked within the substrate binding site for a large library of combinatorial mutants of an alpha-L-fucosidase from Thermotoga maritima (TmAfc, PDB ID: 2ZWY) belonging to glycosyl hydrolase (GH) family 29. Briefly, in silico combinatorial mutagenesis was performed for the top conserved residues in TmAfc, as determined by a multiple sequence alignment against all GH29 family enzyme sequences downloaded using an in-house developed Carbohydrate-Active enZyme (CAZy) database retriever program. The binding energy was calculated with AutoDock Vina by docking the pNP-Xylose ligand to energy-minimized TmAfc mutants, and the data were then used to train a neural network model, whose predictions were validated against AutoDock Vina data. The current workflow can be adopted for any family of CAZymes to rapidly identify the effect of different mutations within the active site on substrate binding free energy and thereby identify suitable targets for mutagenesis.
We anticipate that this workflow could also serve as the starting point for performing more sophisticated and computationally intensive binding free energy calculations to identify targets for mutagenesis and hence optimize use of wet lab resources.
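The in silico combinatorial mutagenesis step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `enumerate_mutants`, the toy sequence, and the chosen positions are hypothetical and used only for illustration. The sketch assumes that every selected position may take any residue from a given alphabet, so a library over k positions with the 20 standard amino acids contains 20^k sequences, including the wild type.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_mutants(wild_type, positions, alphabet=AMINO_ACIDS):
    """Yield (label, sequence) for every combination of residues
    at the given 0-based positions (hypothetical helper)."""
    for combo in product(alphabet, repeat=len(positions)):
        seq = list(wild_type)
        changes = []
        for pos, aa in zip(positions, combo):
            if wild_type[pos] != aa:
                # Conventional mutation label, e.g. K2A (1-based numbering).
                changes.append(f"{wild_type[pos]}{pos + 1}{aa}")
                seq[pos] = aa
        # The combination that matches the wild type everywhere gets "WT".
        yield ("/".join(changes) or "WT", "".join(seq))


# Toy example: an arbitrary 8-residue sequence, two positions, two residues.
library = dict(enumerate_mutants("MKTAYIAK", positions=[1, 3], alphabet="AK"))
```

In a workflow like the one described, each generated sequence would then be modeled, energy minimized, and docked, and the resulting (sequence, binding energy) pairs would serve as training data for the neural network. Because the library size grows exponentially with the number of positions, docking every mutant quickly becomes expensive, which is the motivation for a learned predictor.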
Computer Science, Biology, Chemistry