DeltaGzip: Computing Biopolymer-Ligand Binding Affinity via Kolmogorov Complexity and Lossless Compression

Lena Simine, Tao Liu
DOI: https://doi.org/10.26434/chemrxiv-2024-cq702
2024-03-06
Abstract: The design of bio-sequences for biosensing and therapeutics is a challenging multi-step search and optimization task. In principle, computational modeling may speed up the design process by virtual screening of sequences based on their binding affinities to target molecules. In practice, however, existing machine-learned models trained to predict binding affinities lack flexibility with respect to reaction conditions, and molecular dynamics simulations that can incorporate reaction conditions suffer from high computational costs. Here, we describe a computational approach called DeltaGzip that evaluates the free energy of binding in biopolymer-ligand complexes from ultra-short equilibrium molecular dynamics simulations. The entropy of binding is evaluated using the Kolmogorov complexity definition of entropy and approximated using a lossless compression algorithm, Gzip. We benchmark the method on a well-studied dataset of protein-ligand complexes, comparing the predictions of DeltaGzip to the free energies of binding obtained using the Jarzynski equality and to experimental measurements.
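The idea of approximating Kolmogorov complexity with a lossless compressor can be illustrated with a minimal Python sketch. This is not the authors' implementation: the byte strings below are hypothetical stand-ins for discretized trajectory data, and the helper names `compressed_size` and `compression_ratio` are assumptions made for illustration. The sketch only shows the general principle that more regular (lower-entropy) data compresses to fewer bytes.

```python
import gzip
import random


def compressed_size(data: bytes) -> int:
    """Length in bytes of the gzip-compressed form of `data`.

    The compressed length is a computable upper-bound proxy for the
    Kolmogorov complexity of `data`, which itself is not computable.
    """
    # mtime=0 keeps the gzip header identical across runs.
    return len(gzip.compress(data, compresslevel=9, mtime=0))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size; lower means more regular data."""
    return compressed_size(data) / len(data)


# Hypothetical stand-ins for discretized trajectories (NOT real MD data):
# a strictly periodic signal versus pseudo-random bytes of the same length.
rng = random.Random(0)
ordered = bytes([0, 1, 2, 3] * 2500)
disordered = bytes(rng.randrange(256) for _ in range(10_000))

ratio_ordered = compression_ratio(ordered)
ratio_disordered = compression_ratio(disordered)

print(f"ordered:    {ratio_ordered:.3f}")
print(f"disordered: {ratio_disordered:.3f}")
```

The periodic sequence compresses to a small fraction of its raw size, while the pseudo-random one barely compresses at all. How trajectory coordinates are discretized and how compressed sizes are converted into an entropy of binding are details of the paper that this sketch does not attempt to reproduce.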
Chemistry
What problem does this paper attempt to address?
This paper addresses the problem of calculating binding free energies in biopolymer-ligand complexes quickly and accurately. Existing approaches to predicting binding affinity have two main limitations:

1. **Limitations of machine learning models**: Existing machine learning models can predict binding affinity, but they lack flexibility with respect to reaction conditions, so their predictions become unreliable when those conditions change.
2. **High cost of molecular dynamics simulations**: Molecular dynamics (MD) simulations can take reaction conditions into account, but their computational cost is very high, which makes them unsuitable for high-throughput screening.

To address these limitations, the authors propose a method called DeltaGzip, which evaluates the binding free energy from ultra-short equilibrium molecular dynamics simulations combined with a lossless compression algorithm (Gzip). The method has a low computational cost and can account for different reaction conditions, making it suitable for high-throughput screening for strong binders among biopolymer-ligand complexes.
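To make the role of reaction conditions in screening concrete, the sketch below ranks candidates by the standard thermodynamic relation ΔG = ΔH − TΔS at two temperatures. It is an illustration under stated assumptions, not the DeltaGzip pipeline: the `Candidate` type, the function names, and all numerical values are hypothetical placeholders, and the summary does not specify how the enthalpy term is obtained.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    delta_h: float  # kJ/mol; placeholder enthalpy-of-binding estimate
    delta_s: float  # kJ/(mol*K); placeholder entropy-of-binding estimate


def binding_free_energy(c: Candidate, temperature: float) -> float:
    """Standard relation: delta_G = delta_H - T * delta_S (kJ/mol)."""
    return c.delta_h - temperature * c.delta_s


def rank_binders(candidates: List[Candidate], temperature: float) -> List[Candidate]:
    """Sort candidates from strongest (most negative delta_G) to weakest."""
    return sorted(candidates, key=lambda c: binding_free_energy(c, temperature))


# Made-up numbers for illustration only; they are not results from the paper.
candidates = [
    Candidate("seq_A", delta_h=-40.0, delta_s=-0.05),
    Candidate("seq_B", delta_h=-55.0, delta_s=-0.10),
    Candidate("seq_C", delta_h=-30.0, delta_s=0.01),
]

ranked_290 = [c.name for c in rank_binders(candidates, temperature=290.0)]
ranked_310 = [c.name for c in rank_binders(candidates, temperature=310.0)]

print(ranked_290)
print(ranked_310)
```

With these placeholder values, `seq_A` and `seq_B` swap places between 290 K and 310 K because the entropic term scales with temperature. This is the kind of condition dependence that a model trained at fixed conditions cannot capture.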