Employing Artificial Neural Networks for Optimal Storage and Facile Sharing of Molecular Dynamics Simulation Trajectories

Abdul Wasim,Lars V. Schäfer,Jagannath Mondal
DOI: https://doi.org/10.1101/2024.09.15.613125
2024-09-19
Abstract:With the remarkable stride in computing power and advances in Molecular Dynamics simulation programs, a crucial challenge of storing and sharing large biomolecular simulation datasets has emerged. By leveraging AutoEncoders, a type of artificial neural network, we developed a method to compress MD trajectories into significantly smaller latent spaces. Our method can save upto 98% in disk space compared to XTC, a highly compressed trajectory format from the widely used MD program package GROMACS, thus facilitating easier storage and sharing of simulation trajectories. Atomic coordinates are very accurately reconstructed from compressed data. The method was tested across a variety of biomolecular systems, including folded proteins, intrinsically disordered proteins (IDPs), and protein-ligand complexes, showing consistent accuracy in reconstruction. Notably, the compression efficiency was particularly beneficial for larger systems. This approach enables the scientific community to more efficiently store and share large-scale biomolecular simulation data, potentially enhancing collaborative research efforts. The workflow, termed "compressTraj", is implemented in PyTorch and is publicly available at https://github.com/SerpentByte/compressTraj for use, offering a practical solution for managing the growing volumes of data generated in computational biomolecular studies.
Biophysics
What problem does this paper attempt to address?