Accurate Lattice Free Energies of Packing Polymorphs from Probabilistic Generative Models

Matteo Salvalaglio, Edgar Olehnovics, Michael Shirts, Yifei Michelle Liu, Nada Mehio, Ahmad Sheikh
DOI: https://doi.org/10.26434/chemrxiv-2024-1lm95
2024-10-23
Abstract: Finite-temperature lattice free energy differences between polymorphs of molecular crystals are fundamental to understanding and predicting the relative stability relationships underpinning polymorphism, yet are computationally expensive to obtain. Here, we implement and critically assess machine-learning-enabled targeted free energy calculations derived from flow-based generative models to compute the free energy difference between two ice crystal polymorphs (Ice XI and Ic), modelled with a fully flexible empirical classical force field. We demonstrate that even when remapping through an analytical reference distribution, such methods enable a cost-effective and accurate calculation of free energy differences between disconnected metastable ensembles when trained on locally ergodic data sampled exclusively from the ensembles of interest. Unlike classical free energy perturbation methods, such as the Einstein crystal method, the targeted approach analysed in this work requires no additional sampling of intermediate perturbed Hamiltonians, offering significant computational savings in the system sizes compared in this work. To systematically assess the accuracy of the method, we monitored the convergence of free energy estimates during training by implementing an overfitting-aware weighted averaging strategy. By comparing our results with ground-truth free energy differences computed with the Einstein crystal method, we assess the accuracy and efficiency of two different model architectures, employing two different representations of the supercells' degrees of freedom (Cartesian vs. quaternion-based). We conduct our assessment by comparing free energy differences between crystal supercells of different sizes and temperatures and assessing the accuracy in extrapolating lattice free energies to the thermodynamic limit.
While the models perform with similar accuracy at low temperatures and in small system sizes, we note that for larger systems and high temperatures, the choice of representation is key to obtaining generalisable results of quality comparable to that obtained from the Einstein crystal method. We hope the current work will be a useful stepping stone towards efficient free energy calculations in larger, more conformationally flexible systems.
Chemistry
What problem does this paper attempt to address?
The paper addresses the calculation of finite-temperature lattice free energy differences between polymorphs of molecular crystals. These differences determine the relative stability of polymorphs, which matters in the chemical, pharmaceutical, semiconductor and food industries. Traditional computational methods are expensive and therefore difficult to apply at scale. The paper proposes an approach based on a machine-learning probabilistic generative model (PGM) to calculate the free energy difference between two ice polymorphs (Ice XI and Ic) more efficiently while retaining accuracy.

### Specific Problems and Solutions

1. **Problems**:
   - **High computational cost**: Traditional methods such as the Einstein crystal method require additional sampling of intermediate perturbed Hamiltonians, which becomes increasingly costly for larger systems.
   - **Balance between accuracy and efficiency**: Existing methods tend to offer either low computational cost with insufficient accuracy or high accuracy at excessive computational cost.
2. **Solutions**:
   - **Probabilistic generative model (PGM)**: Flow-based models map between the two disconnected metastable ensembles, in this work by remapping through an analytical reference distribution, so that no intermediate states need to be sampled. This reduces the computational cost while maintaining accuracy.
   - **Training data**: The models are trained only on locally ergodic data sampled exclusively from the ensembles of interest.
   - **Evaluation method**: The accuracy and efficiency of two model architectures, using Cartesian and quaternion-based representations of the supercell degrees of freedom, are evaluated against ground-truth free energy differences computed with the Einstein crystal method. Convergence of the estimates during training is monitored with an overfitting-aware weighted averaging strategy.
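The idea behind the flow-based solution can be illustrated with a minimal sketch of targeted free energy perturbation on a one-dimensional toy problem. This is not the authors' pipeline: the two harmonic potentials, the affine map standing in for a trained flow, and all names (`tfep_delta_f`, `flow`, `log_det_jac`) are assumptions made for illustration. The sketch uses the simplest direct map from state A to state B, whereas the paper remaps through an analytical reference distribution.

```python
import numpy as np

def tfep_delta_f(x_a, u_a, u_b, flow, log_det_jac, beta=1.0):
    """Targeted FEP estimate of F_B - F_A using samples from state A only."""
    y = flow(x_a)
    # Generalised work in units of kT, including the Jacobian of the map.
    phi = beta * (u_b(y) - u_a(x_a)) - log_det_jac(x_a)
    # Delta F = -(1/beta) ln <exp(-phi)>_A, evaluated with a stable log-mean-exp.
    m = np.min(phi)
    return (m - np.log(np.mean(np.exp(-(phi - m))))) / beta

# Hypothetical toy states: two harmonic wells with negligible overlap.
k_a, k_b, x0, c = 1.0, 4.0, 3.0, 0.5
u_a = lambda x: 0.5 * k_a * x**2
u_b = lambda x: 0.5 * k_b * (x - x0)**2 + c

# Affine map that transports the Boltzmann density of A onto that of B.
# A trained normalising flow would play this role for a real crystal.
scale = np.sqrt(k_a / k_b)
flow = lambda x: x0 + scale * x
log_det_jac = lambda x: np.full_like(x, np.log(scale))

rng = np.random.default_rng(0)
x_a = rng.normal(0.0, 1.0 / np.sqrt(k_a), size=2000)  # samples from state A (beta = 1)

delta_f_tfep = tfep_delta_f(x_a, u_a, u_b, flow, log_det_jac)
delta_f_exact = c + 0.5 * np.log(k_b / k_a)  # analytical result for this toy model
print(delta_f_tfep, delta_f_exact)
```

Because the map in this toy case is exact, the generalised work is the same for every sample and the estimate matches the analytical value without any intermediate states. With an imperfect learned map the estimator acquires variance and bias, which is why the quality of the trained model matters.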
### Main Contributions

- **Computational efficiency**: The PGM approach requires no sampling of intermediate perturbed Hamiltonians and offers significant computational savings for the system sizes compared in the paper.
- **Accuracy**: Across the temperatures and supercell sizes tested, the method can reach accuracy comparable to that of the Einstein crystal method, which serves as the ground truth.
- **Generality**: At low temperatures and in small systems both representations perform similarly, but for larger systems and higher temperatures the choice of representation (such as the quaternion-based representation) is key to obtaining generalisable, high-quality results.

### Conclusion

This paper presents a method for efficiently calculating free energy differences between molecular crystal polymorphs, which the authors hope will serve as a stepping stone towards larger, more conformationally flexible systems. By comparing different model architectures and representations, the authors show how to improve computational efficiency while preserving accuracy.
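The abstract also mentions extrapolating lattice free energies to the thermodynamic limit. A minimal sketch of one common way to do this is given below, assuming a finite-size correction linear in 1/N. The functional form, the function name `extrapolate_to_thermodynamic_limit`, and the numbers are illustrative assumptions; the data are synthetic placeholders, not results from the paper.

```python
import numpy as np

def extrapolate_to_thermodynamic_limit(n_molecules, delta_f_per_molecule):
    """Least-squares fit of dF(N) = dF_inf + a / N; returns (dF_inf, a)."""
    inv_n = 1.0 / np.asarray(n_molecules, dtype=float)
    values = np.asarray(delta_f_per_molecule, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(inv_n, values, 1)
    return intercept, slope

# Synthetic placeholder data generated from the assumed model itself.
n = np.array([16, 64, 128, 216])
df_inf_true, a_true = -0.25, 1.5
df = df_inf_true + a_true / n

df_inf_fit, a_fit = extrapolate_to_thermodynamic_limit(n, df)
print(df_inf_fit, a_fit)
```

The intercept of the fit is the estimate for an infinitely large crystal. Whether a purely linear correction in 1/N is adequate depends on the system and the method, so the form should be checked against the finite-size data actually available.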