Mitigating Exposure Bias in Score-Based Generation of Molecular Conformations

Sijia Wang, Chen Wang, Zhenhao Zhao, Jiqiang Zhang, Weiran Cai
2024-09-21
Abstract: Molecular conformation generation poses a significant challenge in the field of computational chemistry. Recently, Diffusion Probabilistic Models (DPMs) and Score-Based Generative Models (SGMs) have been used effectively owing to their capacity to generate conformations far more accurately than conventional physics-based approaches. However, the discrepancy between training and inference gives rise to a critical problem known as exposure bias. While this issue has been extensively investigated in DPMs, the existence of exposure bias in SGMs and its effective measurement remain unresolved, which hinders the use of compensation methods for SGMs, with ConfGF and Torsional Diffusion as representative examples. In this work, we first propose a method for measuring exposure bias in SGMs used for molecular conformation generation, which confirms that exposure bias is significant in these models and quantifies it. We then design a new compensation algorithm, Input Perturbation (IP), adapted from a method originally designed for DPMs only. Experimental results show that introducing IP significantly improves both the accuracy and the diversity of the conformations generated by SGM-based models. In particular, with the IP-enhanced Torsional Diffusion model we achieve new state-of-the-art performance on the GEOM-Drugs dataset and performance on par with the state of the art on GEOM-QM9. The code is publicly available at https://github.com/jia-975/torsionalDiff-ip.
Machine Learning, Artificial Intelligence, Biomolecules
What problem does this paper attempt to address?
This paper addresses the exposure bias problem in molecular conformation generation. Specifically:

1. **Background and Challenges**:
   - Molecular conformation generation is an important challenge in computational chemistry. Traditional physics-based methods can generate relatively accurate conformations, but they are less efficient, especially for large molecules.
   - In recent years, Diffusion Probabilistic Models (DPMs) and Score-Based Generative Models (SGMs) have performed well at generating accurate molecular conformations. However, the inputs these models see during training differ from those they see during inference, which results in exposure bias.
2. **Deficiencies in Existing Research**:
   - Exposure bias has been widely studied in DPMs, and effective mitigation methods have been proposed for them. Whether exposure bias exists in SGMs, and how to measure it, has not been explored.
   - This gap hinders the application of compensation methods to SGMs, including important models such as ConfGF and Torsional Diffusion.
3. **Contributions of the Paper**:
   - **Measurement Method**: The paper proposes a method for measuring exposure bias in SGMs. It confirms that the bias is significant in representative SGMs (such as ConfGF) and estimates its value.
   - **Compensation Algorithm**: The paper adapts the Input Perturbation (IP) algorithm, originally designed for DPMs, to SGMs. Experimental results show that introducing IP significantly improves both the accuracy and the diversity of the molecular conformations generated by SGMs.
   - **Performance Improvement**: In particular, the IP-enhanced Torsional Diffusion model achieves new state-of-the-art performance on the GEOM-Drugs dataset and performance comparable to the state of the art on the GEOM-QM9 dataset.
4. **Experimental Verification**:
   - The paper reports extensive experiments on the GEOM-QM9 and GEOM-Drugs datasets to verify the effectiveness of the IP method.
   - The results show that with IP, both ConfGF and Torsional Diffusion improve significantly in the accuracy and diversity of the generated molecular conformations.

In conclusion, this paper fills a gap in the field by proposing a method to measure and mitigate exposure bias in SGMs, and it significantly improves the performance of molecular conformation generation.
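
To make the idea of input perturbation concrete, here is a minimal NumPy sketch of how a denoising score matching training pair could be built with an extra perturbation on the network input. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `gamma` default, and the use of raw Cartesian coordinates are all hypothetical (ConfGF and Torsional Diffusion operate on interatomic distances and torsion angles, respectively). The point it shows is that the regression target is still defined by the original noise `z`, while the network sees a slightly noisier input, which mimics the imperfect inputs it encounters during sampling.

```python
import numpy as np


def perturbed_dsm_batch(x0, sigma, gamma=0.1, rng=None):
    """Build one denoising-score-matching pair with input perturbation.

    x0    : clean coordinates, shape (n_atoms, 3)
    sigma : noise level for this training step
    gamma : perturbation strength (hypothetical default, an assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(x0.shape)   # noise that defines the target
    xi = rng.standard_normal(x0.shape)  # extra noise added only to the input
    # Standard DSM would use x0 + sigma * z; the gamma term is the perturbation.
    x_input = x0 + sigma * (z + gamma * xi)
    # Target stays the score of the unperturbed Gaussian kernel.
    target_score = -z / sigma
    return x_input, target_score


def dsm_loss(score_fn, x_input, target_score, sigma):
    """Sigma-weighted squared error between predicted and target scores."""
    pred = score_fn(x_input, sigma)
    per_atom = np.sum((pred - target_score) ** 2, axis=-1)
    return float(np.mean(sigma**2 * per_atom))
```

Setting `gamma=0` recovers ordinary denoising score matching, so the perturbation can be switched on or off without changing the rest of a training loop.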