Polyatomic Complexes: A topologically-informed learning representation for atomistic systems

Rahul Khorana, Marcus Noack, Jin Qian
2024-09-26
Abstract: Developing robust representations of chemical structures that enable models to learn topological inductive biases is challenging. In this manuscript, we present a representation of atomistic systems. We begin by proving that our representation satisfies all structural, geometric, efficiency, and generalizability constraints. Afterward, we provide a general algorithm to encode any atomistic system. Finally, we report performance comparable to state-of-the-art methods on numerous tasks. We open-source all code and datasets. The code and data are available at https://github.com/rahulkhorana/PolyatomicComplexes.
Machine Learning, Computational Physics
What problem does this paper attempt to address?
The paper addresses the observation that existing representations of chemical structure cannot simultaneously satisfy all of the constraints a learning representation should meet: invariance, uniqueness, continuity, differentiability, generality (the ability to generalize), computational efficiency, topological accuracy, the ability to capture long-range interactions, and sufficiency of chemical information. The authors propose a new representation, Polyatomic Complexes, and prove theoretically that it meets all of these constraints. Specifically, the paper's main objectives are:

1. **Developing a new representation**: constructing Polyatomic Complexes in a mathematically rigorous way to overcome the limitations of existing representations.
2. **Meeting all necessary constraints**: ensuring that the new representation satisfies each of the constraints listed above.
3. **Providing theoretical guarantees**: establishing these properties through rigorous mathematical proofs.
4. **Validating practical performance**: evaluating the representation experimentally on multiple tasks and comparing it with state-of-the-art methods, against which the abstract reports comparable performance.

Overall, the paper aims to advance chemical structure representation, particularly for machine learning and deep learning applications, by proposing a single representation that meets all of these requirements.
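To make the invariance and uniqueness constraints concrete, here is a minimal sketch built on a hypothetical toy descriptor: the sorted list of all interatomic distances (called `distance_descriptor` here purely for illustration). It is not the Polyatomic Complexes representation, and the water-like coordinates are illustrative assumptions. The sketch shows that the toy descriptor is unchanged by rotation, translation, and atom relabeling, yet it assigns the same value to two non-congruent point sets (a classic homometric pair), so it fails uniqueness. This illustrates why satisfying several constraints at once is non-trivial.

```python
import math
import random


def distance_descriptor(coords):
    """Hypothetical toy descriptor: sorted list of all interatomic distances.

    Illustrative only; this is NOT the Polyatomic Complexes representation.
    """
    n = len(coords)
    return sorted(math.dist(coords[i], coords[j])
                  for i in range(n) for j in range(i + 1, n))


def rotate_z(coords, theta):
    """Rotate every point about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every point by the vector `shift`."""
    return [tuple(a + b for a, b in zip(p, shift)) for p in coords]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two descriptors with a float tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# Invariance check on an approximate water-like geometry (illustrative values).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = translate(rotate_z(water, 0.7), (1.5, -2.0, 3.0))  # rigid motion
random.seed(0)
random.shuffle(moved)  # relabel the atoms (a permutation)
invariant = close(distance_descriptor(water), distance_descriptor(moved))

# Uniqueness failure: two non-congruent 1D point sets with identical
# distance multisets (homometric sets), embedded on the x-axis.
a = [(float(x), 0.0, 0.0) for x in (0, 1, 4, 10, 12, 17)]
b = [(float(x), 0.0, 0.0) for x in (0, 1, 8, 11, 13, 17)]
collision = close(distance_descriptor(a), distance_descriptor(b))

print(invariant, collision)  # True True
```

Comparing the full sorted vectors with a small tolerance avoids spurious mismatches caused by floating-point rounding after the rotation. The second check shows that invariance alone does not guarantee uniqueness: the sets `a` and `b` are not related by any translation or reflection, yet their descriptors coincide.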