Euclidean Fast Attention: Machine Learning Global Atomic Representations at Linear Cost

J. Thorben Frank, Stefan Chmiela, Klaus-Robert Müller, Oliver T. Unke
2024-12-12
Abstract: Long-range correlations are essential across numerous machine learning tasks, especially for data embedded in Euclidean space, where the relative positions and orientations of distant components are often critical for accurate predictions. Self-attention offers a compelling mechanism for capturing these global effects, but its quadratic complexity presents a significant practical limitation. This problem is particularly pronounced in computational chemistry, where the stringent efficiency requirements of machine learning force fields (MLFFs) often preclude accurately modeling long-range interactions. To address this, we introduce Euclidean fast attention (EFA), a linear-scaling attention-like mechanism designed for Euclidean data, which can be easily incorporated into existing model architectures. A core component of EFA are novel Euclidean rotary positional encodings (ERoPE), which enable efficient encoding of spatial information while respecting essential physical symmetries. We empirically demonstrate that EFA effectively captures diverse long-range effects, enabling EFA-equipped MLFFs to describe challenging chemical interactions for which conventional MLFFs yield incorrect results.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is the difficulty of accurately modeling long-range interactions in machine learning force fields (MLFFs), particularly in computational chemistry. Self-attention can capture global effects, but its quadratic complexity (time and memory costs grow with the square of the number of inputs) limits its use in MLFFs. In addition, for data embedded in Euclidean space (such as atomic positions in molecular dynamics simulations), existing linear-scaling attention mechanisms struggle to encode spatial information while preserving physical symmetries.

To solve these problems, the authors propose **Euclidean Fast Attention (EFA)**, a linear-scaling attention mechanism designed specifically for data in Euclidean space. It encodes spatial information efficiently and respects physical symmetries such as translational and rotational invariance. By introducing new **Euclidean Rotary Positional Encodings (ERoPE)**, EFA can capture long-range effects while keeping linear complexity.

### Specific Problems and Solutions

1. **Modeling of Long-Range Interactions**:
   - **Problem**: Traditional message-passing neural networks (MPNNs) use a local cutoff, so they cannot model interactions between atoms separated by more than the cutoff distance.
   - **Solution**: EFA gives every node direct access to all other nodes, regardless of distance, and can therefore capture long-range effects faithfully.
2. **High Complexity of the Self-Attention Mechanism**:
   - **Problem**: Standard self-attention has quadratic complexity, which leads to excessive computational cost for large systems.
   - **Solution**: EFA redefines the similarity kernel and combines it with ERoPE to obtain an attention mechanism with linear complexity.
3. **Encoding of Geometric Information in Euclidean Space**:
   - **Problem**: Linear-scaling attention mechanisms struggle to encode spatial information while preserving physical symmetries.
   - **Solution**: ERoPE ensures rotational invariance by projecting displacement vectors onto unit vectors, expanding the projections in complex exponentials, and then averaging over all possible unit vectors.

### Experimental Verification

The article verifies the effectiveness of EFA through experiments on idealized model systems and on real chemical systems. The results show that EFA-enhanced models accurately describe a range of long-range interactions and non-local effects that conventional MLFFs fail to capture. For example, for the long-range potential energy between a pair of particles, EFA describes the energy change over the entire interaction length, whereas a standard MPNN incorrectly predicts a constant energy beyond the local cutoff distance.

In conclusion, this paper proposes EFA, a new linear-scaling attention mechanism that addresses the complexity and geometric-encoding problems that earlier methods face when modeling long-range interactions, making such interactions accessible to MLFFs at linear cost.
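To make the two ideas in the summary above more concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the single frequency `omega`, the scalar per-atom `values`, the Fibonacci-lattice grid of unit vectors, and all function names are assumptions chosen for illustration. It relies only on two mathematical facts: a complex exponential of a projected displacement factorizes into a term for atom i and a term for atom j (so the sum over j can be shared by all i), and the average of that exponential over all unit vectors equals sin(x)/x with x = omega times the distance, which depends on distance alone.

```python
import cmath
import math


def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors (an assumed, simple quadrature grid)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        points.append((rho * math.cos(golden_angle * k),
                       rho * math.sin(golden_angle * k), z))
    return points


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pairwise_reference(positions, values, omega, directions):
    """Quadratic cost: visit every pair (i, j) for every direction u."""
    out = []
    for r_i in positions:
        total = 0.0
        for u in directions:
            for r_j, v_j in zip(positions, values):
                d = (r_i[0] - r_j[0], r_i[1] - r_j[1], r_i[2] - r_j[2])
                total += (cmath.exp(1j * omega * dot(u, d)) * v_j).real
        out.append(total / len(directions))
    return out


def factorized_linear(positions, values, omega, directions):
    """Linear cost in the number of points for each direction u.

    exp(i*omega*u.(r_i - r_j)) = exp(i*omega*u.r_i) * conj(exp(i*omega*u.r_j)),
    so the sum over j is computed once and reused for every i.
    """
    out = [0.0] * len(positions)
    for u in directions:
        phases = [cmath.exp(1j * omega * dot(u, r)) for r in positions]
        shared = sum(p.conjugate() * v for p, v in zip(phases, values))
        for i, p in enumerate(phases):
            out[i] += (p * shared).real
    return [x / len(directions) for x in out]


def sinc_kernel_sum(positions, values, omega):
    """Exact sphere average of the phase: sin(omega*|d|) / (omega*|d|)."""
    out = []
    for r_i in positions:
        total = 0.0
        for r_j, v_j in zip(positions, values):
            x = omega * math.dist(r_i, r_j)
            total += v_j * (1.0 if x == 0.0 else math.sin(x) / x)
        out.append(total)
    return out


# Hypothetical toy data: four "atoms" with scalar features.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 1.0), (-1.0, 0.5, 3.0)]
values = [1.0, -0.5, 2.0, 0.75]
directions = fibonacci_sphere(500)

fast = factorized_linear(positions, values, 1.3, directions)
slow = pairwise_reference(positions, values, 1.3, directions)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The factorized and pairwise versions agree up to floating-point rounding, while only the pairwise one loops over all pairs. Because the direction average approximates a function of distance only, rotating all positions changes the output only by the quadrature error of the chosen grid.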