Scaling physics-informed hard constraints with mixture-of-experts

Nithin Chalapathi, Yiheng Du, Aditi Krishnapriyan
2024-02-21
Abstract: Imposing known physical constraints, such as conservation laws, during neural network training introduces an inductive bias that can improve accuracy, reliability, convergence, and data efficiency for modeling physical dynamics. While such constraints can be softly imposed via loss function penalties, recent advancements in differentiable physics and optimization improve performance by incorporating PDE-constrained optimization as individual layers in neural networks. This enables a stricter adherence to physical constraints. However, imposing hard constraints significantly increases computational and memory costs, especially for complex dynamical systems. This is because it requires solving an optimization problem over a large number of points in a mesh, representing spatial and temporal discretizations, which greatly increases the complexity of the constraint. To address this challenge, we develop a scalable approach to enforce hard physical constraints using Mixture-of-Experts (MoE), which can be used with any neural network architecture. Our approach imposes the constraint over smaller decomposed domains, each of which is solved by an "expert" through differentiable optimization. During training, each expert independently performs a localized backpropagation step by leveraging the implicit function theorem; the independence of each expert allows for parallelization across multiple GPUs. Compared to standard differentiable optimization, our scalable approach achieves greater accuracy in the neural PDE solver setting for predicting the dynamics of challenging non-linear systems. We also improve training stability and require significantly less computation time during both training and inference stages.
Machine Learning, Artificial Intelligence, Numerical Analysis, Optimization and Control
What problem does this paper attempt to address?
This paper addresses the high computational and memory cost of strictly enforcing physical constraints (such as conservation laws) during neural network training, especially for complex dynamical systems. Such constraints are often imposed softly through loss function penalties, which can cause optimization and convergence difficulties and cannot guarantee that the constraints hold at inference. Hard constraints can instead be enforced by embedding PDE-constrained optimization as a layer in the network, but this requires solving an optimization problem over a large number of mesh points. The paper proposes a scalable approach that uses Mixture-of-Experts (MoE) to enforce hard physical constraints. The constraint is imposed over smaller decomposed domains, each solved independently by an "expert" through differentiable optimization. During training, each expert performs a localized backpropagation step using the implicit function theorem, and because the experts are independent, the forward and backward passes can be parallelized across multiple GPUs. Compared to standard differentiable optimization, the approach achieves higher accuracy in predicting the dynamics of challenging nonlinear systems, improves training stability, and significantly reduces computation time during both training and inference.
The main contributions of the paper include:
1. Introducing a physics-informed hard-constraint Mixture-of-Experts training framework (PI-HC-MoE) that imposes hard physical constraints on neural networks by solving constrained optimization problems in a scalable way.
2. Instantiating this method in a neural PDE solver setting and demonstrating it on two challenging nonlinear problems (diffusion-sorption and turbulent Navier-Stokes equations), where it significantly improves accuracy compared to both soft constraints and standard hard-constraint differentiable optimization.
3. Showing that the execution time of PI-HC-MoE scales sub-linearly as the number of sampled points in the spatio-temporal domain increases, making it more efficient than standard differentiable optimization.
4. Providing open-source code to promote replicability and further research.
Overall, the paper proposes an effective method for reducing the computational burden of enforcing hard physical constraints in complex systems: it decomposes constraint enforcement into parallel tasks, improving the accuracy and efficiency of neural networks that simulate physical phenomena.
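The two ideas above, solving the constraint separately on each subdomain and backpropagating through each solve with the implicit function theorem, can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes a linear least-squares problem as a stand-in for the PDE-constrained optimization, and the names `expert_solve`, `moe_solve`, and `ift_backward`, as well as the random matrix `A` and vector `b`, are hypothetical.

```python
import numpy as np

def expert_solve(A, b):
    """One expert: w* = argmin_w ||A w - b||^2 on its own subdomain.
    (Linear least squares is an assumed stand-in for the PDE constraint.)"""
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

def moe_solve(A, b, num_experts):
    """Split the sampled points (rows of A) into contiguous subdomains and
    solve each one independently. The loop body has no cross-expert
    dependency, so it could be distributed across devices."""
    chunks = np.array_split(np.arange(A.shape[0]), num_experts)
    return [(rows, expert_solve(A[rows], b[rows])) for rows in chunks]

def ift_backward(A, grad_w):
    """Gradient of a loss with respect to b via the implicit function theorem.

    At the optimum, F(w, b) = A^T (A w - b) = 0, so
    dw/db = (A^T A)^{-1} A^T and dL/db = A (A^T A)^{-1} dL/dw.
    Only one linear solve is needed; no solver iterations are unrolled."""
    v = np.linalg.solve(A.T @ A, grad_w)
    return A @ v

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 4))   # 64 sampled points, 4 basis functions (toy sizes)
b = rng.normal(size=64)

# Residual of one global solve versus four independent expert solves.
global_res = np.linalg.norm(A @ expert_solve(A, b) - b)
experts = moe_solve(A, b, num_experts=4)
local_res = np.sqrt(sum(np.linalg.norm(A[r] @ w - b[r]) ** 2 for r, w in experts))

# Localized backward pass for the first expert, with upstream loss L = sum(w).
rows0, w0 = experts[0]
grad_b0 = ift_backward(A[rows0], np.ones_like(w0))
```

In this toy setting each expert only ever factors a matrix built from its own 16 points, which is the source of the cost reduction the summary describes. Because each expert fits its own weights, the combined residual `local_res` cannot exceed `global_res`.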