Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling

Shivam Barwey, Riccardo Balin, Bethany Lusch, Saumil Patel, Ramesh Balakrishnan, Pinaki Pal, Romit Maulik, Venkatram Vishwanath
2024-10-02
Abstract: This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is arithmetically equivalent to evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling in consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer.
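The consistency property described in the abstract can be illustrated with a toy example. The sketch below is not the paper's message passing layer: it assumes a six-node chain graph, a plain sum aggregation with no learned weights, and two simulated ranks inside a single process. The helper names (`aggregate`, `rank_update`) are hypothetical. It only shows why copying boundary values into halo nodes lets a partitioned evaluation reproduce the single-graph result, whereas dropping the cross-partition edge does not.

```python
import numpy as np

def aggregate(features, edges):
    """One sum-aggregation step: out[i] = features[i] + sum of in-neighbor features."""
    out = features.copy()
    for src, dst in edges:
        out[dst] += features[src]
    return out

# Assumed toy graph: chain 0-1-2-3-4-5, each undirected edge stored in both directions.
n = 6
features = np.arange(1.0, n + 1.0)
undirected = [(i, i + 1) for i in range(n - 1)]
edges = undirected + [(j, i) for i, j in undirected]

# Reference: the whole graph evaluated on "one rank".
reference = aggregate(features, edges)

def rank_update(owned, halo, use_halo):
    """Update the owned nodes of one simulated rank.

    With use_halo=True the rank also holds copies of the neighboring rank's
    boundary nodes (the halo exchange is mimicked by a plain array copy).
    """
    local_nodes = owned + (halo if use_halo else [])
    available = set(local_nodes)
    owned_set = set(owned)
    # Keep edges that end on an owned node and start on a locally available node.
    local_edges = [(s, d) for s, d in edges if d in owned_set and s in available]
    local = np.zeros(n)
    local[local_nodes] = features[local_nodes]
    return aggregate(local, local_edges)[owned]

# Two simulated ranks: nodes 0-2 and nodes 3-5; edge (2, 3) crosses the boundary.
consistent = np.concatenate([rank_update([0, 1, 2], [3], True),
                             rank_update([3, 4, 5], [2], True)])
naive = np.concatenate([rank_update([0, 1, 2], [3], False),
                        rank_update([3, 4, 5], [2], False)])

print("single graph   :", reference)
print("with halo nodes:", consistent)
print("without halos  :", naive)
```

In this toy setting the halo variant matches the single-graph result exactly, while the variant without halos differs at the two boundary nodes. A real distributed implementation would replace the array copy with inter-rank communication and would also need the exchange to be differentiable for training.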
Distributed, Parallel, and Cluster Computing; Machine Learning; Computational Physics