Monitoring the development of CFD applications on unstable HPC platforms

Damien Dosimont, Guillaume Houzeaux
2024-01-16
Abstract: We tackle the challenging task of monitoring the performance of CFD applications throughout their development on unstable HPC platforms. We have designed and implemented a monitoring framework that is integrated at the end of a CI/CD pipeline. Measurements collected during the automatic execution of production simulations are analyzed in a visual analytics interface we developed, which provides advanced visualizations and interactions. We validated this approach by monitoring the CFD code Alya over two years, detecting and resolving platform-related issues and highlighting performance improvements.
Distributed, Parallel, and Cluster Computing; Computational Physics