Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure

E. A. Huerta, Asad Khan, Edward Davis, Colleen Bushell, William D. Gropp, Daniel S. Katz, Volodymyr Kindratenko, Seid Koric, William T. C. Kramer, Brendan McGinty, Kenton McHenry, Aaron Saxton
DOI: https://doi.org/10.1186/s40537-020-00361-2
2020-10-20
Abstract: Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches to enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion dollar industry, and which play an ever increasing role shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high performance computing (HPC) to reduce time-to-insight, and to enable a systematic study of domain-inspired AI architectures and optimization schemes to enable data-driven discovery. In this article we present a summary of recent developments in this field, and describe specific advances that authors in this article are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.
Computational Physics, Instrumentation and Methods for Astrophysics, Machine Learning, General Relativity and Quantum Cosmology
What problem does this paper attempt to address?
The paper explores the convergence of Artificial Intelligence (AI) and High-Performance Computing (HPC), and aims to address the following key issues:

1. **Accelerating AI Model Training**: Scientific facilities now produce data at a rate and volume that single-GPU solutions can no longer handle. The authors therefore combine AI and HPC to reduce time-to-insight and to enable a systematic study of domain-inspired AI architectures and optimization schemes for data-driven discovery.
2. **Building a Rigorous Mathematical Framework**: Accelerating the training of AI models on HPC platforms requires a rigorous mathematical framework to guide the selection of domain-inspired AI architectures and optimization schemes, so that models converge quickly and reach optimal performance.
3. **Interdisciplinary Collaboration**: The paper emphasizes the importance of an interdisciplinary collaboration mechanism that brings together domain experts, information scientists, AI experts, data specialists, and software developers to jointly collect and organize experimental and simulation datasets.
4. **Promoting the Commercialization of AI Tools**: The paper points out the need to identify connections between AI data and models across different fields, which helps in producing commercial software that can be applied seamlessly to various domains.
5. **Deployment of Open Source Platforms**: To accelerate the commercialization of reproducible and reliable AI tools, AI models and data need to be deployed on open-source platforms, such as the Data and Learning Hub for Science.

The paper also discusses current challenges, including how to make experimental data more suitable for data-driven discovery and how to design AI models that converge faster and support intuitive discoveries, among others.
Additionally, it mentions related projects supported by the National Science Foundation (NSF) and the Department of Energy (DOE) in the United States, and how these projects facilitate the development of next-generation HPC platforms and accelerate the design, deployment, and adoption of innovative AI applications.