BaGuaLu

Zixuan Ma, Jiaao He, Jiezhong Qiu, Huanqi Cao, Yuanwei Wang, Zhenbo Sun, Liyan Zheng, Haojie Wang, Shizhi Tang, Tianyu Zheng, Junyang Lin, Guanyu Feng, Zeqiang Huang, Jie Gao, Aohan Zeng, Jianwei Zhang, Runxin Zhong, Tianhui Shi, Sha Liu, Wei Xing Zheng, Jie Tang, Hongxia Yang, Xin Liu, Jidong Zhai, Wenguang Chen
DOI: https://doi.org/10.1145/3503221.3508417
2022-01-01
Abstract: Large-scale pretrained AI models have shown state-of-the-art accuracy in a series of important applications. As the size of pretrained AI models grows dramatically each year in an effort to achieve higher accuracy, training such models requires massive computing and memory capabilities, which accelerates the convergence of AI and HPC. However, there are still gaps in deploying AI applications on HPC systems, which need application and system co-design based on specific hardware features.