A Comprehensive Evaluation of Novel AI Accelerators for Deep Learning Workloads
Murali Emani, Zhen Xie, Siddhisanket Raskar, Varuni Sastry, William Arnold, Bruce Wilson, Rajeev Thakur, Venkatram Vishwanath, Zhengchun Liu, Michael E. Papka, Cindy Orozco Bohorquez, Rick Weisner, Karen Li, Yongning Sheng, Yun Du, Jian Zhang, Alexander Tsyplikhin, Gurdaman Khaira, Jeremy Fowers, Ramakrishnan Sivakumar, Victoria Godsoe, Adrian Macias, Chetan Tekur, Matthew Boyd
DOI: https://doi.org/10.1109/pmbs56514.2022.00007
2022-01-01
Abstract: Scientific applications are increasingly adopting Artificial Intelligence (AI) techniques to advance science. High-performance computing centers are evaluating emerging novel hardware accelerators to efficiently run AI-driven science applications. With a wide diversity in the hardware architectures and software stacks of these systems, it is challenging to understand how these accelerators perform. The state of the art in the evaluation of deep learning workloads primarily focuses on CPUs and GPUs. In this paper, we present an overview of dataflow-based novel AI accelerators from SambaNova, Cerebras, Graphcore, and Groq. We present a first-of-a-kind evaluation of these accelerators with diverse workloads, such as Deep Learning (DL) primitives, benchmark models, and scientific machine learning applications. We also evaluate the performance of collective communication, which is key for distributed DL implementation, along with a study of scaling efficiency. We then discuss key insights, challenges, and opportunities in integrating these novel AI accelerators into supercomputing systems.