Performance evaluation of convolutional neural network on Tianhe-3 prototype

Weiduo Chen, Xiaoshe Dong, Heng Chen, Qiang Wang, Xingda Yu, Xingjun Zhang
DOI: https://doi.org/10.1007/s11227-021-03759-8
IF: 3.3
2021-04-12
The Journal of Supercomputing
Abstract: Exascale supercomputers will greatly help meet the growing computational demands of convolutional neural networks (CNNs). The prototype cluster of the Tianhe-3 supercomputer, built on the Chinese-made many-core processors Phytium-2000+ (FTP) and Matrix-2000+ (MTP), has now gone into service. We evaluated CNN training performance on the Tianhe-3 prototype. To assess single-node performance, we benchmarked image convolution and matrix multiplication on the FTP and MTP; to assess the scalability of distributed training on the prototype cluster, we benchmarked the Allreduce operation. We also qualitatively analyzed the performance bottlenecks of CNNs on the FTP and MTP processors using the Roofline model, and we offer optimization suggestions for improving CNN performance on the Tianhe-3 prototype.
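The Roofline analysis mentioned in the abstract bounds a kernel's attainable throughput by the smaller of the processor's peak compute rate and its arithmetic intensity (FLOPs per byte of memory traffic) multiplied by memory bandwidth. The minimal Python sketch below applies that bound to matrix multiplication. It is an illustration, not the authors' method: the peak and bandwidth values are hypothetical placeholders, not measured FTP or MTP figures, and the traffic model assumes each operand crosses the memory interface exactly once (an idealized lower bound).

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=4):
    """FLOPs per byte for C = A @ B with A (m x k), B (k x n), C (m x n).

    Idealized traffic model (an assumption): A and B are read once and
    C is written once; real kernels usually move more data than this.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s = min(peak compute, intensity * memory bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, NOT measured Tianhe-3 values.
PEAK_GFLOPS = 500.0
BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    cases = {
        "square GEMM 1024^3": (1024, 1024, 1024),
        "matrix-vector 1024x1024": (1024, 1, 1024),
    }
    for name, (m, n, k) in cases.items():
        ai = gemm_arithmetic_intensity(m, n, k)
        perf = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        bound = "compute-bound" if ai >= ridge else "memory-bound"
        print(f"{name}: AI={ai:.2f} FLOP/B, attainable={perf:.1f} GFLOP/s ({bound})")
```

Under this model a square single-precision GEMM has intensity n/6, so large GEMMs sit far to the right of the ridge point, while a matrix-vector product stays near 0.5 FLOP/byte and is limited by bandwidth. Convolution layers lowered to GEMM land somewhere between these extremes depending on their shapes.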
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture