Optimizing CNN Using HPC Tools

Shahrin Rahman
2024-03-08
Abstract:This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like OpenMPI and CUDA to speed up CNN model training. The approach improves performance and training time and is superior to alternative strategies. The study demonstrates how HPC technologies can refine the CNN method, resulting in faster and more accurate training of large-scale CNN models.
Distributed; Parallel; and Cluster Computing
What problem does this paper attempt to address?
The main objective of this paper is to optimize the training process of Convolutional Neural Networks (CNNs) to improve their performance and training speed on large-scale datasets. Specifically, the paper accelerates the training of large-scale CNN models by leveraging High-Performance Computing (HPC) technologies, such as multi-core processors, Graphics Processing Units (GPUs), and parallel computing frameworks (e.g., OpenMPI and CUDA). Through this approach, the researchers aim to significantly reduce the training time of CNN models and enhance model performance. Experimental results show that the proposed method achieves notable improvements in both training time and accuracy when evaluated using benchmark datasets, and it has advantages over other existing optimization strategies. Additionally, the paper explores future research directions, including hybrid parallelization, large-scale dataset optimization, and integration with advanced HPC technologies.