Parallelizing Convolutional Neural Networks On Intel (R) Many Integrated Core Architecture

Junjie Liu, Haixia Wang, Dongsheng Wang, Yuan Gao, Zuofeng Li
DOI: https://doi.org/10.1007/978-3-319-16086-3_6
2015-01-01
Abstract: Convolutional neural networks (CNNs) are state-of-the-art machine learning algorithms for low-resolution vision tasks and are widely used in many applications. However, training them is very time-consuming. Many approaches have therefore been proposed to speed up training, among which parallelization is one of the most effective. In this article, we parallelized a classic CNN on a new platform, the Intel (R) Xeon Phi (TM) Coprocessor, using OpenMP. Our implementation achieved a 131x speedup over the serial version running on the coprocessor itself and an 8.3x speedup over the serial baseline on the Xeon (R) E5-2697 CPU.