Parallel Computing in DNNs Using CPU and MIC

Sijiang Fan, Shiqing Zhang, Ximing He, Jiawei Fei, Li Shen, Zhiying Wang
DOI: https://doi.org/10.1109/ISPA/IUCC.2017.00102
2017-01-01
Abstract: Accelerating the training of Deep Neural Networks (DNNs) has been a focus of the deep learning field, and much research has addressed accelerating deep learning on different platforms. Among these platforms, the Intel Xeon Phi coprocessor is a many-core platform that provides both strong programmability and high performance. However, previous work on Intel Many Integrated Core (MIC) focused on parallel computing on the MIC alone. In this paper, we speed up the training of DNNs applied to automatic speech recognition with a CPU+MIC architecture, in which training is executed on both the MIC and the CPU. We apply several optimization methods for I/O and computation and set up experiments to validate them. With all methods combined, results show that our optimized algorithm achieves about a 20x speedup over the original sequential algorithm running on a single CPU core.