A Deep Learning Frame on Embedded Multicore Processors Based on Caffe and Its Parallel Implementation

Rong GAO, Liang ZHANG, Kuizhi MEI
DOI: https://doi.org/10.7652/xjtuxb201806006
2018-01-01
Abstract: To address the poor compatibility and low efficiency of forward inference when the open-source deep learning framework Caffe (Convolutional Architecture for Fast Feature Embedding) is used on Android mobile terminals, an embedded homogeneous and heterogeneous parallel design based on Caffe is proposed. The scheme ports Caffe and its third-party libraries to the ARM architecture using a cross compiler, and then uses multicore, multithreaded techniques to parallelize part of the forward inference at the level of the convolution layer and the input frame group. A heterogeneous parallel convolution implementation based on OpenCL is also presented to further reduce inference time. Comparison tests on three classic deep learning networks, MNIST, Cifar-10, and CaffeNet, show that, with no loss of model precision, the time consumed after parallelization is far lower than before, and time performance improves by up to 2 times. It is concluded that the proposal allows the Caffe framework to be deployed effectively and to run in parallel on portable embedded multicore devices.
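
Illustrative sketch (not the authors' code): the abstract describes parallelizing forward inference over a group of input frames with multiple threads. The Python fragment below shows only the partitioning idea, under assumptions that are not in the paper: a naive single-channel convolution, one task per frame, and a thread pool from the standard library. The names conv2d_valid, forward_serial, and forward_parallel are hypothetical. Because of the interpreter lock, CPython threads will not reproduce the reported speedup; the point is that the parallel path returns exactly the same outputs as the serial one, which mirrors the claim of no precision loss.

```python
from concurrent.futures import ThreadPoolExecutor


def conv2d_valid(frame, kernel):
    """Naive single-channel 2D 'valid' cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(
                frame[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def forward_serial(frames, kernel):
    """Baseline: process the frame group one frame at a time."""
    return [conv2d_valid(frame, kernel) for frame in frames]


def forward_parallel(frames, kernel, num_workers=4):
    """Assumed partitioning: one task per input frame, spread over worker threads.

    pool.map returns results in input order, so the output list lines up
    with the serial baseline frame by frame.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda frame: conv2d_valid(frame, kernel), frames))


if __name__ == "__main__":
    # Three 4x4 frames and a 2x2 kernel, chosen only for illustration.
    frames = [[[k + i + j for j in range(4)] for i in range(4)] for k in range(3)]
    kernel = [[1, 0], [0, -1]]
    print(forward_parallel(frames, kernel) == forward_serial(frames, kernel))
```

Each frame is independent of the others in the forward pass, which is why the frame group can be split across workers without changing any result.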