GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs

Ce Yu, Bingsheng He, Shanjian Tang, Ji-zhou Sun, Hao Fu
DOI: https://doi.org/10.1145/3225058.3225077
2018-08-13
Abstract: In this paper, we propose GLP4NN, a network-agnostic and convergence-invariant light-weight parallelization framework that accelerates the training of Deep Neural Networks (DNNs) by taking advantage of emerging GPU features, especially concurrent kernel execution. To determine the number of concurrent kernels on the fly, we design an analytical model in the kernel analyzer module, and we integrate a compact asynchronous resource tracker in the resource tracker module to collect the runtime configurations of kernels with low memory and time overheads. We further develop a runtime scheduler module and a pool-based stream manager that handle GPU work queues in GLP4NN, so that dispatching workloads to GPU devices does not consume too many CPU threads or processes. In our experiments, we integrate GLP4NN into Caffe to accelerate the batch-based training of four well-known networks on NVIDIA GPUs. Experimental results show that GLP4NN achieves a speedup of up to 4x over the original Caffe implementation while preserving the convergence properties of the networks.
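The pool-based stream manager idea can be illustrated with a minimal sketch. This is not the GLP4NN implementation: the names StreamPool, next_stream, dispatch, and limit are hypothetical, streams are modelled as plain integer handles rather than real GPU streams, and the concurrency limit is passed in by hand instead of coming from the paper's analytical model. The sketch only shows how a fixed pool of work queues can be reused round-robin so that no new queue or CPU thread is created per kernel.

```python
class StreamPool:
    """Toy pool of reusable work-queue handles (stand-ins for GPU streams)."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("pool needs at least one stream")
        self.size = size
        self._cursor = 0  # counts how many kernels have been dispatched so far

    def next_stream(self, limit):
        """Return the next stream id, cycling over at most `limit` streams."""
        # Assumption: `limit` is the number of kernels allowed to run
        # concurrently; it is clamped to the range [1, pool size].
        active = max(1, min(limit, self.size))
        stream_id = self._cursor % active
        self._cursor += 1
        return stream_id


def dispatch(kernels, pool, limit):
    """Pair each kernel name with a stream id taken from the pool."""
    return [(name, pool.next_stream(limit)) for name in kernels]


# Five hypothetical kernels, a pool of four streams, at most three in use.
plan = dispatch(["conv_0", "conv_1", "conv_2", "conv_3", "conv_4"],
                StreamPool(4), limit=3)
for name, stream_id in plan:
    print(f"{name} -> stream {stream_id}")
```

With a limit of three, the fourth and fifth kernels reuse streams 0 and 1, so the number of streams in use stays bounded regardless of how many kernels are issued.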
Computer Science, Engineering