Abstract
1 Introduction
    1.1 Application Background
    1.2 Performance Demands for Deep Learning
    1.3 Existing Parallel Frameworks of Deep Learning
    1.4 Chapter Organization
2 Concepts and Categories of Deep Learning
    2.1 Deep Learning
        2.1.1 Artificial Neural Networks
        2.1.2 Concept of Deep Learning
    2.2 Mainstream Deep Learning Models
        2.2.1 Autoencoders
        2.2.2 Back Propagation
        2.2.3 Convolutional Neural Network
3 Parallel Optimization for Deep Learning
    3.1 Convolutional Architecture for Fast Feature Embedding
        3.1.1 Introduction
        3.1.2 CUDA Programming
        3.1.3 Architecture of Caffe
        3.1.4 Parallel Implementation of Convolution in Caffe
    3.2 DistBelief
        3.2.1 Introduction of DistBelief
        3.2.2 Downpour SGD
        3.2.4 Sandblaster L-BFGS
    3.3 Deep Learning Based-on Multi-GPUs
        3.3.1 Data Parallelism
        3.3.2 Model Parallelism
        3.3.3 Data-Model Parallelism
        3.3.4 Example System of Multi-GPUs
4 Discussions
    4.1 Grand Challenges of Deep Learning with Big Data
        4.1.1 Massive Amounts of Training Sample
        4.1.2 Incremental Streaming Data
        4.1.3 Learning Speed with Big Data
        4.1.4 Scalability of Deep Models
    4.2 Future Work
References

Deep Learning and Its Parallelization: Concepts and Instances

Xiaqing Li, Guangyan Zhang, Keqin Li, and Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
