Deep Learning and Its Parallelization: Concepts and Instances
Xiaqing Li, Guangyan Zhang, Keqin Li, and Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract
1 Introduction
    1.1 Application Background
    1.2 Performance Demands for Deep Learning
    1.3 Existing Parallel Frameworks of Deep Learning
    1.4 Chapter Organization
2 Concepts and Categories of Deep Learning
    2.1 Deep Learning
        2.1.1 Artificial Neural Networks
        2.1.2 Concept of Deep Learning
    2.2 Mainstream Deep Learning Models
        2.2.1 Autoencoders
        2.2.2 Back Propagation
        2.2.3 Convolutional Neural Network
3 Parallel Optimization for Deep Learning
    3.1 Convolutional Architecture for Fast Feature Embedding
        3.1.1 Introduction
        3.1.2 CUDA Programming
        3.1.3 Architecture of Caffe
        3.1.4 Parallel Implementation of Convolution in Caffe
    3.2 DistBelief
        3.2.1 Introduction to DistBelief
        3.2.2 Downpour SGD
        3.2.4 Sandblaster L-BFGS
    3.3 Deep Learning Based on Multi-GPUs
        3.3.1 Data Parallelism
        3.3.2 Model Parallelism
        3.3.3 Data-Model Parallelism
        3.3.4 Example System of Multi-GPUs
4 Discussions
    4.1 Grand Challenges of Deep Learning with Big Data
        4.1.1 Massive Amounts of Training Samples
        4.1.2 Incremental Streaming Data
        4.1.3 Learning Speed with Big Data
        4.1.4 Scalability of Deep Models
    4.2 Future Work
References

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis

Coded Parallelism for Distributed Deep Learning

A Linear Algebraic Approach to Model Parallelism in Deep Learning

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

A Survey on Auto-Parallelism of Large-Scale Deep Learning Training

Acceleration for Deep Reinforcement Learning Using Parallel and Distributed Computing: A Survey

A Survey From Distributed Machine Learning to Distributed Deep Learning

Deep Learning and Its Parallelization

A Quick Survey on Large Scale Distributed Deep Learning Systems

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies

Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges

Brief Announcement: On the Limits of Parallelizing Convolutional Neural Networks on GPUs

Integrated Model, Batch and Domain Parallelism in Training Neural Networks

Communication Patterns in Distributed Deep Learning

Distributed High-Performance Computing Methods for Accelerating Deep Learning Training

Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings

Communication Optimization Strategies for Distributed Deep Neural Network Training: A Survey

Model-Aware Parallelization Strategy for Deep Neural Networks' Distributed Training

Distributed Machine Learning: Foundations, Trends, and Practices