Abstract
1 Introduction
  1.1 Application Background
  1.2 Performance Demands for Deep Learning
  1.3 Existing Parallel Frameworks of Deep Learning
  1.4 Chapter Organization
2 Concepts and Categories of Deep Learning
  2.1 Deep Learning
    2.1.1 Artificial Neural Networks
    2.1.2 Concept of Deep Learning
  2.2 Mainstream Deep Learning Models
    2.2.1 Autoencoders
    2.2.2 Back Propagation
    2.2.3 Convolutional Neural Network
3 Parallel Optimization for Deep Learning
  3.1 Convolutional Architecture for Fast Feature Embedding
    3.1.1 Introduction
    3.1.2 CUDA Programming
    3.1.3 Architecture of Caffe
    3.1.4 Parallel Implementation of Convolution in Caffe
  3.2 DistBelief
    3.2.1 Introduction of DistBelief
    3.2.2 Downpour SGD
    3.2.4 Sandblaster L-BFGS
  3.3 Deep Learning Based-on Multi-GPUs
    3.3.1 Data Parallelism
    3.3.2 Model Parallelism
    3.3.3 Data-Model Parallelism
    3.3.4 Example System of Multi-GPUs
4 Discussions
  4.1 Grand Challenges of Deep Learning with Big Data
    4.1.1 Massive Amounts of Training Sample
    4.1.2 Incremental Streaming Data
    4.1.3 Learning Speed with Big Data
    4.1.4 Scalability of Deep Models
  4.2 Future Work
References

Deep Learning and Its Parallelization: Concepts and Instances
Xiaqing Li, Guangyan Zhang, Keqin Li, and Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China


Coded Parallelism for Distributed Deep Learning

Distributed High-Performance Computing Methods for Accelerating Deep Learning Training

Parallel Learning - A New Framework for Machine Learning

A Survey on Auto-Parallelism of Large-Scale Deep Learning Training

Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning

Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey

Exploiting Parallelism Opportunities with Deep Learning Frameworks

Review of Deep Learning Parallelization and Its Application in Spatial Data Mining

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

A Linear Algebraic Approach to Model Parallelism in Deep Learning

Brief Announcement: On the Limits of Parallelizing Convolutional Neural Networks on GPUs

Efficient Distributed Image Recognition Algorithm of Deep Learning Framework TensorFlow

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

Dynamic Universal Approximation Theory: Foundations for Parallelism in Neural Networks

Deep Learning Model And Its Application In Big Data

Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor

Beyond Data and Model Parallelism for Deep Neural Networks