Abstract: Distributed machine learning has delivered considerable advances in training neural networks by leveraging parallel processing, scalability, and fault tolerance to accelerate training and improve model performance. However, training large models poses numerous challenges because conventional approaches depend on gradients. To improve the training efficiency of such models, gradient-free distributed methodologies have emerged, fostering gradient-independent parallel processing and efficient utilization of resources across multiple devices or nodes. However, such approaches are usually restricted to specific applications because of their conceptual limitations: computation and communication requirements between partitions, partitioning restricted to layers, sequential learning between the different layers, and training restricted to synchronous mode. In this paper, we propose and evaluate the Neuro-Distributed Cognitive Adaptive Optimization (ND-CAO) methodology, a novel gradient-free algorithm that enables the efficient distributed training of arbitrary types of neural networks in both synchronous and asynchronous modes. Contrary to the majority of existing methodologies, ND-CAO is applicable to any possible splitting of a neural network into blocks (partitions), with each block allowed to update its parameters fully asynchronously and independently of the other blocks. Most importantly, no data exchange is required between the different blocks during training: the only information each block requires is the global performance of the model.
Convergence of ND-CAO is mathematically established for generic neural network architectures, independently of the particular choices made, while four comprehensive experimental cases, covering different model architectures and image classification tasks, validate the algorithm's robustness and effectiveness in both synchronous and asynchronous training modes. Moreover, a thorough comparison between synchronous and asynchronous ND-CAO training identifies the algorithm as an efficient scheme for training neural networks in a gradient-independent, distributed, and asynchronous manner, delivering similar, or even improved, results in loss and accuracy.
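
The block-wise, gradient-free idea described in the abstract can be illustrated with a minimal sketch. This is not the ND-CAO update rule, which the abstract does not specify; it swaps in a plain accept-if-better random perturbation search. All names (`global_loss`, `train_blockwise`, the toy linear model, and the partition `blocks`) are hypothetical and chosen only for illustration. What the sketch does share with the description above is that the parameters are split into arbitrary blocks, each block perturbs only its own parameters, and the only signal a block consumes is the global performance of the model. Asynchrony is merely simulated here by waking one block at a time in random order.

```python
import random

def global_loss(params, data):
    """Global performance signal: mean squared error of y = w0*x0 + w1*x1 + b."""
    w0, w1, b = params
    return sum((w0 * x0 + w1 * x1 + b - y) ** 2 for (x0, x1), y in data) / len(data)

def train_blockwise(params, blocks, data, steps=2000, step_size=0.1, seed=0):
    """Gradient-free block-wise random search (illustrative stand-in, not ND-CAO).

    blocks is an arbitrary partition of parameter indices. At each step one
    block proposes a random perturbation of its own parameters and keeps it
    only if the global loss improves. No gradients are computed and blocks
    exchange no data other than the scalar global loss.
    """
    rng = random.Random(seed)
    params = list(params)
    best = global_loss(params, data)
    history = [best]
    for _ in range(steps):
        block = rng.choice(blocks)           # one block wakes up, in arbitrary order
        candidate = list(params)
        for i in block:                      # perturb this block's parameters only
            candidate[i] += rng.gauss(0.0, step_size)
        loss = global_loss(candidate, data)  # the only feedback the block receives
        if loss < best:                      # keep the proposal only if it helps
            params, best = candidate, loss
        history.append(best)
    return params, history

# Toy data generated from assumed ground-truth parameters (2.0, -1.0, 0.5).
data_rng = random.Random(1)
data = []
for _ in range(50):
    x0, x1 = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    data.append(((x0, x1), 2.0 * x0 - 1.0 * x1 + 0.5))

blocks = [[0], [1, 2]]  # an arbitrary split, not aligned with any layer structure
final, history = train_blockwise([0.0, 0.0, 0.0], blocks, data)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.6f}")
```

In a truly asynchronous deployment each block would run on its own device and might read a slightly stale global loss; the sequential loop above sidesteps that question and is meant only to show how little information each block needs.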

Aware: Adaptive Distributed Training with Computation, Communication and Position Awareness for Deep Learning Model.

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Model-Aware Parallelization Strategy for Deep Neural Networks' Distributed Training

Coded Parallelism for Distributed Deep Learning.

Trinity: Neural Network Adaptive Distributed Parallel Training Method Based on Reinforcement Learning.

Adaptive Distributed Parallel Training Method for a Deep Learning Model Based on Dynamic Critical Paths of DAG

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

MP-DPS: Adaptive Distributed Training for Deep Learning Based on Node Merging and Path Prediction

Interlocking Backpropagation: Improving depthwise model-parallelism

Neuro-distributed cognitive adaptive optimization for training neural networks in a parallel and asynchronous manner

Training Acceleration for Deep Neural Networks: A Hybrid Parallelization Strategy

A Hybrid Parallelization Approach for Distributed and Scalable Deep Learning

A parallel computing platform for training large scale neural networks

A Stage-Level Network Parallelization Method Based on Depth Decomposition

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

An Efficient 2D Method for Training Super-Large Deep Learning Models

Distributed SLIDE: Enabling Training Large Neural Networks on Low Bandwidth and Simple CPU-Clusters via Model Parallelism and Sparsity

Distributed High-Performance Computing Methods for Accelerating Deep Learning Training

Dynamic Universal Approximation Theory: Foundations for Parallelism in Neural Networks

Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs over Heterogeneous Infrastructure

HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing