Abstract: This paper is an empirical study of distributed deep learning for two question answering subtasks: answer selection and question classification. Comparative studies of the SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms are presented. Experimental results show that a distributed framework based on the message passing interface (MPI) can accelerate convergence, with sublinear speedup as workers are added. The paper demonstrates the importance of distributed training. For example, with 48 workers, a 24x speedup is achievable for the answer selection task, and running time decreases from 138.2 hours to 5.81 hours, which significantly increases productivity.
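The headline numbers can be checked with the standard definitions of speedup $S_N = T_1 / T_N$ and parallel efficiency $E_N = S_N / N$. This assumes that 138.2 hours is the single-worker baseline time $T_1$ and 5.81 hours is the time $T_{48}$ with 48 workers; the abstract does not state the baseline explicitly.

$$
S_{48} = \frac{T_1}{T_{48}} = \frac{138.2\ \text{h}}{5.81\ \text{h}} \approx 23.8,
\qquad
E_{48} = \frac{S_{48}}{48} \approx \frac{23.8}{48} \approx 0.50
$$

Ideal linear scaling would give $S_{48} = 48$, or $E_{48} = 1$. An efficiency of roughly 50% is what "sublinear" means here, and $23.8$ rounds to the reported 24x.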

What problem does this paper attempt to address?

The paper asks how distributed deep learning can substantially improve training efficiency and productivity for two sub-tasks of a Question Answering (QA) system: answer selection and question classification. Specifically:

1. **High computational cost**: Training deep-learning models is very time-consuming and often takes days or even weeks. This is a poor fit for practical applications (such as training-as-a-service in the cloud) and for research environments, because long computation times limit the number of experiments and slow down the R&D cycle.
2. **The importance of distributed training**: Distributed training has become a key research direction for meeting this challenge. However, the performance of different optimization algorithms in a distributed setting had not been thoroughly compared and verified.
3. **Specific challenges of QA tasks**:
   - **Answer selection**: Given a question and a pool of candidate answers, select the best answer.
   - **Question classification**: Classify a question into one of a predefined set of answers, which suits scenarios such as online customer service.

### Main contributions of the paper

- **Empirical study**: The paper provides, for the first time, a detailed comparison of multiple distributed training algorithms (SGD, MSGD, RMSProp, AdaDelta, AdaGrad, Adam/Adamax, Downpour, EASGD/EAMSGD) on QA sub-tasks.
- **Speedup**: A distributed framework based on the Message Passing Interface (MPI) substantially accelerates convergence. For example, with 48 workers on the answer selection task, a 24-fold speedup is achieved, reducing running time from 138.2 hours to 5.81 hours.
- **The importance of algorithm selection**: The experiments show that choosing an appropriate distributed training algorithm is crucial for substantially improving training speed while maintaining model accuracy.

### Conclusion

The paper demonstrates experimentally that distributed training matters for QA tasks. It identifies DOWNPOUR, EAMSGD and RMSProp as the best-performing distributed training methods, since they substantially accelerate convergence while maintaining accuracy. These results offer useful reference points and guidance for future research on distributed deep learning.
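The conclusion singles out elastic averaging (EAMSGD) as one of the strongest methods. The following is a minimal, single-process sketch of how elastic averaging SGD (EASGD) works, using a toy least-squares problem. It is not the paper's MPI implementation. The function name `elastic_averaging_sgd`, the toy data, and every hyperparameter (`eta`, `alpha`, `tau`, `momentum`) are illustrative assumptions. The `momentum` option only approximates EAMSGD's Nesterov-momentum local update.

```python
import numpy as np


def elastic_averaging_sgd(shards, dim, eta=0.05, alpha=0.1, tau=1,
                          momentum=0.0, steps=500, batch=16, seed=0):
    """Hypothetical helper: simulate synchronous EASGD in one process.

    shards   -- list of (X, y) least-squares problems, one per simulated worker
    alpha    -- elastic moving rate (eta * rho in the EASGD formulation)
    tau      -- communication period: the elastic step runs every tau steps
    momentum -- 0.0 gives plain EASGD; > 0 adds a Nesterov-momentum local
                update in the spirit of EAMSGD (an approximation)
    """
    rng = np.random.default_rng(seed)
    center = np.zeros(dim)                        # shared center variable
    local = [center.copy() for _ in shards]       # per-worker parameters
    vel = [np.zeros(dim) for _ in shards]         # per-worker momentum buffers
    for t in range(steps):
        if t % tau == 0:
            # Elastic step: each worker moves toward the center, and the
            # center moves toward the workers by the same total amount.
            diffs = [alpha * (x - center) for x in local]
            local = [x - d for x, d in zip(local, diffs)]
            center = center + sum(diffs)
        for i, (X, y) in enumerate(shards):
            idx = rng.integers(0, len(y), size=batch)  # local minibatch
            ahead = local[i] + momentum * vel[i]       # Nesterov look-ahead point
            grad = X[idx].T @ (X[idx] @ ahead - y[idx]) / batch
            vel[i] = momentum * vel[i] - eta * grad
            local[i] = local[i] + vel[i]
    return center


# Toy usage: 4 simulated workers, each holding its own shard of one
# shared linear-regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(42)
dim, workers = 5, 4
w_true = rng.normal(size=dim)
shards = []
for _ in range(workers):
    X = rng.normal(size=(200, dim))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=200)))

center_easgd = elastic_averaging_sgd(shards, dim)
center_eamsgd = elastic_averaging_sgd(shards, dim, tau=4, momentum=0.9)
print(np.linalg.norm(center_easgd - w_true), np.linalg.norm(center_eamsgd - w_true))
```

Each worker keeps its own parameters, and an elastic force of strength `alpha` ties them to a shared center variable. Raising `tau` makes communication less frequent, so this parameter trades communication cost against how closely the workers track the center. In a real deployment each worker would run as a separate process (the paper uses an MPI-based framework) rather than inside one loop.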

Distributed Deep Learning for Question Answering

Multi-agent Deep Reinforcement Learning Algorithm for Distributed Economic Dispatch in Smart Grid

Target-Value-Competition-Based Multi-Agent Deep Reinforcement Learning Algorithm for Distributed Nonconvex Economic Dispatch

DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging

Adjacent Leader Decentralized Stochastic Gradient Descent

Distributed Active Learning

CADA: Communication-Adaptive Distributed Adam

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning

A Survey From Distributed Machine Learning to Distributed Deep Learning

AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost

Adaptive Worker Grouping For Communication-Efficient and Straggler-Tolerant Distributed SGD

EP4DDL: addressing straggler problem in heterogeneous distributed deep learning

Peering Beyond the Gradient Veil with Distributed Auto Differentiation

DISTRIBUTED HIGH-PERFORMANCE COMPUTING METHODS FOR ACCELERATING DEEP LEARNING TRAINING

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

Detached Error Feedback for Distributed SGD with Random Sparsification

DMADRL: A Distributed Multi-agent Deep Reinforcement Learning Algorithm for Cognitive Offloading in Dynamic MEC Networks