Distributed Deep Learning for Question Answering

Minwei Feng, Bing Xiang, Bowen Zhou
DOI: https://doi.org/10.1145/2983323.2983377
2016-08-05
Abstract: This paper is an empirical study of the distributed deep learning for question answering subtasks: answer selection and question classification. Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms have been presented. Experimental results show that the distributed framework based on the message passing interface can accelerate the convergence speed at a sublinear scale. This paper demonstrates the importance of distributed training. For example, with 48 workers, a 24x speedup is achievable for the answer selection task and running time is decreased from 138.2 hours to 5.81 hours, which will increase the productivity significantly.
Machine Learning; Computation and Language; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper asks how distributed deep learning can shorten training for two question answering (QA) subtasks, answer selection and question classification, and thereby improve training efficiency and productivity. Specifically:

1. **High computational cost**: Training deep learning models is time-consuming, often taking several days or even weeks. That is a poor fit both for practical applications (such as training-as-a-service in the cloud) and for research environments, because long computing times limit the number of experiments and slow down the R&D cycle.
2. **The importance of distributed training**: Distributed training has become a key research direction for meeting this challenge. However, the performance of different optimization algorithms in a distributed environment has not been fully compared and verified.
3. **Specific challenges of QA tasks**:
   - **Answer selection**: Given a question and a pool of candidate answers, select the best answer.
   - **Question classification**: Classify each question into one of a predefined set of answers, which suits scenarios such as online customer service.

### Main contributions of the paper

- **Empirical research**: For the first time, the paper carries out a detailed comparison of multiple training algorithms (SGD, MSGD, RMSProp, AdaDelta, AdaGrad, Adam/Adamax, Downpour, EASGD/EAMSGD) on QA subtasks.
- **Acceleration effect**: A distributed framework based on the Message Passing Interface (MPI) accelerates convergence at a sublinear scale. For example, with 48 workers on the answer selection task, a 24x speedup is achieved, reducing the running time from 138.2 hours to 5.81 hours.
- **The importance of algorithm selection**: The experiments show that choosing an appropriate distributed training algorithm is crucial for improving training speed while maintaining model accuracy.

### Conclusion

The paper demonstrates the importance of distributed training for QA tasks through experiments and identifies DOWNPOUR, EAMSGD, and RMSProp as the best-performing distributed training methods: they substantially accelerate convergence while maintaining accuracy. These results offer a useful reference for future distributed deep learning research.
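
As a quick check on the reported figures, the speedup and the per-worker efficiency can be recomputed from the two running times. This is only an illustrative sketch: the function names `speedup` and `parallel_efficiency` are hypothetical and do not come from the paper, and the code assumes that the 138.2-hour figure is the single-worker baseline.

```python
def speedup(baseline_hours: float, distributed_hours: float) -> float:
    """Ratio of baseline running time to distributed running time."""
    return baseline_hours / distributed_hours


def parallel_efficiency(speedup_factor: float, num_workers: int) -> float:
    """Speedup per worker; 1.0 would mean perfectly linear scaling."""
    return speedup_factor / num_workers


# Figures quoted in the abstract for the answer selection task.
BASELINE_HOURS = 138.2     # assumed here to be the single-worker running time
DISTRIBUTED_HOURS = 5.81   # running time with 48 workers
NUM_WORKERS = 48

s = speedup(BASELINE_HOURS, DISTRIBUTED_HOURS)
e = parallel_efficiency(s, NUM_WORKERS)

print(f"speedup:    {s:.1f}x")   # prints "speedup:    23.8x"
print(f"efficiency: {e:.2f}")    # prints "efficiency: 0.50"
```

Under that assumption the ratio comes to about 23.8, which matches the rounded 24x figure. An efficiency of about 0.5 per worker is what sublinear scaling looks like in practice: doubling the number of workers yields less than double the speed.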