Parallel and distributed architecture of genetic algorithm on Apache Hadoop and Spark

Hao-Chun Lu, F.J. Hwang, Yao-Huei Huang
DOI: https://doi.org/10.1016/j.asoc.2020.106497
IF: 8.7
2020-10-01
Applied Soft Computing
Abstract: The genetic algorithm (GA), one of the best-known metaheuristic algorithms, has been extensively utilized in various fields of management science, operational research, and industrial engineering. The efficiency of GAs in solving large-scale optimization problems can be enhanced if the iterative processes required by the genetic operators are implemented in a parallel and distributed computing architecture. Apache Hadoop has recently become one of the most popular systems for distributed storage and parallel processing of big data. By tightly integrating the GA into Apache Hadoop, this study proposes an advanced parallel and distributed computing architecture for GAs that makes GA evolution both effective and efficient. Characterized by a sophisticated mechanism for dispatching the GA core operators into Apache Hadoop, the developed computing framework fits well with the cloud computing model. In computational experiments on traveling salesman problem instances, the presented GA parallelization architecture outperforms the state-of-the-art reference architectures. Our numerical experiments also demonstrate that the proposed architecture can readily be extended to Apache Spark.
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?
The paper addresses the efficiency of Genetic Algorithms (GAs) on large-scale optimization problems, which it seeks to improve by running them on a parallel and distributed computing architecture. Specifically, the goals of the paper are:
1. **Developing Parallel and Distributed Genetic Algorithm Architectures**: The researchers propose a parallel and distributed computing architecture for Genetic Algorithms based on Apache Hadoop, aiming to improve the efficiency of Genetic Algorithms in handling large-scale optimization problems.
2. **Integrating Genetic Algorithms with Apache Hadoop**: By incorporating the core operations of Genetic Algorithms into the Apache Hadoop framework, the study designs an advanced parallel and distributed computing architecture in which GA evolution is both effective and efficient.
3. **Addressing Issues in Existing Solutions**: The paper analyzes three existing benchmark models (master-slave model, distributed model, cellular model) and their implementations on MapReduce, pointing out shortcomings such as premature convergence and low computational efficiency.
4. **Proposing a New Parallel Mechanism**: To overcome the drawbacks of existing models, the research proposes a new parallel mechanism that allows the evaluation, crossover, and mutation operations in Genetic Algorithms to be executed in a parallel and distributed environment, avoiding issues like premature convergence.
5. **Validating the Effectiveness of the Proposed Architecture**: Computational experiments on Traveling Salesman Problem instances show that the proposed parallel Genetic Algorithm architecture outperforms existing reference architectures and can readily be extended to Apache Spark.
In summary, the paper designs a new parallel and distributed computing architecture for Genetic Algorithms so that large-scale optimization problems, which traditional Genetic Algorithms handle inefficiently, can be solved more efficiently.
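To make the idea of running evaluation, crossover, and mutation in parallel more concrete, here is a minimal Python sketch on a toy Traveling Salesman Problem. It is not the authors' Hadoop architecture: a local thread pool stands in for map tasks, and every name in it (`tour_length`, `order_crossover`, `swap_mutation`, `breed`, `evolve`), the truncation selection, the elitist merge, and all parameter values are assumptions chosen for illustration. Because Python threads share one interpreter lock, the sketch shows the dispatch structure rather than a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def tour_length(tour, cities):
    """Total length of a closed tour over 2-D city coordinates."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def order_crossover(p1, p2, rng):
    """Simple order-crossover variant: keep a slice of p1, fill gaps in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(child[a:b + 1])
    fill = [c for c in p2 if c not in kept]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child


def swap_mutation(tour, rng, rate=0.1):
    """With probability `rate`, swap two randomly chosen cities."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


def breed(task):
    """Map-style task (hypothetical): crossover, mutation, and evaluation of one child."""
    p1, p2, seed, cities = task
    rng = random.Random(seed)  # per-task seed keeps runs reproducible
    child = swap_mutation(order_crossover(p1, p2, rng), rng)
    return tour_length(child, cities), child


def evolve(cities, pop_size=40, generations=50, workers=4, seed=0):
    """Run the toy GA and return (best_length, best_tour)."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Initial evaluation is dispatched to the workers as well.
        scored = sorted(pool.map(lambda t: (tour_length(t, cities), t), population))
        for _ in range(generations):
            # Truncation selection: the better half become candidate parents.
            parents = [t for _, t in scored[: pop_size // 2]]
            tasks = [
                (rng.choice(parents), rng.choice(parents), rng.getrandbits(32), cities)
                for _ in range(pop_size)
            ]
            children = list(pool.map(breed, tasks))
            # Reduce-style step: merge old and new, keep the best pop_size tours.
            scored = sorted(scored + children)[:pop_size]
    return scored[0]
```

Calling `evolve(cities)` with a list of `(x, y)` tuples returns a `(length, tour)` pair. Bundling all three operators into one task per child means each worker does more than fitness evaluation alone, which is the general direction the summary describes. How the paper actually partitions this work across Hadoop or Spark jobs is not specified here.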