TensorGP -- Genetic Programming Engine in TensorFlow

Francisco Baeta, João Correia, Tiago Martins, Penousal Machado
DOI: https://doi.org/10.48550/arXiv.2103.07512
2021-03-13
Abstract: In this paper, we resort to the TensorFlow framework to investigate the benefits of applying data vectorization and fitness caching methods to domain evaluation in Genetic Programming. For this purpose, an independent engine was developed, TensorGP, along with a testing suite to extract comparative timing results across different architectures and amongst both iterative and vectorized approaches. Our performance benchmarks demonstrate that by exploiting the TensorFlow eager execution model, performance gains of up to two orders of magnitude can be achieved on a parallel approach running on dedicated hardware when compared to a standard iterative approach.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to accelerate fitness evaluation in Genetic Programming (GP) using data vectorization and fitness caching. The authors developed an independent engine, TensorGP, to study the benefits of applying these methods in the fitness evaluation stage and to compare timing results across processor types. The main objective is to show that using the TensorFlow framework to apply data vectorization and fitness caching significantly improves the execution efficiency of GP on parallel hardware. On dedicated hardware, the speedup can reach two orders of magnitude.

### Background of the Paper

Genetic Programming is an evolutionary algorithm that automatically evolves computer programs. Because every individual in the population must be executed and tested, GP usually requires a large amount of computing resources, and fitness evaluation is one of its most computationally expensive operations. Nevertheless, GP is a powerful technique: in principle it can be applied to any problem whose solution can be expressed as a computer program, without domain-specific knowledge. GP is also "embarrassingly parallel", meaning that it can be readily accelerated on multi-core or other parallel computing architectures.

### Solutions

To accelerate fitness evaluation in Genetic Programming, the paper applies two main techniques:

1. **Fitness Caching**: Save intermediate fitness results to avoid re-executing the same code segments.
2. **Data Vectorization**: Evaluate all fitness cases simultaneously through tensor operations, thereby reducing computation time.

### Implementation

The authors used the TensorFlow framework to implement these techniques.
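The fitness-caching idea can be illustrated with a minimal, standard-library Python sketch. It is not TensorGP's actual implementation: the tuple-based tree encoding, the `OPS` table, and the `evaluate` helper are all hypothetical names chosen for illustration. The sketch caches the result of every subtree under a structural key, so a subtree that appears more than once is computed only once for a fixed set of fitness cases.

```python
import math

# Hypothetical tree encoding (an assumption, not TensorGP's format):
# a leaf is the string "x" or a numeric constant; an internal node is a
# tuple (operator_name, child, child, ...).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": lambda a: math.sin(a),
}

def evaluate(node, xs, cache, stats):
    """Evaluate `node` on every fitness case in `xs`, reusing cached subtrees.

    The cache is only valid for one fixed `xs`; use a fresh dict if the
    fitness cases change.
    """
    key = repr(node)  # structural key: identical subtrees share one entry
    if key in cache:
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    if node == "x":
        result = list(xs)
    elif isinstance(node, (int, float)):
        result = [float(node)] * len(xs)
    else:
        op, *children = node
        args = [evaluate(child, xs, cache, stats) for child in children]
        # Apply the operator case by case across the children's results.
        result = [OPS[op](*vals) for vals in zip(*args)]
    cache[key] = result
    return result

xs = [0.0, 0.5, 1.0]
cache, stats = {}, {"hits": 0, "misses": 0}
shared = ("sin", ("mul", "x", "x"))
tree = ("add", shared, shared)  # the same subtree appears twice
out = evaluate(tree, xs, cache, stats)
print(stats)  # the second occurrence of `shared` is served from the cache
```

The same cache dictionary could be kept across individuals and generations, which is where repeated subtrees are most likely to pay off, as long as the fitness cases stay unchanged.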
TensorFlow is a numerical computing library that supports efficient vectorized operations on different hardware. The authors' engine, TensorGP, takes advantage of the following:

- **Data Vectorization**: Represent the data for fitness evaluation as tensors, so that evaluation can run efficiently on parallel hardware such as GPUs.
- **Fitness Caching**: Avoid repeated calculations by caching intermediate results, which accelerates the evolution process.
- **Dynamic Graph Execution**: Use TensorFlow's dynamic graph execution mode (eager execution) to avoid the overhead of constructing static graphs, which matters because individuals change constantly during evolution.

### Experiments and Results

The paper evaluates TensorGP through a series of experiments:

- **Tree Evaluation Experiment**: Isolate the tensor evaluation stage, execute a batch of populations, and compare the execution times of different methods.
- **Evolution Experiment**: Let individuals evolve for 50 generations under different problem scales and measure the total running time.

The experimental results show that, compared with traditional iterative methods, TensorGP achieves a significant performance improvement on parallel hardware (especially GPUs). For example, on problems with more than 4 million evaluation points, TensorGP is nearly 8 times faster than traditional iterative methods.

### Conclusion

The paper shows that combining TensorFlow's vectorization with fitness caching yields significant performance improvements in Genetic Programming, most noticeably on dedicated hardware such as GPUs. This opens new possibilities for applying Genetic Programming to large-scale problems.
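The contrast between the iterative and vectorized approaches compared in the experiments can be sketched as follows. This is an illustration under stated assumptions, not the paper's benchmark code: NumPy is used here as a stand-in for TensorFlow tensors (TensorFlow offers analogous tensor operations), and the expression `sin(x * y) + x`, the 64 x 64 domain, and the function names are hypothetical choices.

```python
import math
import numpy as np

def f_scalar(x, y):
    # One evolved expression, evaluated at a single point of the domain.
    return math.sin(x * y) + x

def evaluate_iterative(side):
    # Iterative approach: visit every fitness case with Python loops.
    coords = np.linspace(-1.0, 1.0, side)
    out = np.empty((side, side))
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            out[i, j] = f_scalar(x, y)
    return out

def evaluate_vectorized(side):
    # Vectorized approach: the whole domain is one array per variable,
    # and each operator is applied to all fitness cases at once.
    coords = np.linspace(-1.0, 1.0, side)
    x, y = np.meshgrid(coords, coords, indexing="ij")
    return np.sin(x * y) + x

side = 64  # 64 * 64 = 4096 fitness cases
iterative = evaluate_iterative(side)
vectorized = evaluate_vectorized(side)
max_diff = float(np.max(np.abs(iterative - vectorized)))
print(vectorized.shape, max_diff)
```

Both functions compute the same values. The difference is that the vectorized version hands whole-array operations to optimized native code, which is the property that lets a tensor library dispatch the same work to a GPU.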