A Comparison of Recent Algorithms for Symbolic Regression to Genetic Programming

Yousef A. Radwan, Gabriel Kronberger, Stephan Winkler
2024-06-06
Abstract: Symbolic regression is a machine learning method whose goal is to produce interpretable results. Unlike opaque machine learning methods such as random forests or neural networks, symbolic regression aims to model and map data in a way that scientists can understand. Recent advances have attempted to bridge the gap between these two fields; new methodologies attempt to fuse the mapping power of neural networks and deep learning techniques with the explanatory power of symbolic regression. In this paper, we examine these emerging systems and test the performance of an end-to-end transformer model for symbolic regression against the established methods based on genetic programming (GP) that have spearheaded symbolic regression throughout the years. We compare these systems on novel datasets to avoid a bias toward older methods that were improved on well-known benchmark datasets. Our results show that traditional GP methods, as implemented by, e.g., Operon, remain superior to two recently published symbolic regression methods.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper primarily explores the latest advancements in the field of Symbolic Regression (SR) and compares new deep learning-based methods with traditional genetic programming methods. Specifically:

1. **Understanding New Methods**: Researchers aim to better understand recently developed symbolic regression methods, particularly those based on deep learning, which are designed to generate interpretable models.
2. **Performance Comparison**: The paper experimentally compares the performance of the latest end-to-end Transformer models on symbolic regression tasks with traditional genetic programming-based methods. Novel datasets are used in the experiments to avoid biases towards existing benchmark datasets.
3. **Method Evaluation**: The authors find that although many new methods have shown impressive results in papers, there is a lack of systematic comparison with established algorithms. Therefore, they selected some datasets from practical engineering to evaluate whether the algorithms are overfitting to specific benchmark test sets.

In summary, the main goal of this paper is to evaluate and compare the latest symbolic regression methods, especially Transformer-based methods, with traditional genetic programming methods across various real-world datasets. By doing so, the paper aims to provide a more comprehensive understanding of how these new methods perform in practical applications and whether they truly outperform traditional symbolic regression methods.
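
To make the idea of symbolic regression concrete, the toy sketch below scores a handful of human-readable candidate expressions against data and returns the best-fitting one. It is only an illustration under stated assumptions: the candidate set `CANDIDATES`, the helper names `mse` and `best_expression`, and the synthetic data are all hypothetical, and the exhaustive scoring loop is not the algorithm used by Operon, by genetic programming in general, or by the transformer models the paper evaluates. Those systems search a vastly larger expression space.

```python
import math

# Hypothetical, hand-picked candidate expressions (an assumption for this sketch).
# Real symbolic regression systems generate and evolve candidates automatically.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x ** 2,
    "x**2 + x": lambda x: x ** 2 + x,
    "sin(x)": lambda x: math.sin(x),
    "exp(x)": lambda x: math.exp(x),
}


def mse(f, xs, ys):
    """Mean squared error of expression f on the data points (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_expression(xs, ys):
    """Return the name of the candidate expression with the lowest error."""
    return min(CANDIDATES, key=lambda name: mse(CANDIDATES[name], xs, ys))


# Synthetic data generated from a known formula, so the answer can be checked.
xs = [i / 4 for i in range(-8, 9)]
ys = [x ** 2 + x for x in xs]

print(best_expression(xs, ys))
```

The result is a formula a person can read and inspect, which is the interpretability property that distinguishes symbolic regression from opaque models.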