Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo Tree Search

Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jingyi Liu, Wenqiang Li, Meilan Hao, Shu Wei, Yusong Deng

2024-01-30

Abstract: Finding a concise and interpretable mathematical formula that accurately describes the relationship between each variable and the predicted value in the data is a crucial task in scientific research, as well as a significant challenge in artificial intelligence. This problem is referred to as symbolic regression, which is an NP-hard problem. In the previous year, a novel symbolic regression methodology utilizing Monte Carlo Tree Search (MCTS) was advanced, achieving state-of-the-art results on a diverse range of datasets. Although this algorithm has shown considerable improvement in recovering target expressions compared to previous methods, the lack of guidance during the MCTS process severely hampers its search efficiency. Recently, some algorithms have added a pre-trained policy network to guide the search of MCTS, but the pre-trained policy network generalizes poorly. To optimize the trade-off between efficiency and versatility, we introduce SR-GPT, a novel algorithm for symbolic regression that integrates MCTS with a Generative Pre-Trained Transformer (GPT). Using GPT to guide the MCTS significantly improves its search efficiency. We then use the MCTS results to further refine the GPT, enhancing its capabilities and providing more accurate guidance for the MCTS. MCTS and GPT are coupled together and optimize each other until the target expression is successfully determined. We conducted extensive evaluations of SR-GPT using 222 expressions sourced from over 10 different symbolic regression datasets. The experimental results demonstrate that SR-GPT outperforms existing state-of-the-art algorithms in accurately recovering symbolic expressions both with and without added noise.
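The idea of a policy model guiding MCTS can be illustrated with a small selection step. This is only a sketch: the abstract does not state SR-GPT's selection rule, so the AlphaZero-style PUCT score is used here as a stand-in, and the function name `puct_select`, the constant `c_puct`, and the toy symbol set are all hypothetical.

```python
import math

def puct_select(priors, visit_counts, value_sums, c_puct=1.5):
    """Pick the child index maximizing Q + U (AlphaZero-style PUCT, assumed here).

    priors       -- policy probabilities for each candidate symbol (e.g. from a GPT)
    visit_counts -- how often each child was visited so far
    value_sums   -- accumulated rewards backed up through each child
    """
    total_visits = sum(visit_counts)
    best_index, best_score = 0, -math.inf
    for i, (p, n, w) in enumerate(zip(priors, visit_counts, value_sums)):
        q = w / n if n > 0 else 0.0  # mean reward observed for this child
        u = c_puct * p * math.sqrt(total_visits + 1) / (1 + n)  # prior-weighted exploration bonus
        if q + u > best_score:
            best_index, best_score = i, q + u
    return best_index

# Toy example: three candidate symbols, none visited yet.
symbols = ["+", "sin", "x"]
choice = puct_select(priors=[0.1, 0.2, 0.7],
                     visit_counts=[0, 0, 0],
                     value_sums=[0.0, 0.0, 0.0])
print(symbols[choice])
```

With no visits recorded, the choice follows the prior alone; as visit counts grow, the exploration bonus shrinks and the observed mean reward dominates, which is how a learned prior can focus an otherwise unguided search.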

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses symbolic regression: finding concise, interpretable mathematical formulas from data that describe the relationship between the variables and the predicted value. Symbolic regression is an NP-hard problem of significant scientific value, but traditional methods such as genetic programming are inefficient and sensitive to hyperparameters.

The paper introduces SR-GPT, an algorithm that combines Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT). SR-GPT uses the GPT to guide the MCTS search, improving search efficiency, and uses the MCTS results to further train the GPT, so that the two components improve each other. Compared with previous methods such as SPL, DGSR-MCTS, and TPSR, SR-GPT recovers symbolic expressions more accurately on a range of datasets, both with and without added noise.

The main contributions of SR-GPT are:
1. Introducing GPT to enhance the search efficiency of MCTS.
2. Improving the loss function to encourage GPT to generate probability distributions with lower information entropy, avoiding cases where all symbol predictions have similar probabilities.
3. Proposing a new loss function, SNRMSE, to address the problem of variable omission in multivariate regression.

The paper also compares SR-GPT with other baseline algorithms on multiple benchmark datasets; the results show that SR-GPT achieves a higher success rate in fully recovering expressions. Additionally, the paper discusses the design of constraint optimization, search space limitation, termination criteria, and the reward function, which are intended to keep the algorithm effective and stable.
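The second contribution above, a loss that favors low-entropy symbol distributions, can be sketched as follows. This summary does not give the paper's exact loss, so the form below is an assumption: a cross-entropy between the model's prediction and an MCTS-derived target, plus an entropy penalty. The names `policy_loss` and `entropy_weight` and the weight value are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def policy_loss(pred_probs, mcts_probs, entropy_weight=0.1):
    """Cross-entropy to the MCTS target plus an entropy penalty (assumed form).

    The penalty grows when the predicted distribution is flat, so minimizing
    the loss pushes the model toward confident, peaked predictions.
    """
    eps = 1e-12  # avoids log(0) for zero-probability predictions
    cross_entropy = -sum(t * math.log(p + eps)
                         for p, t in zip(pred_probs, mcts_probs))
    return cross_entropy + entropy_weight * entropy(pred_probs)

# A flat prediction carries more entropy than a peaked one.
flat = [0.25, 0.25, 0.25, 0.25]
peaked = [0.85, 0.05, 0.05, 0.05]
print(entropy(flat) > entropy(peaked))
```

A flat prediction gives the search almost no guidance about which symbol to expand next, which is why penalizing entropy is a plausible way to make the policy more useful to MCTS.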

Discovering symbolic expressions with parallelized tree search

Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning

SymbolicGPT: A Generative Transformer Model for Symbolic Regression

GPT-Guided Monte Carlo Tree Search for Symbolic Regression in Financial Fraud Detection

Incorporating Actor-Critic in Monte Carlo tree search for symbolic regression

Differentiable Genetic Programming for High-dimensional Symbolic Regression

End-to-end symbolic regression with transformers

An Efficient and Generalizable Symbolic Regression Method for Time Series Analysis

A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data

Deep Generative Symbolic Regression

Bayesian Symbolic Regression

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Transformer-based Planning for Symbolic Regression

A Greedy Search Tree Heuristic for Symbolic Regression

MMSR: Symbolic Regression is a Multi-Modal Information Fusion Task

A Functional Analysis Approach to Symbolic Regression

Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients

GPTree: Towards Explainable Decision-Making via LLM-powered Decision Trees

Symbolic Expression Transformer: A Computer Vision Approach for Symbolic Regression

In Context Learning and Reasoning for Symbolic Regression with Large Language Models