Symbolic Integration Algorithm Selection with Machine Learning: LSTMs vs Tree LSTMs

Rashid Barket, Matthew England, Jürgen Gerhard

2024-04-23

Abstract: Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algorithm choice. A key consideration of our methodology is how to represent the mathematics to the ML model: we hypothesise that a representation which encodes the tree structure of mathematical expressions would be well suited. We trained both an LSTM and a TreeLSTM model for sub-algorithm prediction and compared them to Maple's existing approach. Our TreeLSTM performs much better than the LSTM, highlighting the benefit of using an informed representation of mathematical expressions. It is able to produce better outputs than Maple's current state-of-the-art meta-algorithm, giving a strong basis for further research.

Machine Learning, Mathematical Software, Symbolic Computation

What problem does this paper attempt to address?

This paper addresses how machine learning can guide the selection of symbolic integration algorithms in computer algebra systems such as Maple. Maple's main symbolic integration routine, "int", can choose from 12 different sub-algorithms, which may differ in both the form of their output and their running time, so choosing the best one for a given problem is difficult. The authors hypothesise that machine learning can guide this choice. They train two recurrent neural networks, an LSTM and a TreeLSTM, to predict which sub-algorithm will produce the shortest output expression, and they expect a representation that encodes the tree structure of mathematical expressions to suit this task well. In their experiments, the TreeLSTM outperforms the LSTM at selecting the best sub-algorithm and, in some cases, gives better results than Maple's current state-of-the-art meta-algorithm, which the authors present as a solid basis for further research. The paper also covers data generation, including improvements to existing methods so that they produce more diverse integrable expressions. The experimental section describes model training, pre-processing, design, and evaluation, and reports the TreeLSTM's performance on both the test set and the Maple test suite. These results support the advantage of tree-structured representations and the potential of machine learning in this area.
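To make the two ideas in this summary concrete, here is a minimal Python sketch. It is not the authors' code. It shows one expression held as a tree (the kind of structure a TreeLSTM consumes), the same expression flattened to a prefix token sequence (the kind of input a plain LSTM consumes), and a labelling rule that picks the sub-algorithm with the shortest output. The `Node` class, the helper names, the sub-algorithm names `method_a` to `method_c`, and the output sizes are all hypothetical, and how "length" is measured is an assumption left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of an expression tree: an operator or a leaf symbol."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_prefix(node: Node) -> List[str]:
    """Flatten the tree into a prefix token sequence (sequence-model input)."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens


def depth(node: Node) -> int:
    """Tree depth: structure the flat sequence encodes only implicitly."""
    return 1 + max((depth(c) for c in node.children), default=0)


def shortest_output_label(lengths: Dict[str, Optional[int]]) -> Optional[str]:
    """Return the sub-algorithm whose output is shortest.

    `lengths` maps a (hypothetical) sub-algorithm name to the size of its
    answer, or None if that sub-algorithm failed. Ties are broken by name
    order so the label is deterministic.
    """
    succeeded = {name: n for name, n in lengths.items() if n is not None}
    if not succeeded:
        return None
    return min(sorted(succeeded), key=lambda name: succeeded[name])


# The integrand x * sin(x^2), written as a tree.
expr = Node("*", [Node("x"), Node("sin", [Node("^", [Node("x"), Node("2")])])])
tokens = to_prefix(expr)

# Hypothetical output sizes for three hypothetical sub-algorithms.
label = shortest_output_label({"method_a": 14, "method_b": None, "method_c": 9})
```

The prefix sequence keeps every token but leaves the parent-child relations implicit, whereas the tree states them directly; that difference is what the LSTM versus TreeLSTM comparison is testing.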
