Symbolic Integration Algorithm Selection with Machine Learning: LSTMs vs Tree LSTMs

Rashid Barket, Matthew England, Jürgen Gerhard
2024-04-23
Abstract: Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algorithm choice. A key consideration of our methodology is how to represent the mathematics to the ML model: we hypothesise that a representation which encodes the tree structure of mathematical expressions would be well suited. We trained both an LSTM and a TreeLSTM model for sub-algorithm prediction and compared them to Maple's existing approach. Our TreeLSTM performs much better than the LSTM, highlighting the benefit of using an informed representation of mathematical expressions. It is able to produce better outputs than Maple's current state-of-the-art meta-algorithm, giving a strong basis for further research.
Machine Learning, Mathematical Software, Symbolic Computation
What problem does this paper attempt to address?
This paper addresses how machine learning can guide the selection of symbolic integration sub-algorithms in computer algebra systems such as Maple. Maple's main symbolic integration command, "int", can choose from 12 different sub-algorithms, and each may produce an output of a different form and take a different amount of time to run. Choosing the best sub-algorithm for a given integrand is difficult, and the authors hypothesize that a machine learning model can guide this choice.

The authors train two models, an LSTM and a TreeLSTM, to predict which sub-algorithm will produce the shortest output expression. They hypothesize that a representation encoding the tree structure of mathematical expressions is better suited to the task than a flat sequence. In their experiments the TreeLSTM outperforms the LSTM at selecting the optimal sub-algorithm, and in some cases it yields better outputs than Maple's current state-of-the-art meta-algorithm, which the authors present as a basis for further research.

The paper also describes its data generation methods, including improvements to existing methods so that they generate more diverse integrable expressions. The experimental section covers pre-processing, model design, training, and evaluation, and reports TreeLSTM performance on both the test set and the Maple test suite. These results support the advantage of a tree-structured representation and the potential of machine learning for this task.
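The difference between the two input representations, and the "shortest output" prediction target, can be illustrated with a minimal Python sketch. This is not the authors' code: the `Node` class, the helper names, the sub-algorithm labels `method_a`, `method_b`, `method_c`, and the candidate outputs are all hypothetical, and node count is only an assumed stand-in for whatever output-length measure the paper uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of an expression tree: an operator, function, or symbol."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_prefix(node: Node) -> List[str]:
    """Flatten a tree into a prefix token sequence (a sequence-model style input)."""
    tokens = [node.label]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens


def size(node: Node) -> int:
    """Count the nodes in a tree; used here as an assumed proxy for output length."""
    return 1 + sum(size(child) for child in node.children)


def shortest_output_label(outputs: Dict[str, Optional[Node]]) -> Optional[str]:
    """Return the name of the sub-algorithm with the smallest output.

    A value of None means that sub-algorithm failed on the integrand.
    Ties go to the first name in dictionary order of insertion.
    """
    sizes = {name: size(tree) for name, tree in outputs.items() if tree is not None}
    if not sizes:
        return None
    return min(sizes, key=sizes.get)


x = Node("x")

# Integrand x * sin(x) as a tree (a tree-model style input).
integrand = Node("*", [x, Node("sin", [x])])

# Hypothetical outputs from three hypothetical sub-algorithms.
candidates = {
    # sin(x) - x*cos(x)
    "method_a": Node("-", [Node("sin", [x]), Node("*", [x, Node("cos", [x])])]),
    # sin(x) + (-1)*(x*cos(x)), an equivalent but longer form
    "method_b": Node("+", [
        Node("sin", [x]),
        Node("*", [Node("-1"), Node("*", [x, Node("cos", [x])])]),
    ]),
    # This sub-algorithm is assumed to fail on the integrand.
    "method_c": None,
}

prefix_tokens = to_prefix(integrand)
label = shortest_output_label(candidates)
```

In this sketch the flat token list discards the explicit parent-child links that the tree keeps, which is the structural information the authors hypothesize a TreeLSTM can exploit. The returned label plays the role of a training target for the classifier.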