Abstract: Contrary to genetic programming, the neural network approach to symbolic regression can efficiently handle high-dimensional inputs and leverage gradient methods for faster equation searching. Common ways of constraining expression complexity often involve multistage pruning with fine-tuning, which can result in significant performance loss. In this work, we propose $\tt{SymbolNet}$, a neural network approach to symbolic regression in a novel framework that allows dynamic pruning of model weights, input features, and mathematical operators in a single training process, where both training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term for each pruning type, which can adaptively adjust its strength, leading to convergence at a target sparsity ratio. Unlike most existing symbolic regression methods that struggle with datasets containing more than $\mathcal{O}(10)$ inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs). Our approach enables symbolic regression to achieve fast inference with nanosecond-scale latency on FPGAs for high-dimensional datasets in environments with stringent computational resource constraints, such as the high-energy physics experiments at the LHC.

What problem does this paper attempt to address?

The problem this paper addresses is the difficulty of performing symbolic regression (SR) on high-dimensional datasets. Specifically:

1. **Limitations of traditional methods**: Traditional symbolic regression methods such as genetic programming (GP) perform well on low-dimensional datasets but are inefficient and time-consuming on high-dimensional ones. This restricts their use in practical problems, especially in environments with limited computational resources, such as the high-energy physics experiments at the Large Hadron Collider (LHC).
2. **Deficiencies of neural network methods**: Neural-network-based methods can handle high-dimensional inputs efficiently through gradient-based optimization, but existing methods usually rely on multi-stage pruning to control expression complexity. A multi-stage framework requires repeated training and fine-tuning, and it can degrade performance because accuracy and sparsity are optimized separately in different training stages.
3. **The need for dynamic pruning**: To overcome these problems, the paper proposes a new framework, SymbolNet, which dynamically prunes model weights, input features, and mathematical operators in a single training process while optimizing training loss and expression complexity simultaneously. The strength of each regularization term adjusts adaptively, so the model converges to the target sparsity ratio.

### Main contributions

- **End-to-end single-stage dynamic pruning**: SymbolNet requires only one training stage, with no additional fine-tuning. Each model weight, input feature, and mathematical operator is assigned a trainable threshold, and these thresholds compete dynamically during training to decide what is pruned.
- **Convergence to the target sparsity ratio**: For each pruning type (model weight, input feature, unary operator, binary operator), an adaptively adjusted regularization term is introduced, so the model converges to a user-specified target sparsity ratio.
- **Scalability to high-dimensional datasets**: Through dynamic pruning and gradient-based optimization, SymbolNet can generate compact expressions that fit large, complex, high-dimensional datasets.

### Experimental verification

The authors evaluate SymbolNet on several high-dimensional datasets: the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs). The results show that SymbolNet generates compact, competitive expressions and achieves inference with nanosecond-scale latency on FPGAs, which suits environments with limited computational resources.

### Conclusion

By introducing a dynamic pruning mechanism, SymbolNet addresses the limitations of traditional symbolic regression methods on high-dimensional datasets and offers a new option for practical applications in fields such as high-energy physics.
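The idea of a regularization term whose strength adapts until a target sparsity ratio is reached can be illustrated with a toy example. The Python sketch below is not SymbolNet's actual loss or training loop. The function names, the linear strength schedule, and the `exp(-t)` penalty are all assumptions made for illustration, and the task loss is omitted entirely so that only the sparsity mechanism is visible. Each weight gets a threshold, a weight counts as pruned when its magnitude is at or below that threshold, and the penalty pushes the thresholds up until the target ratio is met.

```python
import math
import random


def sparsity_ratio(weights, thresholds):
    """Fraction of weights pruned, i.e. with magnitude at or below their threshold."""
    pruned = sum(1 for w, t in zip(weights, thresholds) if abs(w) <= t)
    return pruned / len(weights)


def adaptive_strength(current, target):
    """Hypothetical schedule: 1 when nothing is pruned, 0 once the target is met."""
    if target <= 0.0:
        return 0.0
    return max(target - current, 0.0) / target


def run_demo(target=0.8, lr=0.1, steps=3000, n=200, seed=0):
    """Raise thresholds with an adaptively weighted penalty; return final sparsity."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for trained weights
    thresholds = [0.0] * n                             # nothing is pruned at the start
    for _ in range(steps):
        strength = adaptive_strength(sparsity_ratio(weights, thresholds), target)
        if strength == 0.0:
            break  # target sparsity reached, so the penalty has switched itself off
        # Gradient descent on the assumed penalty strength * sum(exp(-t)):
        # its gradient with respect to t is -strength * exp(-t), so thresholds grow.
        thresholds = [t + lr * strength * math.exp(-t) for t in thresholds]
    return sparsity_ratio(weights, thresholds)


final_sparsity = run_demo()
print(f"final sparsity ratio: {final_sparsity:.3f}")
```

Because the strength shrinks as the sparsity approaches the target, the threshold updates become smaller near the end and stop at the target instead of pruning everything. In a real training setup, a task loss would pull in the opposite direction for weights that matter, which is the competition between accuracy and sparsity that the summary describes.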

SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning

PruneSymNet: A Symbolic Neural Network and Pruning Algorithm for Symbolic Regression

A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data

Exploring Hidden Semantics in Neural Networks with Symbolic Regression

Class-Aware Pruning for Efficient Neural Networks

Controllable Neural Symbolic Regression

MetaSymNet: A Tree-like Symbol Network with Adaptive Architecture and Activation Functions

A Novel Neural Network-Based Symbolic Regression Method: Neuro-Encoded Expression Programming

Symbolic Regression on FPGAs for Fast Machine Learning Inference

Operator Feature Neural Network for Symbolic Regression

Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery

Accelerating Understanding of Scientific Experiments with End to End Symbolic Regression

Constraining Genetic Symbolic Regression via Semantic Backpropagation

Toward Physically Plausible Data-Driven Models: A Novel Neural Network Approach to Symbolic Regression

Scalable Neural Symbolic Regression using Control Variables

Neuro-Symbolic AI: An Emerging Class of AI Workloads and their Characterization

End-to-end symbolic regression with transformers

Deep Generative Symbolic Regression

Softened Symbol Grounding for Neuro-symbolic Systems

Semantic Strengthening of Neuro-Symbolic Learning

Neural Symbolic Regression of Complex Network Dynamics