Symbolic Regression on FPGAs for Fast Machine Learning Inference

Ho Fung Tsoi, Adrian Alan Pol, Vladimir Loncar, Ekaterina Govorkova, Miles Cranmer, Sridhara Dasu, Peter Elmer, Philip Harris, Isobel Ojalvo, Maurizio Pierini
DOI: https://doi.org/10.1051/epjconf/202429509036
2024-01-18
Abstract: The high-energy physics community is investigating the potential of deploying machine-learning-based solutions on Field-Programmable Gate Arrays (FPGAs) to enhance physics sensitivity while still meeting data processing time constraints. In this contribution, we introduce a novel end-to-end procedure that utilizes a machine learning technique called symbolic regression (SR). It searches the equation space to discover algebraic relations approximating a dataset. We use PySR (a software to uncover these expressions based on an evolutionary algorithm) and extend the functionality of hls4ml (a package for machine learning inference in FPGAs) to support PySR-generated expressions for resource-constrained production environments. Deep learning models often optimize the top metric by pinning the network size because the vast hyperparameter space prevents an extensive search for neural architecture. Conversely, SR selects a set of models on the Pareto front, which allows for optimizing the performance-resource trade-off directly. By embedding symbolic forms, our implementation can dramatically reduce the computational resources needed to perform critical tasks. We validate our method on a physics benchmark: the multiclass classification of jets produced in simulated proton-proton collisions at the CERN Large Hadron Collider. We show that our approach can approximate a 3-layer neural network using an inference model that achieves up to a 13-fold decrease in execution time, down to 5 ns, while still preserving more than 90% approximation accuracy.
Machine Learning, High Energy Physics - Experiment, Instrumentation and Detectors
What problem does this paper attempt to address?
This paper addresses how to improve physics sensitivity in high-energy physics experiments while still meeting data-processing time constraints. Specifically, it proposes a method based on symbolic regression (SR) for fast machine-learning inference on Field-Programmable Gate Arrays (FPGAs). The method aims to reduce computing-resource consumption so that the vast amount of data generated at the Large Hadron Collider (LHC) can be processed more efficiently.

### Problem Background

1. **Requirements of High-Energy Physics Experiments**:
   - Experiments such as those at the LHC must process large volumes of data in real time, for example tens of terabytes per second.
   - Processing latency must be very short (O(1) microseconds) to allow real-time classification and filtering of the data.
2. **Limitations of Existing Methods**:
   - Deep-learning models perform well, but deploying them on FPGAs runs into high resource consumption and long inference times.
   - Deep-learning models are often tuned for the top metric with the network size fixed, because the vast hyperparameter space prevents an extensive neural architecture search.

### Solution

The paper introduces the symbolic regression (SR) technique. Its main advantages include:

1. **High Interpretability**:
   - SR produces mathematical expressions that can be read directly, which helps in understanding the underlying patterns and relationships in the data.
2. **Resource Optimization**:
   - SR selects a set of models on the Pareto front, so the trade-off between performance and resources can be optimized directly.
   - By embedding symbolic forms, the implementation can significantly reduce the computing resources required for critical tasks.
3. **Low-Latency Inference**:
   - In some cases, the inference time of the SR model can be as short as 1 clock cycle (5 ns), up to 13 times faster than the 3-layer neural network it approximates, while preserving more than 90% approximation accuracy.

### Experimental Verification

The paper uses a physics benchmark: multiclass classification of jets produced in simulated proton-proton collisions at the LHC. The results show that the SR method can approximate a 3-layer neural network, reducing execution time by up to a factor of 13 while preserving more than 90% approximation accuracy.

### Summary

The main contribution of this paper is a new end-to-end procedure that uses symbolic regression to achieve fast machine-learning inference on FPGAs. It greatly reduces computing-resource requirements and offers a more efficient solution for high-energy physics experiments. Beyond improving the speed and efficiency of data processing, it also suggests new ideas for applications in other fields, such as biochemistry and medicine.
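
The Pareto-front selection mentioned under Resource Optimization can be illustrated with a minimal sketch. It is not the PySR or hls4ml implementation: the `Candidate` type, the helper names, the use of expression complexity as a stand-in for FPGA resource cost, and all numbers are assumptions made up for illustration, not results from the paper.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    expression: str   # hypothetical symbolic form found by the search
    complexity: int   # assumed proxy for FPGA resource cost (e.g. node count)
    loss: float       # approximation error on a validation set


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates not dominated in (complexity, loss); lower is better."""
    front = []
    best_loss = float("inf")
    # Sweep from simplest to most complex; within equal complexity the
    # lowest loss comes first. Keep a candidate only if it strictly
    # improves on the best loss seen among simpler-or-equal candidates.
    for cand in sorted(candidates, key=lambda c: (c.complexity, c.loss)):
        if cand.loss < best_loss:
            front.append(cand)
            best_loss = cand.loss
    return front


def pick_within_budget(front: list[Candidate],
                       max_complexity: int) -> Optional[Candidate]:
    """Pick the lowest-loss front member whose complexity fits the budget."""
    feasible = [c for c in front if c.complexity <= max_complexity]
    return min(feasible, key=lambda c: c.loss) if feasible else None


# Made-up candidates for illustration only; not results from the paper.
CANDIDATES = [
    Candidate("x0", 1, 0.90),
    Candidate("x0 + x1", 3, 0.60),
    Candidate("x0 * x1", 3, 0.75),
    Candidate("sin(x0) + x1", 4, 0.62),
    Candidate("x0 * x1 + 0.5 * x2", 7, 0.35),
    Candidate("tanh(x0 * x1) + x2", 8, 0.35),
]

FRONT = pareto_front(CANDIDATES)
CHOICE = pick_within_budget(FRONT, max_complexity=5)

if __name__ == "__main__":
    for cand in FRONT:
        print(f"{cand.complexity:2d}  {cand.loss:.2f}  {cand.expression}")
    print("chosen under budget 5:", CHOICE.expression if CHOICE else None)
```

The point of the sketch is that the resource budget is applied after the search: the front already holds the best available accuracy at each complexity level, so meeting a tighter FPGA constraint means choosing a different point on the front rather than retraining a model.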