Abstract: In this work we demonstrate that significant gains in performance and data efficiency can be achieved in High Energy Physics (HEP) by moving beyond the standard paradigm of sequential optimization of reconstruction and analysis components. We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces, and quantify the gains in the example use case of searches for heavy resonances decaying via an intermediate di-Higgs system to four $b$-jets.

What problem does this paper attempt to address?

This paper addresses a limitation of high-energy physics (HEP) data analysis: the reconstruction and analysis stages of a pipeline are traditionally optimized one after another, and this sequential optimization may not be optimal for the pipeline as a whole. The authors show that significant improvements in performance and data efficiency can be achieved by adopting modern large-scale machine learning (ML) workflows such as pre-training, fine-tuning, domain adaptation, and high-dimensional embedding spaces. They quantify these gains in an example case study: the search for heavy resonances decaying via an intermediate di-Higgs system to four $b$-jets.

Traditionally, HEP data analysis follows a hierarchical pattern recognition and inference approach, in which low-level patterns are identified first and then progressively reconstructed and analyzed. Because each step is optimized separately, the pipeline may fail to reach the global optimum. The authors demonstrate that a global, gradient-based optimization strategy improves performance at a fixed sample size and reduces the sample size needed to reach a given performance.

The main contributions of the paper include:

1. Establishing correspondences between HEP analysis workflows and modern deep learning concepts, such as base models, downstream tasks, and fine-tuning.
2. Demonstrating end-to-end optimization in the setting of particle physics, including fine-tuning of object representations and event-level inference.
3. Quantifying significant improvements in data efficiency, and in performance at fixed sample size, through end-to-end optimization.
4. Providing evidence of successful domain adaptation when fine-tuning HEP base models on datasets they were not pre-trained on.

Related works include studies on optimizing HEP analyses and on handling low-level variables using deep learning. The paper also highlights the similarities between HEP analysis and machine learning approaches based on pre-trained base models, and proposes a generic strategy for optimizing HEP data analysis pipelines. Experimental results show that fine-tuning strategies outperform traditional HEP methods in terms of performance and data efficiency.
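The difference between sequential and end-to-end optimization can be sketched with a toy example. The code below is an illustration under stated assumptions, not the paper's model or data: a hypothetical two-parameter linear "backbone" `w` stands in for the pre-trained reconstruction stage, a scalar "head" `v` stands in for the downstream analysis, and the grid dataset and the name `train` are invented for this sketch. With `finetune_backbone=False` only the head is trained on a frozen backbone (the sequential paradigm); with `finetune_backbone=True` gradients also update the backbone (joint optimization).

```python
# Toy illustration only: a hypothetical two-stage "backbone + head" model,
# not the architecture or data used in the paper.
from itertools import product

# Deterministic dataset: inputs on a 3x3 grid, downstream target y = x1 + x2.
DATA = [((x1, x2), x1 + x2) for x1, x2 in product((-1.0, 0.0, 1.0), repeat=2)]


def mse(w, v):
    """Mean squared error of the pipeline y_hat = v * (w . x)."""
    return sum((v * (w[0] * x[0] + w[1] * x[1]) - y) ** 2 for x, y in DATA) / len(DATA)


def train(finetune_backbone, steps=2000, lr=0.1):
    """Gradient descent on the head, and optionally on the backbone too."""
    # "Pretrained" backbone: assumed tuned for an upstream task that only needed x1.
    w = [1.0, 0.0]
    v = 0.0  # freshly initialised downstream head
    n = len(DATA)
    for _ in range(steps):
        grad_v, grad_w = 0.0, [0.0, 0.0]
        for x, y in DATA:
            h = w[0] * x[0] + w[1] * x[1]  # backbone representation
            r = v * h - y                  # residual of the full pipeline
            grad_v += 2.0 * r * h / n
            grad_w[0] += 2.0 * r * v * x[0] / n
            grad_w[1] += 2.0 * r * v * x[1] / n
        v -= lr * grad_v
        if finetune_backbone:
            # End-to-end: the downstream loss also reshapes the representation.
            w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
    return mse(w, v)


frozen_loss = train(finetune_backbone=False)    # sequential: head only
finetuned_loss = train(finetune_backbone=True)  # joint: backbone + head
print(f"frozen backbone loss:    {frozen_loss:.4f}")
print(f"finetuned backbone loss: {finetuned_loss:.4f}")
```

In this toy setup the frozen-backbone loss plateaus near 2/3, which is the part of the target carried by the input the backbone discards, while the jointly trained pipeline drives the loss toward zero. The example only illustrates why a representation fixed before the downstream task is known can cap performance; it says nothing about the size of the gains reported in the paper.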

Finetuning Foundation Models for Joint Analysis Optimization

Beyond Cuts in Small Signal Scenarios -- Enhanced Sneutrino Detectability Using Machine Learning

Exploring parameter spaces with artificial intelligence and machine learning black-box optimization algorithms

Enhancing the hunt for new phenomena in dijet final-states using anomaly detection filters at the High-Luminosity Large Hadron Collider

Simple, but not simplified: A new approach for optimising beyond-Standard Model physics searches at the Large Hadron Collider

Progress in End-to-End Optimization of Detectors for Fundamental Physics with Differentiable Programming

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality

OmniJet-$α$: The first cross-task foundation model for particle physics

Optimizing The Cut And Count Method In Phenomenological Studies

Hyperparameter Optimisation in Deep Learning from Ensemble Methods: Applications to Proton Structure

Optimizing Geant4 Hadronic Models

Accelerating Resonance Searches via Signature-Oriented Pre-training

BROOD: Bilevel and Robust Optimization and Outlier Detection for Efficient Tuning of High-Energy Physics Event Generators

Learning Dynamics of LLM Finetuning

Refining fast simulation using machine learning

Model Balancing Helps Low-data Training and Fine-tuning

Applying Machine Learning Techniques To Intermediate-Length Cascade Decays

Low-resource finetuning of foundation models beats state-of-the-art in histopathology

Improving Neutrino Energy Reconstruction with Machine Learning

Searches for new physics with boosted top quarks in the MadAnalysis 5 and Rivet frameworks