Abstract: Machine-Learned Likelihoods (MLL) combines machine-learning classification techniques with likelihood-based inference tests to estimate the experimental sensitivity of high-dimensional data sets. We extend the MLL method by including Kernel Density Estimators (KDE) to avoid binning the classifier output to extract the resulting one-dimensional signal and background probability density functions. We first test our method on toy models generated with multivariate Gaussian distributions, where the true probability distribution functions are known. Later, we apply the method to two cases of interest at the LHC: a search for exotic Higgs bosons, and a $Z'$ boson decaying into lepton pairs. In contrast to physics-based quantities, the typical fluctuations of the ML outputs give non-smooth probability distributions for pure-signal and pure-background samples. The non-smoothness is propagated into the density estimation due to the good performance and flexibility of the KDE method. We study its impact on the final significance computation, and we compare the results using the average of several independent ML output realizations, which allows us to obtain smoother distributions. We conclude that the significance estimation turns out to be not sensitive to this issue.
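
For orientation, the two ingredients named in the abstract can be written in their standard textbook forms. This is a sketch, not necessarily the paper's exact notation: the classifier output $o$, the kernel $K$, the bandwidth $h$, the expected yields $S$ and $B$, and the signal strength $\mu$ are symbols assumed here for illustration. A KDE built from the $N_{s}$ (or $N_{b}$) classifier outputs of a pure-signal (or pure-background) sample reads

$$\hat{p}_{s,b}(o) = \frac{1}{N_{s,b}\,h}\sum_{i=1}^{N_{s,b}} K\!\left(\frac{o - o_i}{h}\right),$$

and an extended unbinned likelihood for $N$ observed events can then be built from the two estimated densities:

$$\mathcal{L}(\mu) = \mathrm{Pois}(N \mid \mu S + B)\,\prod_{i=1}^{N}\frac{\mu S\,\hat{p}_s(o_i) + B\,\hat{p}_b(o_i)}{\mu S + B}.$$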

What problem does this paper attempt to address?

The paper addresses the problem of separating signal from background in high-energy physics experiments and proposes an improved method for estimating experimental sensitivity. Specifically, it covers the following points:

1. **Extended Machine-Learned Likelihoods (MLL) method**: The paper extends the MLL method with kernel density estimators (KDE), so the classifier output no longer has to be binned to extract the signal and background probability density functions (PDFs). This avoids the information loss that binning introduces.
2. **Handling non-smooth probability distributions**: Because of the randomness in the machine-learning training process, the classifier output can produce non-smooth probability distributions even when the classifier approaches optimal performance. The paper examines how to obtain smoother distributions by averaging several independent ML output realizations, and studies how the non-smoothness affects the final significance.
3. **Application cases**: The paper tests the proposed method on several cases:
   - Toy models generated with multivariate Gaussian distributions, where the true probability distribution functions are known.
   - A search for exotic Higgs bosons at the Large Hadron Collider (LHC).
   - A $Z'$ boson decaying into lepton pairs in the context of the Sequential Standard Model (SSM).
4. **Comparison of methods**: The paper compares the MLL+KDE method with traditional approaches, such as the binned likelihood method, in different scenarios, highlighting its advantages for high-dimensional data.

Through this work, the paper aims to improve searches for new physics signals in high-energy physics experiments, especially where weak signals must be distinguished from complex backgrounds.
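
The workflow summarized above can be illustrated with a minimal NumPy sketch on a two-dimensional Gaussian toy model. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy means, the event yields `S` and `B`, the fixed bandwidth `h=0.05`, the use of a plain logistic regression as a stand-in classifier, and the simplification of evaluating the likelihood ratio at a fixed signal strength of 1 instead of fitting it.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, mean):
    """Toy events: 2-D Gaussian with unit covariance (illustrative choice)."""
    return rng.normal(loc=mean, scale=1.0, size=(n, 2))

def add_bias(x):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([x, np.ones((len(x), 1))])

def train_logistic(x, y, steps=500, lr=0.5):
    """Gradient-descent logistic regression, a stand-in for the ML classifier."""
    xb, w = add_bias(x), np.zeros(x.shape[1] + 1)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        w -= lr * xb.T @ (p - y) / len(y)
    return w

def ensemble_output(x, weights):
    """Average the outputs of several independently trained classifiers."""
    xb = add_bias(x)
    return np.mean([1.0 / (1.0 + np.exp(-xb @ w)) for w in weights], axis=0)

def make_kde(train, h):
    """Fixed-bandwidth Gaussian KDE built from 1-D classifier outputs."""
    norm = len(train) * h * np.sqrt(2.0 * np.pi)
    def pdf(o):
        z = (np.asarray(o, dtype=float)[:, None] - train[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / norm
    return pdf

MU_B, MU_S = (0.0, 0.0), (1.0, 1.0)  # assumed toy means, not from the paper

# Train 5 classifiers, each on its own independent training sample.
weights = []
for _ in range(5):
    x = np.vstack([sample(2000, MU_B), sample(2000, MU_S)])
    y = np.concatenate([np.zeros(2000), np.ones(2000)])
    weights.append(train_logistic(x, y))

# Unbinned 1-D densities of the averaged output for pure samples.
o_bkg = ensemble_output(sample(2000, MU_B), weights)
o_sig = ensemble_output(sample(2000, MU_S), weights)
p_b, p_s = make_kde(o_bkg, h=0.05), make_kde(o_sig, h=0.05)

# One pseudo-experiment with S signal and B background events (assumed yields).
S, B = 200, 1000
o_data = ensemble_output(np.vstack([sample(S, MU_S), sample(B, MU_B)]), weights)

# q0 = -2 ln[L(mu=0) / L(mu=1)] for the extended unbinned likelihood;
# the tiny constant only guards against division by zero.
ratio = S * p_s(o_data) / (B * p_b(o_data) + 1e-300)
q0 = 2.0 * (np.log1p(ratio).sum() - S)
Z = np.sqrt(max(q0, 0.0))
print(f"Z = {Z:.2f}")
```

A Gaussian kernel leaks some probability outside the interval [0, 1] where the classifier output lives, so a more careful treatment would correct for the boundaries and tune the bandwidth, for example by cross-validation. The sketch also evaluates a single pseudo-experiment, whereas an expected significance would be obtained from many pseudo-experiments or an Asimov-like data set.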

Machine-Learned Exclusion Limits without Binning
