Hypothesis tests and model parameter estimation on data sets with missing correlation information

Lukas Koch
2024-10-30
Abstract: Ideally, all analyses of normally distributed data should include the full covariance information between all data points. In practice, the full covariance matrix between all data points is not always available, either because a result was published without a covariance matrix or because one tries to combine multiple results from separate publications. For simple hypothesis tests, it is possible to define robust test statistics that will behave conservatively in the presence of unknown correlations. For model parameter fits, one can inflate the variance by a factor to ensure that things remain conservative at least up to a chosen confidence level. This paper describes a class of robust test statistics for simple hypothesis tests, as well as an algorithm to determine the necessary inflation factor for model parameter fits. It then presents some example applications of the methods to real neutrino interaction data and model comparisons.
Methodology, High Energy Physics - Phenomenology, Applications
What problem does this paper attempt to address?
The problem that this paper addresses is how to conduct hypothesis tests and model parameter estimation when the data set lacks complete covariance information. Specifically, when the correlations between data points are incomplete or unknown, traditional statistical methods may lead to wrong conclusions. The author therefore proposes a class of robust test statistics and an algorithm for determining an inflation factor, so that results remain conservative at a given confidence level.

### Specific description of the problem

1. **Hypothesis testing**:
   - In practice, the complete covariance matrix is not always available, either because a result was published without one or because results from different sources need to be combined.
   - For simple hypothesis tests, this situation can be handled by choosing alternative test statistics that are robust to unknown correlations.
2. **Model parameter estimation**:
   - For model parameter fits, an inflation factor can be introduced to ensure that the results remain conservative at least up to a given confidence level.
   - Traditional methods such as least squares may fail in the presence of unknown correlations because they rely on a known covariance matrix.

### Solutions

1. **Robust test statistics**:
   - A class of robust test statistics suitable for simple hypothesis tests is proposed. These statistics remain conservative when correlations are unknown.
   - Specifically, for the case where the covariance within each block is known but the inter-block correlations are unknown, the test statistic is defined by minimizing the Mahalanobis distance over the unknown correlations.
2. **Inflation factor algorithm**:
   - For model parameter estimation, an algorithm for determining the inflation factor is proposed to ensure that the results remain conservative at a given confidence level.
   - This method is similar to existing methods, such as variance doubling and the flat extrapolation factor (S-factor), but adjusts the uncertainty based on the potential correlations in the worst-case scenario.

### Application examples

The paper presents example applications of these methods to real neutrino interaction data and model comparisons.

### Mathematical formulas

- Mahalanobis distance of block \(i\):
  \[ D^2_i=(\mathbf{x}_i - \boldsymbol{\mu}_i)^T\mathbf{S}_{ii}^{-1}(\mathbf{x}_i - \boldsymbol{\mu}_i) \]
- Robust test statistic:
  \[ \text{fitted}(\mathbf{x}|\boldsymbol{\mu},\mathbf{S})=\max_i D^2_i \]
- Parameter covariance of a linear fit under the assumed data covariance \(\mathbf{S}_0\), the baseline for the inflation factor:
  \[ S_{\theta_0}=(A^T\mathbf{S}_0^{-1}A)^{-1} \]

Through these methods, the paper provides a way to conduct reliable statistical analysis in the absence of complete covariance information.
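
The formulas listed above can be evaluated numerically with a few lines of NumPy. The following is a minimal sketch, not the author's implementation: the function names `block_mahalanobis`, `max_block_statistic`, and `linear_fit_covariance` and the toy numbers are hypothetical, and the design matrix `A` is assumed to describe a linear model. Converting the statistic into a p-value requires its null distribution, which this summary does not give, so the sketch stops at the statistic itself.

```python
import numpy as np


def block_mahalanobis(x, mu, block_covs):
    """Squared Mahalanobis distance D_i^2 for each block i.

    x, mu      : sequences of 1-D arrays, one entry per block
    block_covs : sequence of square covariance matrices S_ii
    (All names are illustrative, not taken from the paper.)
    """
    d2 = []
    for x_i, mu_i, s_ii in zip(x, mu, block_covs):
        r = np.asarray(x_i, dtype=float) - np.asarray(mu_i, dtype=float)
        # Solve S_ii z = r rather than forming the explicit inverse.
        z = np.linalg.solve(np.asarray(s_ii, dtype=float), r)
        d2.append(float(r @ z))
    return np.array(d2)


def max_block_statistic(x, mu, block_covs):
    """Largest block-wise distance, max_i D_i^2."""
    return float(np.max(block_mahalanobis(x, mu, block_covs)))


def linear_fit_covariance(A, S0):
    """Parameter covariance (A^T S0^-1 A)^-1 of a linear least-squares fit."""
    A = np.asarray(A, dtype=float)
    S0 = np.asarray(S0, dtype=float)
    return np.linalg.inv(A.T @ np.linalg.solve(S0, A))


# Toy data: two blocks whose mutual correlation is unknown.
x = [np.array([1.0, 2.0]), np.array([3.0])]
mu = [np.array([0.0, 0.0]), np.array([1.0])]
block_covs = [np.diag([1.0, 4.0]), np.array([[4.0]])]

d2 = block_mahalanobis(x, mu, block_covs)       # block 1: 1/1 + 4/4, block 2: 4/4
stat = max_block_statistic(x, mu, block_covs)   # largest of the two values

# Two measurements of one constant parameter with unit, uncorrelated variances.
cov_theta = linear_fit_covariance([[1.0], [1.0]], np.eye(2))

print(d2, stat, cov_theta)
```

Only the diagonal blocks \(\mathbf{S}_{ii}\) enter the statistic, so the unknown inter-block correlations are never needed as input. Solving the linear system instead of inverting each block is a numerical-stability choice rather than part of the method.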