Generalized massive optimal data compression

Justin Alsing, Benjamin Wandelt
DOI: https://doi.org/10.1093/mnrasl/sly029
2018-04-03
Abstract: Data compression has become one of the cornerstones of modern astronomical data analysis, with the vast majority of analyses compressing large raw datasets down to a manageable number of informative summaries. In this paper we provide a general procedure for optimally compressing $N$ data down to $n$ summary statistics, where $n$ is equal to the number of parameters of interest. We show that compression to the score function -- the gradient of the log-likelihood with respect to the parameters -- yields $n$ compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Loève compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general non-Gaussian case as long as mild regularity conditions are satisfied, producing optimal non-linear summary statistics when appropriate. As a worked example, we derive explicitly the $n$ optimal compressed statistics for Gaussian data in the general case where both the mean and covariance depend on the parameters.
Cosmology and Nongalactic Astrophysics
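
The score compression described in the abstract can be illustrated for Gaussian data. What follows is a minimal sketch, not the authors' code: it uses the standard expressions for the gradient of a Gaussian log-likelihood and for the Gaussian Fisher matrix, evaluated at a fiducial parameter point. The function names (gaussian_score, gaussian_fisher) and the toy model (a mean a*x with covariance s*I, parameters a and s) are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_score(d, mu, C, dmu, dC):
    """Score of a Gaussian log-likelihood at a fiducial parameter point.

    d   : (N,)      data vector
    mu  : (N,)      model mean at the fiducial parameters
    C   : (N, N)    model covariance at the fiducial parameters
    dmu : (n, N)    derivative of the mean w.r.t. each of the n parameters
    dC  : (n, N, N) derivative of the covariance w.r.t. each parameter
    Returns the (n,) vector of compressed statistics.
    """
    Cinv = np.linalg.inv(C)          # explicit inverse: fine for a small sketch
    Cinv_r = Cinv @ (d - mu)         # C^{-1} (d - mu)
    t_linear = dmu @ Cinv_r          # dmu^T C^{-1} (d - mu)
    t_quad = 0.5 * np.einsum("i,aij,j->a", Cinv_r, dC, Cinv_r)
    t_trace = -0.5 * np.einsum("ij,aji->a", Cinv, dC)  # -1/2 tr(C^{-1} dC_a)
    return t_linear + t_quad + t_trace

def gaussian_fisher(C, dmu, dC):
    """Fisher matrix for Gaussian data with parameter-dependent mean and covariance."""
    Cinv = np.linalg.inv(C)
    F_mean = dmu @ Cinv @ dmu.T
    M = np.einsum("ij,ajk->aik", Cinv, dC)            # M_a = C^{-1} dC_a
    F_cov = 0.5 * np.einsum("aij,bji->ab", M, M)      # 1/2 tr(M_a M_b)
    return F_mean + F_cov

# Hypothetical toy model: mu = a * x, C = s * I, parameters theta = (a, s).
N = 5
x = np.linspace(0.0, 1.0, N)
a0, s0 = 1.0, 2.0                                      # fiducial parameter values
rng = np.random.default_rng(0)
d = a0 * x + np.sqrt(s0) * rng.standard_normal(N)      # one simulated data vector

mu = a0 * x
C = s0 * np.eye(N)
dmu = np.stack([x, np.zeros(N)])                       # d mu / da, d mu / ds
dC = np.stack([np.zeros((N, N)), np.eye(N)])           # d C / da,  d C / ds

t = gaussian_score(d, mu, C, dmu, dC)   # N = 5 data compressed to n = 2 numbers
F = gaussian_fisher(C, dmu, dC)
print("compressed statistics:", t)
print("Fisher matrix:\n", F)
```

In this toy setup the first component of t is linear in the data and the second is quadratic, which mirrors the abstract's statement that linear compression and quadratic estimation appear as special cases.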