Proportion-based normalizations outperform compositional data transformations in machine learning applications

Aaron Yerke, Daisy Fry Brumit and Anthony A. Fodor
DOI: https://doi.org/10.1186/s40168-023-01747-z
Impact factor: 15.5
2024-03-05
Microbiome
Abstract: Normalization, as a pre-processing step, can significantly affect the resolution of machine learning analysis for microbiome studies. There are many possible normalization schemes to choose from. In this study, we examined compositionally aware algorithms, including the additive log ratio (alr), the centered log ratio (clr), and a recent evolution of the isometric log ratio (ilr) in the form of balance trees made with the PhILR R package. We also looked at compositionally naïve options, such as raw count tables and several transformations based on relative abundance: proportions, the Hellinger transformation, and a transformation based on the logarithm of proportions (which we call "lognorm").
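To make the compared transformations concrete, here is a minimal NumPy sketch of how each one acts on a samples-by-taxa count table. It is an illustration under stated assumptions, not the authors' code. All function names are hypothetical. A pseudocount of 1 is assumed before any logarithm so that zero counts are defined. The reference part for alr is assumed to be the last column. The exact lognorm formula is not given above, so it is assumed here to be log10 of each proportion rescaled by the mean sequencing depth, plus 1. PhILR builds ilr balances from a phylogenetic tree, and that step is not reproduced. A generic ilr in pivot coordinates stands in for it instead.

```python
import numpy as np


def proportions(counts):
    """Relative abundance: divide each sample (row) by its total read count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def hellinger(counts):
    """Hellinger transformation: element-wise square root of the proportions."""
    return np.sqrt(proportions(counts))


def lognorm(counts):
    """ASSUMED form of 'lognorm': log10(proportion * mean library depth + 1).
    The paper's exact scaling may differ."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)
    return np.log10(counts / depth * depth.mean() + 1.0)


def _log_with_pseudocount(counts, pseudocount):
    # ASSUMPTION: add a pseudocount so that log(0) never occurs.
    return np.log(np.asarray(counts, dtype=float) + pseudocount)


def clr(counts, pseudocount=1.0):
    """Centered log ratio: each log part minus the mean log of its sample
    (that is, the log of the part divided by the sample's geometric mean)."""
    logx = _log_with_pseudocount(counts, pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def alr(counts, ref=-1, pseudocount=1.0):
    """Additive log ratio: log of each part over a reference part (column `ref`).
    The reference column itself is dropped, leaving D - 1 coordinates."""
    logx = _log_with_pseudocount(counts, pseudocount)
    ratios = logx - logx[:, [ref]]
    return np.delete(ratios, ref, axis=1)


def ilr_pivot(counts, pseudocount=1.0):
    """Generic ilr using pivot coordinates (NOT PhILR's phylogenetic balances):
    z_i = sqrt(k / (k + 1)) * ln(x_i / geometric_mean(x_{i+1}, ..., x_D)),
    where k is the number of parts after part i."""
    logx = _log_with_pseudocount(counts, pseudocount)
    n_parts = logx.shape[1]
    coords = []
    for i in range(n_parts - 1):
        k = n_parts - i - 1
        # The log of a geometric mean is the arithmetic mean of the logs.
        coords.append(np.sqrt(k / (k + 1)) * (logx[:, i] - logx[:, i + 1:].mean(axis=1)))
    return np.column_stack(coords)


if __name__ == "__main__":
    # Hypothetical toy table: 2 samples (rows) x 4 taxa (columns).
    counts = np.array([[10, 0, 5, 85], [3, 7, 0, 40]])
    for name, transform in [("proportions", proportions), ("hellinger", hellinger),
                            ("lognorm", lognorm), ("clr", clr), ("alr", alr),
                            ("ilr (pivot)", ilr_pivot)]:
        print(name, np.round(transform(counts), 3), sep="\n")
```

The log-ratio functions return coordinates that depend only on ratios between taxa, which is what "compositionally aware" refers to. The proportion-based functions rescale each sample but keep one column per taxon. The pivot ilr has the same Euclidean norm per sample as clr, because its basis is orthonormal. PhILR differs only in choosing that basis from the phylogeny.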
microbiology