The Babe Ruth Algorithm: a fast, unbiased procedure to randomize presence-absence data matrices with fixed row and column totals

Giovanni Strona, Domenico Nappo, Francesco Boccacci, Simone Fattorini, Jesus San-Miguel-Ayanz
DOI: https://doi.org/10.1038/ncomms5114
2014-04-14
Abstract: A well-known problem in numerical ecology is how to recombine presence-absence matrices without altering row and column totals. A few solutions have been proposed, but all of them have shortcomings in statistical robustness (i.e., their ability to generate different matrix configurations with equal probability) and in performance (i.e., the computational effort they require to generate a null matrix). Here we introduce the 'Babe Ruth Algorithm', a new procedure that differs from existing ones in that it focuses on matrix information content rather than on matrix structure. We demonstrate that the algorithm can sample the set of all possible matrix configurations uniformly, with a computational effort orders of magnitude lower than that required by available methods, making it possible to easily randomize matrices larger than 10^8 cells.
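
The abstract does not spell out the procedure, so the following is only a minimal sketch of one way to randomize a presence-absence matrix while keeping row and column totals fixed. It assumes that "information content" can be read as storing each row as the set of columns where a presence occurs, and that randomization proceeds by trading non-shared presences between pairs of rows. The function names (to_sets, trade, randomize) and the n_trades parameter are hypothetical, and the sketch is not claimed to be the authors' exact algorithm or to reproduce its uniform-sampling guarantee.

```python
import random


def to_sets(matrix):
    """Store each row as the set of column indices that hold a 1."""
    return [{j for j, v in enumerate(row) if v} for row in matrix]


def trade(rows, rng):
    """Exchange non-shared presences between two randomly chosen rows.

    Columns present in only one of the two rows are pooled, shuffled,
    and dealt back so that each row keeps its original size. Every
    pooled column still occurs in exactly one of the two rows, so
    column totals are unchanged as well.
    """
    a, b = rng.sample(range(len(rows)), 2)
    only_a = sorted(rows[a] - rows[b])
    only_b = sorted(rows[b] - rows[a])
    if not only_a or not only_b:
        return  # nothing can be exchanged without changing row totals
    shared = rows[a] & rows[b]
    pool = only_a + only_b
    rng.shuffle(pool)
    rows[a] = shared | set(pool[:len(only_a)])
    rows[b] = shared | set(pool[len(only_a):])


def randomize(matrix, n_trades=1000, seed=None):
    """Return a shuffled copy of a 0/1 matrix (needs at least two rows).

    n_trades is an illustrative default, not a recommended value.
    """
    rng = random.Random(seed)
    rows = to_sets(matrix)
    for _ in range(n_trades):
        trade(rows, rng)
    n_cols = len(matrix[0])
    return [[1 if j in r else 0 for j in range(n_cols)] for r in rows]
```

Working on sets of column indices rather than on the full grid means each step touches only the presences of two rows, which is one plausible reason such an approach could scale to very large matrices; how many trades are needed for adequate mixing is not addressed by this sketch.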
Statistics Theory