Integrity and miss grouping as support for clusters in agglomerative hierarchical methods: the R-package octopucs

Ian MacGregor-Fors, Roger Guevara
DOI: https://doi.org/10.1101/2024.08.01.606070
2024-08-05
Abstract: The hierarchical clustering of communities based on species' compositional similarity (and abundance or frequency) is standard in community ecology to unveil large-scale patterns and the underlying environmental causes of differentiation among communities. Often, the threshold used to discretize clusters is arbitrary, despite the existence of methods that minimize this bias. Most available techniques use the exact repeatability of clusters' memberships under resampling protocols to define robust groups. Here, we propose a novel method to yield cluster support throughout the topology of hierarchical analyses. We acknowledge that the observed dataset may be biased. Instead of using the observed topology as a reference to work out the groups' support, we compiled a consensus topology. Then, we borrowed the ecological concept of reciprocal complementarity between a pair of communities and translated it into cluster integrity and contamination. This procedure allows for building support for groups even when there is only a partial membership match after resampling the dataset. In addition, we present the R package octopucs, in which we implemented the method reported here. Compared with other methods, the new proposal robustly detected changes in group memberships, resulting in considerable differences in the pattern of supported clusters.
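The idea of scoring a partial membership match can be illustrated with a minimal Python sketch. It is not the octopucs implementation, and the abstract does not give the exact formulas. The sketch assumes that integrity is the share of a reference cluster's members recovered in a resampled cluster, and that contamination is the share of that resampled cluster's members that come from outside the reference cluster. The function names and site labels are hypothetical.

```python
def integrity_and_contamination(reference, resampled):
    """Score one reference cluster against one resampled cluster.

    Assumed, illustrative definitions (not taken from the paper):
      integrity     = share of reference members recovered in the resampled cluster
      contamination = share of resampled members that are not in the reference cluster
    """
    reference, resampled = set(reference), set(resampled)
    if not reference or not resampled:
        raise ValueError("clusters must be non-empty")
    shared = reference & resampled
    integrity = len(shared) / len(reference)
    contamination = len(resampled - reference) / len(resampled)
    return integrity, contamination


def best_match(reference, resampled_clusters):
    """Return the (integrity, contamination) pair of the best-matching resampled cluster."""
    scores = [integrity_and_contamination(reference, c) for c in resampled_clusters]
    # Prefer high integrity first, then low contamination.
    return max(scores, key=lambda s: (s[0], -s[1]))


# Hypothetical example: one cluster from a reference topology and the
# clusters obtained from a single resampled dataset.
reference_cluster = {"site_A", "site_B", "site_C", "site_D"}
resampled_partition = [
    {"site_A", "site_B", "site_C", "site_E"},
    {"site_D", "site_F"},
]
integrity, contamination = best_match(reference_cluster, resampled_partition)
print(integrity, contamination)
```

Under these assumed definitions, a cluster that loses one member and gains one outsider after resampling still earns partial support, whereas a criterion based on exact repeatability would count it as a failure.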
Bioinformatics