nRCFV: a new, dataset-size-independent metric to quantify compositional heterogeneity in nucleotide and amino acid datasets

James F. Fleming, Torsten H. Struck
DOI: https://doi.org/10.1186/s12859-023-05270-8
IF: 3.307
2023-04-14
BMC Bioinformatics
Abstract: Compositional heterogeneity (when the proportions of nucleotides and amino acids are not broadly similar across the dataset) is a cause of a great number of phylogenetic artefacts. Whilst a variety of methods can identify it post hoc, few metrics exist to quantify compositional heterogeneity prior to the computationally intensive task of phylogenetic tree reconstruction. Here we assess the efficacy of one such existing, widely used metric, Relative Composition Frequency Variability (RCFV), using both real and simulated data.
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology