Can we Restore Balance to Geometric Morphometrics? A Theoretical Evaluation of how Sample Imbalance Conditions Ordination and Classification

Lloyd A. Courtenay
DOI: https://doi.org/10.1007/s11692-022-09590-0
2022-12-31
Evolutionary Biology
Abstract: The most common means of performing ordination and classification consist in principal component, canonical variate, and between-group principal component analysis (PCA, CVA & bgPCA) for ordination, and linear and partial least squares discriminant analysis (LDA & PLSDA) for classification. Over the years, research has shown how the number of variables used in Geometric Morphometrics can be problematic for studies using small sample sizes. In the case of ordination, this implies an inflation of differences between groups, even when no differences are present. In light of this, classification tasks should also theoretically present exaggerated accuracy scores. Using a theoretically constructed geometric experiment, the present study constructs a series of imbalanced theoretical datasets containing different degrees of variation in both shape and form. Each ordination and classification task is then carried out to observe how imbalance influences the quality of results. Even when using large enough sample sizes, if one sample is considerably smaller than another, then this imbalance will have an effect on both ordination and classification results. Imbalance is thus seen to force separation among samples and to cause a considerable loss in classification performance. Statistical tests such as Procrustes distance calculations are not affected. The conclusions suggest that prior dimensionality reduction, such as PCA, is necessary for CVA, bgPCA, LDA and PLSDA. Cross-validated versions of these algorithms should also be used. An extensive discussion is also provided of alternative ordination and classification techniques that could prove useful for Geometric Morphometrics and that are less sensitive to sample imbalance.