Abstract: Classical palaeoenvironmental reconstruction models often incorporate biological ideas and commonly assume that the taxa comprising a fossil assemblage exhibit unimodal responses to the environmental variable of interest. In contrast, machine-learning approaches do not rely on any biological assumptions but instead need to be trained on large data sets to learn the relationships between biological assemblages and their environment. To explore the relative merits of these two approaches, we developed a two-layer machine-learning reconstruction model, MEMLM (Multi Ensemble Machine Learning Model). The first layer applies three different ensemble machine-learning models (random forests, extra random trees, and LightGBM), each trained on modern taxon assemblages and the associated environmental data, to produce three separate reconstructions; the second layer uses multiple linear regression to integrate these three reconstructions into a consensus reconstruction. We considered three versions of the model: (1) a standard version, MEMLM, which uses only taxon abundance data; (2) MEMLMe, which uses only dimensionally reduced assemblage information obtained with a natural-language-processing model (GloVe) that detects associations between taxa across the training data set; and (3) MEMLMc, which incorporates both the raw taxon abundances and the dimensionally reduced (GloVe) summary. We trained these MEMLM variants on three high-quality diatom and pollen training sets and compared their reconstruction performance with that of three weighted-averaging (WA) approaches (WA-Cla with classical deshrinking, WA-Inv with inverse deshrinking, and WA-PLS with partial least squares). In general, the MEMLM approaches, even when trained only on dimensionally reduced assemblage data, performed substantially better than the WA approaches on the larger training sets, as judged by cross-validated prediction error.
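The second MEMLM layer is described only as multiple linear regression over the three first-layer reconstructions. Below is a minimal pure-Python sketch of that stacking step. It assumes ordinary least squares with an intercept, solved through the normal equations. The function names `solve`, `fit_stacker` and `consensus` are hypothetical, and any base-model predictions fed to them stand in for random-forest, extra-random-trees and LightGBM outputs. This is not the authors' implementation.

```python
"""Minimal sketch of a stacking layer: combine base-model reconstructions
with multiple linear regression (ordinary least squares plus intercept).

Hypothetical names throughout; this is not MEMLM's actual code."""
from typing import List, Sequence


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve the square system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: base-model predictions are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_stacker(base_preds: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimising sum_i (target_i - b0 - sum_j bj * base_preds[j][i])^2.

    base_preds[j][i] is base model j's reconstruction for training sample i."""
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(target))]
    k = len(rows[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def consensus(coefs: Sequence[float], sample_preds: Sequence[float]) -> float:
    """Consensus reconstruction for one sample from its base-model predictions."""
    return coefs[0] + sum(b * p for b, p in zip(coefs[1:], sample_preds))
```

Stacking weights are usually fitted on out-of-fold (cross-validated) base-model predictions rather than in-sample ones. Otherwise the second layer tends to reward whichever base model has memorised the training set. The abstract does not say how MEMLM handles this.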
When applied to fossil data, the MEMLM variants sometimes produced palaeoenvironmental reconstructions that differed qualitatively from each other and from the WA-based reconstructions. We applied a statistical significance test to all the reconstructions, and it successfully identified every instance in which a reconstruction was not robust to the choice of model. We found that machine-learning approaches could outperform classical approaches but could also fail badly in reconstruction despite high cross-validated performance, likely indicating problems with extrapolation. The classical approaches were generally more robust, although they too could generate reconstructions of only modest statistical significance, which may therefore be unreliable. Given these findings, we consider that cross-validation alone is not a sufficient measure of transfer-function performance, and we recommend that the results of statistical significance tests be reported alongside downcore reconstructions based on fossil assemblages.
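The abstract does not say which significance test was used. One widely used option for transfer functions is the random-transfer-function test of Telford and Birks (2011). It asks whether a reconstruction explains more of the variance in the fossil assemblages than reconstructions from transfer functions trained on random environmental variables. Below is a minimal pure-Python sketch of that idea, built on three assumptions:

- WA with inverse deshrinking (as in WA-Inv) is the transfer function.
- Uniform random variables form the null.
- Unscaled RDA-style variance explained is the test statistic.

All function names are hypothetical. Published implementations, such as `randomTF` in the R package palaeoSig, differ in detail.

```python
"""Minimal sketch of a random-transfer-function significance test.

Assumptions (not from the source): WA with inverse deshrinking is the transfer
function, the null uses uniform random environmental variables, and the test
statistic is the fraction of fossil-assemblage variance explained by the
reconstruction (RDA with a single constraint). All names are hypothetical."""
import random
from statistics import fmean
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def wa_raw(row: Sequence[float], optima: Sequence[Optional[float]]) -> float:
    """Abundance-weighted mean of taxon optima for one assemblage."""
    pairs = [(y, u) for y, u in zip(row, optima) if u is not None and y > 0]
    total = sum(y for y, _ in pairs)
    if total == 0:
        raise ValueError("assemblage shares no taxa with the training set")
    return sum(y * u for y, u in pairs) / total


def wa_inverse_fit(spp: Matrix, env: Sequence[float]) -> Tuple[List[Optional[float]], float, float]:
    """Fit WA optima, then regress env on the raw WA estimates (inverse deshrinking)."""
    optima: List[Optional[float]] = []
    for k in range(len(spp[0])):
        w = sum(row[k] for row in spp)
        # Taxa absent from every training sample get no optimum.
        optima.append(sum(row[k] * x for row, x in zip(spp, env)) / w if w > 0 else None)
    raw = [wa_raw(row, optima) for row in spp]
    mr, me = fmean(raw), fmean(env)
    slope = sum((r - mr) * (x - me) for r, x in zip(raw, env)) / sum((r - mr) ** 2 for r in raw)
    return optima, me - slope * mr, slope


def reconstruct(row: Sequence[float], optima: Sequence[Optional[float]],
                intercept: float, slope: float) -> float:
    """Deshrunk WA reconstruction for one assemblage."""
    return intercept + slope * wa_raw(row, optima)


def variance_explained(spp: Matrix, z: Sequence[float]) -> float:
    """Fraction of total (unscaled) variance in spp explained by the single predictor z."""
    mz = fmean(z)
    zc = [v - mz for v in z]
    zz = sum(v * v for v in zc)
    if zz == 0:
        return 0.0  # a constant reconstruction explains nothing
    explained = total = 0.0
    for k in range(len(spp[0])):
        col = [row[k] for row in spp]
        mc = fmean(col)
        yc = [v - mc for v in col]
        total += sum(v * v for v in yc)
        explained += sum(a * b for a, b in zip(zc, yc)) ** 2 / zz  # squared projection onto z
    return explained / total


def random_tf_test(train_spp: Matrix, train_env: Sequence[float], fossil_spp: Matrix,
                   n_random: int = 99, seed: int = 0) -> Tuple[float, float]:
    """Return (variance explained by the real reconstruction, Monte Carlo p-value)."""
    rng = random.Random(seed)

    def explained_by(env: Sequence[float]) -> float:
        optima, a, b = wa_inverse_fit(train_spp, env)
        recon = [reconstruct(row, optima, a, b) for row in fossil_spp]
        return variance_explained(fossil_spp, recon)

    observed = explained_by(train_env)
    null = [explained_by([rng.random() for _ in train_env]) for _ in range(n_random)]
    # Count the real reconstruction as one member of the reference distribution.
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_random)
    return observed, p_value
```

The variance-explained statistic is unchanged by a non-degenerate linear rescaling of the reconstruction. As a result, the deshrinking step does not affect the p-value in this sketch; it only returns values on the environmental scale. A large p-value means the reconstruction explains no more fossil variance than reconstructions built from random variables.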

Can machine-learning algorithms improve upon classical palaeoenvironmental reconstruction models?