Abstract: Divergence time estimation, the calibration of a phylogeny to geological time, is an integral first step in modeling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with the timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, by sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous age for crown angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors.
These results indicate that there is inadequate information in the data to overrule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudodata present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or from prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modeling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently.
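The prior/posterior comparison recommended above can be illustrated with a minimal sketch. It is not the authors' procedure: the overlap statistic, the function name `overlap_coefficient`, the bin count, and all simulated node ages are assumptions chosen for illustration. In practice the two sample sets would come from an MCMC run without sequence data (the joint prior) and a run with data (the posterior) for the same calibrated node.

```python
import numpy as np

def overlap_coefficient(prior_samples, posterior_samples, bins=50):
    """Histogram estimate of distribution overlap (0 = disjoint, 1 = identical)."""
    prior_samples = np.asarray(prior_samples, dtype=float)
    posterior_samples = np.asarray(posterior_samples, dtype=float)
    # Use shared bin edges so the two histograms are directly comparable.
    lo = min(prior_samples.min(), posterior_samples.min())
    hi = max(prior_samples.max(), posterior_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(prior_samples, bins=edges)
    q, _ = np.histogram(posterior_samples, bins=edges)
    # Normalise counts to probabilities and sum the shared mass per bin.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())

# Hypothetical node ages (Ma): a lognormal calibration offset by a 100 Ma minimum.
rng = np.random.default_rng(0)
prior_ages = 100.0 + rng.lognormal(mean=1.5, sigma=0.5, size=20_000)

# Case 1: the data add nothing, so the posterior mirrors the prior.
posterior_prior_driven = 100.0 + rng.lognormal(mean=1.5, sigma=0.5, size=20_000)

# Case 2: the data are informative and concentrate the age far from the prior mode.
posterior_data_driven = rng.normal(loc=125.0, scale=1.0, size=20_000)

overlap_prior_driven = overlap_coefficient(prior_ages, posterior_prior_driven)
overlap_data_driven = overlap_coefficient(prior_ages, posterior_data_driven)
print("overlap when the prior dominates:", round(overlap_prior_driven, 3))
print("overlap when the data dominate:  ", round(overlap_data_driven, 3))
```

An overlap close to 1 for a calibrated node would suggest that its estimated age mostly reflects the calibration density rather than the sequence data. The histogram estimate depends on the number of bins and samples, so it is best read as a rough screening statistic alongside plots of the two marginal distributions.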

Sequential Bayesian Phylogenetic Inference

Bayesian inference of phylogenetic distances: revisiting the eigenvalue approach

Bayesian Inference of Species Trees from Multilocus Data

Sciphy: A Bayesian phylogenetic framework using sequential genetic lineage tracing data.

The Past Sure is Tense: On Interpreting Phylogenetic Divergence Time Estimates

A simulation approach for change-points on phylogenetic trees

An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics

Scalable Bayesian divergence time estimation with ratio transformations

Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology

Variational Bayesian Phylogenetic Inference with Semi-implicit Branch Length Distributions

A Variational Approach to Bayesian Phylogenetic Inference

Efficient Bayesian Inference of General Gaussian Models on Large Phylogenetic Trees

Bayesian Inference of Evolutionary Histories under Time-Dependent Substitution Rates

Approximate Bayesian computation for Markovian binary trees in phylogenetics

Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration

Bayesian Selection of Relaxed-clock Models: Distinguishing Between Independent and Autocorrelated Rates

Accelerating Bayesian inference of dependency between complex biological traits

Bayesian Least-Squares Supertrees (BLeSS): flexible inference of large time-calibrated phylogenies

Efficient Bayesian species tree inference under the multi-species coalescent

tbea: tools for pre- and post-processing in Bayesian evolutionary analyses

An Efficient Bayesian Inference Framework for Coalescent-Based Nonparametric Phylodynamics