DateLife: leveraging databases and analytical tools to reveal the dated Tree of Life

Luna L Sánchez Reyes, Emily Jane McTavish, Brian O’Meara
DOI: https://doi.org/10.1093/sysbio/syae015
IF: 9.16
2024-03-21
Systematic Biology
Abstract: Chronograms (phylogenies with branch lengths proportional to time) represent key data on the timing of evolutionary events for the study of natural processes in many areas of biological research. Chronograms also provide valuable information that can be used for education, science communication, and conservation policy decisions. Yet, achieving a high-quality reconstruction of a chronogram is a difficult and resource-consuming task. Here we present DateLife, phylogenetic software implemented as an R package and an R Shiny web application available at www.datelife.org, that provides services for efficient and easy discovery, summary, reuse, and reanalysis of node age data mined from a curated database of expert, peer-reviewed, and openly available chronograms. The main DateLife workflow starts with one or more scientific taxon names provided by a user. Names are processed and standardized to a unified taxonomy, allowing DateLife to run a name match across its local chronogram database, which is curated from Open Tree of Life's phylogenetic repository, and to extract all chronograms that contain at least two queried taxon names, along with their metadata. Finally, node ages from matching chronograms are mapped using the congruification algorithm to corresponding nodes on a tree topology, either extracted from Open Tree of Life's synthetic phylogeny or provided by the user. Congruified node ages are used as secondary calibrations to date the chosen topology, with or without initial branch lengths, using phylogenetic dating methods such as BLADJ, treePL, PATHd8, and MrBayes.
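The matching step described above, in which DateLife keeps every chronogram that contains at least two of the queried taxon names, can be illustrated with a minimal Python sketch. It assumes the names have already been brought to one spelling convention: the normalize_name helper is a hypothetical string cleanup, not DateLife's taxonomic name resolution against Open Tree of Life, and the Chronogram class, study IDs, and tip sets are illustrative placeholders rather than the package's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Chronogram:
    """Hypothetical stand-in for a stored chronogram: a study ID, tip names, metadata."""
    study_id: str
    tips: set[str]
    metadata: dict = field(default_factory=dict)


def normalize_name(name: str) -> str:
    """Hypothetical cleanup: underscores to spaces, collapse whitespace,
    capitalize only the first letter (e.g. 'canis_LUPUS' -> 'Canis lupus').
    This is NOT taxonomic name resolution against Open Tree of Life."""
    cleaned = " ".join(name.replace("_", " ").split())
    return cleaned[:1].upper() + cleaned[1:].lower()


def matching_chronograms(query, database, min_matches=2):
    """Return (chronogram, sorted matched names) for every chronogram that
    contains at least `min_matches` of the queried names."""
    wanted = {normalize_name(n) for n in query}
    hits = []
    for chrono in database:
        overlap = wanted & {normalize_name(t) for t in chrono.tips}
        if len(overlap) >= min_matches:
            hits.append((chrono, sorted(overlap)))
    return hits


if __name__ == "__main__":
    # Toy database: the study IDs and tip sets are invented for illustration.
    db = [
        Chronogram("study_A", {"Homo sapiens", "Pan troglodytes", "Mus musculus"}),
        Chronogram("study_B", {"Homo sapiens", "Danio rerio"}),
        Chronogram("study_C", {"Gallus gallus", "Danio rerio"}),
    ]
    query = ["homo_sapiens", "Pan troglodytes", "Danio rerio"]
    for chrono, names in matching_chronograms(query, db):
        print(chrono.study_id, names)
    # study_C is skipped: it shares only one name ('Danio rerio') with the query.
```

Requiring two or more shared names mirrors the criterion in the abstract: a chronogram with a single matching tip contains no node that could date a relationship among the queried taxa.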
We performed a cross-validation test to compare node ages resulting from a DateLife analysis (i.e., phylogenetic dating using secondary calibrations) to those from the original chronograms (i.e., obtained with primary calibrations), and found that DateLife's node age estimates are consistent with the age estimates from the original chronograms, with the largest variation in ages occurring around topologically deeper nodes. Because the results from any software for scientific analysis can only be as good as the input data, we highlight the importance of considering the results of a DateLife analysis in the context of the input chronograms. DateLife can help to increase awareness of the existing disparities among alternative hypotheses of dates for the same diversification events, and to support exploration of the effect of alternative chronogram hypotheses on downstream analyses, providing a framework for a more informed interpretation of evolutionary results.
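One simple way to picture the cross-validation comparison is to pair each node that appears in both the original chronogram and the re-dated tree and compute the difference in age. The Python sketch below assumes nodes can be identified by the set of their descendant tip names and that ages are in millions of years; it uses clade size as a rough, hypothetical proxy for how deep a node sits, and the toy ages are invented for illustration, not results from the paper.

```python
from statistics import mean


def compare_node_ages(original, estimated):
    """Pair nodes found in both trees and report age differences.

    Both arguments map a node, keyed by the frozenset of its descendant tip
    names, to an age (assumed here to be in millions of years).
    Rows are ordered from largest to smallest clade, with ties broken
    alphabetically, so deeper nodes come first under this rough proxy."""
    shared = original.keys() & estimated.keys()
    ordered = sorted(shared, key=lambda node: (-len(node), sorted(node)))
    rows = []
    for node in ordered:
        diff = estimated[node] - original[node]
        rows.append({
            "tips": sorted(node),
            "original": original[node],
            "estimated": estimated[node],
            "difference": diff,
            "relative": diff / original[node],
        })
    return rows


def mean_absolute_difference(rows):
    """Average size of the age discrepancy across compared nodes."""
    return mean(abs(row["difference"]) for row in rows)


if __name__ == "__main__":
    # Invented toy ages for four hypothetical taxa A-D; not results from the paper.
    original = {
        frozenset("ABCD"): 50.0,
        frozenset("AB"): 10.0,
        frozenset("CD"): 20.0,
    }
    estimated = {
        frozenset("ABCD"): 56.0,
        frozenset("AB"): 11.0,
        frozenset("CD"): 19.0,
    }
    rows = compare_node_ages(original, estimated)
    for row in rows:
        print(row["tips"], row["difference"], round(row["relative"], 3))
    print("mean absolute difference:", round(mean_absolute_difference(rows), 3))
```

Keying nodes by their tip sets means a node is compared only when both trees contain exactly the same clade; nodes present in only one tree are ignored rather than guessed.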
evolutionary biology
What problem does this paper attempt to address?
This paper addresses the problem of constructing chronograms for evolutionary biology research. Chronograms are crucial data for comparative analyses and for studying natural processes in fields such as developmental biology and conservation biology, but building high-quality chronograms is difficult and resource-consuming. DateLife is a software tool implemented as an R package and an R Shiny web application. It mines node age data from a curated database of expert, peer-reviewed, openly available chronograms to facilitate their discovery, summary, reuse, and reanalysis. In the DateLife workflow, a user supplies scientific taxon names; DateLife standardizes them to a unified taxonomy, searches its local chronogram database (curated from Open Tree of Life's phylogenetic repository) for chronograms containing at least two of the queried names, maps node ages from those chronograms onto a tree topology with the congruification algorithm, and uses them as secondary calibrations to date that topology with phylogenetic dating methods such as BLADJ, treePL, PATHd8, and MrBayes. A cross-validation test showed that DateLife's node age estimates are consistent with those of the original chronograms, with the largest variation around topologically deeper nodes. DateLife helps raise awareness of disparities among alternative hypotheses of dates for the same diversification events and supports exploration of how alternative chronogram hypotheses affect downstream analyses, providing a framework for a more informed interpretation of evolutionary results.