Exact Expressions for the Log-likelihood's Hessian in Multivariate Continuous-Time Continuous-Trait Gaussian Evolution along a Phylogeny

Woodrow Hao Chi Kiang
2024-05-13
Abstract: We present closed-form formulae for the Hessian matrix of the log-likelihood of a family of multivariate continuous-trait Gaussian Markov trait evolution models along a given phylogeny, in which the trait vector's mean is an affine function of that of its ancestor and the variance does not depend on the trait. Accompanying this work is an R package called 'glinvci', publicly available on the Comprehensive R Archive Network (CRAN), that can compute Hessian-based approximate confidence regions for these models while allowing users to have missing data, lost traits, and multiple evolutionary regimes.
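The model family described in the abstract can be written out as a conditional distribution along each branch. The block below is a sketch in assumed notation (the symbols $x_i$, $x_j$, $t_j$, $\Phi_j$, $w_j$, $V_j$, $H$, $\theta$ and $\Sigma$ are chosen here for illustration and may differ from the paper's), with Brownian motion and Ornstein-Uhlenbeck as the two standard special cases.

```latex
% Assumed notation: x_j is the trait vector at node j, x_i the trait vector
% at its parent i, and t_j the length of the branch leading to j.
x_j \mid x_i \;\sim\; \mathcal{N}\!\left(w_j + \Phi_j\, x_i,\; V_j\right),
\qquad \Phi_j,\ w_j,\ V_j \text{ not depending on } x_i .

% Brownian motion with diffusion covariance \Sigma:
\Phi_j = I, \qquad w_j = 0, \qquad V_j = t_j\,\Sigma .

% Ornstein-Uhlenbeck with drift matrix H and optimum \theta:
\Phi_j = e^{-H t_j}, \qquad
w_j = \left(I - e^{-H t_j}\right)\theta, \qquad
V_j = \int_0^{t_j} e^{-H s}\,\Sigma\, e^{-H^{\top} s}\,\mathrm{d}s .
```

In both cases the conditional mean is affine in the ancestral trait and the conditional variance is free of it, which is the property the abstract states.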
Populations and Evolution
What problem does this paper attempt to address?
The paper's main aim is to provide closed-form expressions for the second-order derivatives (the Hessian matrix) of the log-likelihood function for multivariate continuous-time, continuous-trait Gaussian evolutionary processes. These expressions matter for obtaining confidence ellipses around maximum-likelihood estimates on phylogenetic trees, particularly when the log-likelihood surface is close to a multivariate quadratic function. The recursive structure of the formulae is also useful for further mathematical analysis, given that recent studies have revealed statistical properties of these phylogenetic comparative method (PCM) models. Specifically, the paper covers the following points:

1. **Background and Motivation**: The paper first reviews why traits on a phylogenetic tree should not be modeled as independent variables and introduces existing methods for dealing with this non-independence, such as Felsenstein's independent contrasts algorithm and Hansen's phylogenetic Ornstein-Uhlenbeck model. It then describes a linear-time algorithm proposed by Mitov et al. for computing the likelihood of a class of models called GLinv, which encompasses a variety of continuous-time continuous-trait Gaussian Markov processes.
2. **Main Contributions**:
   - Closed-form expressions for the second-order derivatives of the log-likelihood function of the GLinv model family.
   - Use of these Hessian matrices to obtain confidence ellipses for maximum-likelihood estimates.
   - An R package named glinvci that implements the Hessian calculations and computes confidence ellipses for Brownian motion (BM) and Ornstein-Uhlenbeck (OU) models.
   - A new linear-time algorithm, based on the Woodbury formula, for computing the likelihood of GLinv models. Unlike existing methods, it does not assume that the variance-covariance matrix satisfies specific conditions, at the cost of requiring a full-rank matrix update.
3. **Technical Details**:
   - A detailed derivation of the Hessian matrix, including how to handle missing data and lost traits.
   - Particular attention to the second-order derivatives of the Ornstein-Uhlenbeck model, with specific formulae.
4. **Applications and Significance**:
   - The closed-form expressions and algorithms support uncertainty quantification in phylogenetic comparative methods, especially under maximum-likelihood estimation.
   - The R package glinvci lets researchers compute these Hessian matrices conveniently and so better understand the uncertainty of the model parameters.

In summary, the paper supports continuous-trait evolutionary analysis on phylogenetic trees by providing exact mathematical tools and an implementation, especially for complex models and large data sets.
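To make the link between an exact Hessian and a confidence ellipse concrete, here is a minimal Python sketch on a toy case that is not taken from the paper: univariate Brownian motion on a star phylogeny whose tips all sit at branch length `t`, so that each tip value is an independent draw from N(mu, sigma2 * t). The function names (`bm_star_mle`, `bm_star_hessian`, `wald_standard_errors`, `in_wald_region`) are hypothetical and are not part of the glinvci API. The region is the usual Wald approximation, which relies on the log-likelihood being roughly quadratic near its maximum.

```python
import math


def bm_star_mle(x, t):
    """Maximum-likelihood estimates (mu, sigma2) for tips x_i ~ N(mu, sigma2 * t)."""
    n = len(x)
    mu = sum(x) / n
    sigma2 = sum((xi - mu) ** 2 for xi in x) / (n * t)
    return mu, sigma2


def bm_star_hessian(x, t, mu, sigma2):
    """Exact 2x2 Hessian of the log-likelihood with respect to (mu, sigma2)."""
    n = len(x)
    r1 = sum(xi - mu for xi in x)          # sum of residuals
    r2 = sum((xi - mu) ** 2 for xi in x)   # sum of squared residuals
    h_mm = -n / (sigma2 * t)
    h_ms = -r1 / (sigma2 ** 2 * t)
    h_ss = n / (2.0 * sigma2 ** 2) - r2 / (sigma2 ** 3 * t)
    return [[h_mm, h_ms], [h_ms, h_ss]]


def wald_standard_errors(hessian):
    """Standard errors from the inverse of the observed information (-Hessian)."""
    a, b = -hessian[0][0], -hessian[0][1]
    c, d = -hessian[1][0], -hessian[1][1]
    det = a * d - b * c
    # Diagonal of the inverse of [[a, b], [c, d]].
    return math.sqrt(d / det), math.sqrt(a / det)


def in_wald_region(theta, theta_hat, hessian, level=0.95):
    """True if theta lies inside the approximate confidence ellipse around theta_hat."""
    d0 = theta[0] - theta_hat[0]
    d1 = theta[1] - theta_hat[1]
    # Quadratic form (theta - theta_hat)^T (-Hessian) (theta - theta_hat).
    q = -(d0 * (hessian[0][0] * d0 + hessian[0][1] * d1)
          + d1 * (hessian[1][0] * d0 + hessian[1][1] * d1))
    # Chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - level).
    cutoff = -2.0 * math.log(1.0 - level)
    return q <= cutoff


# Example data: four tip values, every branch of length 2.
tips = [1.0, 2.0, 3.0, 4.0]
branch_length = 2.0
theta_hat = bm_star_mle(tips, branch_length)
hess = bm_star_hessian(tips, branch_length, *theta_hat)
```

With two parameters the chi-square cutoff has a closed form, which keeps the sketch free of external dependencies. For more parameters one would need a general chi-square quantile function and a general matrix inverse.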