Split-or-decompose: Improved FPT branching algorithms for maximum agreement forests

David Mestel, Steven Chaplick, Steven Kelk, Ruben Meuwese
2024-09-27
Abstract: Phylogenetic trees are leaf-labelled trees used to model the evolution of species. In practice it is not uncommon to obtain two topologically distinct trees for the same set of species, and this motivates the use of distance measures to quantify dissimilarity. A well-known measure is the maximum agreement forest (MAF): a minimum-size partition of the leaf labels which splits both trees into the same set of disjoint, leaf-labelled subtrees (up to isomorphism after suppressing degree-2 vertices). Computing such a MAF is NP-hard and so considerable effort has been invested in finding FPT algorithms, parameterised by $k$, the number of components of a MAF. The state of the art has been unchanged since 2015, with running times of $O^*(3^k)$ for unrooted trees and $O^*(2.3431^k)$ for rooted trees. In this work we present improved algorithms for both the unrooted and rooted cases, with runtimes $O^*(2.846^k)$ and $O^*(2.3391^k)$ respectively. The key to our improvement is a novel branching strategy in which we show that any overlapping components obtained on the way to a MAF can be 'split' by a branching rule with favourable branching factor, and then the problem can be decomposed into disjoint subproblems to be solved separately. We expect that this technique may be more widely applicable to other problems in algorithmic phylogenetics.
Data Structures and Algorithms, Populations and Evolution
What problem does this paper attempt to address?
This paper aims to speed up fixed-parameter tractable (FPT) branching algorithms for computing a maximum agreement forest (MAF). For both the unrooted and the rooted variant of the problem (uMAF and rMAF), the authors introduce a "split-or-decompose" technique that improves on the running times of existing algorithms.

### Problem Background

In phylogenetics, phylogenetic trees model the evolutionary relationships of species. In practice, two trees with different topologies may be obtained for the same set of species, which motivates distance measures that quantify how much the trees differ. A well-known measure is the maximum agreement forest (MAF): a minimum-size partition of the leaf labels that splits both trees into the same set of disjoint, leaf-labeled subtrees (up to isomorphism after suppressing degree-2 vertices). Computing a MAF is NP-hard, so considerable effort has gone into finding efficient FPT algorithms.

### Existing Methods and Their Limitations

The state of the art has been unchanged since 2015, with running times of \(O^*(3^k)\) for unrooted trees and \(O^*(2.3431^k)\) for rooted trees, where \(k\) is the number of components of a MAF. These algorithms build the forest by progressively cutting edges in one tree while maintaining, at each step, a forest made up of parts of the other tree. Their efficiency depends on choosing branching rules whose branching factor is as small as possible.

### New Method: Split-or-Decompose Technique

The authors propose a new branching strategy, the "split-or-decompose" technique. Its core idea is that whenever components obtained on the way to a MAF overlap in the other tree, they can be "split" by a branching rule with a favorable branching factor, after which the problem decomposes into disjoint subproblems that are solved separately. The two ingredients are:

- **Splitting Overlapping Components**: When components of the forest overlap in the other tree, apply a branching rule with a branching factor of 2 to split them.
- **Decomposing into Disjoint Components**: Once no components overlap, each component can be treated as an independent subproblem.

### Improvement Effects

With this method the authors obtain faster FPT branching algorithms, with running times of \(O^*(2.846^k)\) for unrooted trees and \(O^*(2.3391^k)\) for rooted trees. The improvement for rooted trees is small (the base drops from 2.3431 to 2.3391), while the improvement for unrooted trees is more significant (from 3 to 2.846).

### Summary

The main contribution of the paper is the "split-or-decompose" technique, which lowers the branching factor and therefore the exponential base of the running time for computing a maximum agreement forest. The technique applies to both the unrooted and the rooted problem, and the authors expect that it may be more widely applicable to other problems in algorithmic phylogenetics.
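
The bases quoted above (3, 2.846, 2.3431, 2.3391) are branching factors. As a minimal sketch of how such a number arises from a branching rule, the Python below solves the standard characteristic equation \(\sum_i x^{-a_i} = 1\) for a branching vector \((a_1, \dots, a_m)\), where \(a_i\) is how much the parameter \(k\) drops in branch \(i\). The function names `branching_number` and `search_tree_ratio` and the example vectors are illustrative assumptions; they are not the rules used in the paper.

```python
from typing import Sequence


def branching_number(vector: Sequence[int], tol: float = 1e-12) -> float:
    """Return the root x > 1 of sum(x ** -a for a in vector) == 1.

    `vector` lists, per branch, how much the parameter k decreases.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(vector) < 2 or any(a < 1 for a in vector):
        raise ValueError("need at least two branches, each dropping k by >= 1")

    def excess(x: float) -> float:
        return sum(x ** -a for a in vector) - 1.0

    # excess is strictly decreasing for x > 0, positive at x = 1 and
    # non-positive at x = len(vector), so the root lies in (1, len(vector)].
    lo, hi = 1.0, float(len(vector))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def search_tree_ratio(old_base: float, new_base: float, k: int) -> float:
    """Ratio old_base**k / new_base**k, ignoring polynomial factors."""
    return (old_base / new_base) ** k


# Illustrative vectors: two or three branches, each reducing k by one.
for vec in ([1, 1], [1, 1, 1], [1, 2]):
    print(vec, round(branching_number(vec), 4))

# Exponential-term comparison of the bases quoted in the summary.
for k in (10, 25, 50):
    print(k, search_tree_ratio(3.0, 2.846, k), search_tree_ratio(2.3431, 2.3391, k))
```

A rule with two branches that each reduce \(k\) by one has branching factor 2, and one with three such branches has factor 3. Comparing only the exponential terms, at \(k = 50\) the unrooted bound shrinks by a factor of roughly 14, while the rooted bound shrinks by roughly 1.09, which matches the summary's remark that the rooted improvement is small.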