Finding Maximum Common Contractions Between Phylogenetic Networks

Bertrand Marchand, Nadia Tahiri, Olivier Tremblay-Savard, Manuel Lafond
2024-10-30
Abstract: In this paper, we lay the groundwork on the comparison of phylogenetic networks based on edge contractions and expansions as edit operations, as originally proposed by Robinson and Foulds to compare trees. We prove that these operations connect the space of all phylogenetic networks on the same set of leaves, even if we forbid contractions that create cycles. This allows us to define an operational distance on this space, as the minimum number of contractions and expansions required to transform one network into another. We highlight the difference between this distance and the computation of the maximum common contraction between two networks. Given its ability to outline a common structure between them, which can provide valuable biological insights, we study the algorithmic aspects of the latter. We first prove that computing a maximum common contraction between two networks is NP-hard, even when the maximum degree, the size of the common contraction, or the number of leaves is bounded. We also provide lower bounds for the problem based on the Exponential-Time Hypothesis. Nonetheless, we do provide a polynomial-time algorithm for weakly-galled networks, a generalization of galled trees.
Data Structures and Algorithms, Computational Complexity
What problem does this paper attempt to address?
The core problem this paper addresses is **how to find a maximum common contraction (MCC) between phylogenetic networks**. Specifically, the authors study how to compare phylogenetic networks using edge contractions and expansions as edit operations, and they define two measures:

1. **Contraction-expansion distance (dCE)**: the minimum number of contraction and expansion operations required to transform one network into another.
2. **MCC dissimilarity measure (δMCC)**: a dissimilarity measure based on the size of a maximum common contraction of the two networks.

### Research Background and Motivation

Phylogenetic networks represent evolutionary history, especially in the presence of complex evolutionary events such as horizontal gene transfer and hybridization. However, different methods may reconstruct different networks from the same data set, so measures are needed to compare these networks, evaluate their accuracy, and identify similarities and anomalies. The traditional Robinson-Foulds distance is designed for comparing trees and has limitations when applied directly to networks. The paper therefore takes the contraction and expansion operations that Robinson and Foulds originally proposed for trees and uses them to compare networks.

### Main Contributions

1. **Proved connectivity of the network space**: contractions and expansions connect the space of all phylogenetic networks on the same leaf set, even when contractions that create cycles are forbidden.
2. **Defined two measures**: the contraction-expansion distance (dCE) and the MCC dissimilarity measure (δMCC).
3. **Complexity analysis**:
   - Proved that computing a maximum common contraction is NP-hard, even when the maximum degree of the networks, the size of the common contraction, or the number of leaves is bounded.
   - Provided a polynomial-time algorithm for weakly-galled networks, a generalization of galled trees.
4. **Provided conditional lower bounds**: lower bounds for the problem are derived from the Exponential-Time Hypothesis (ETH).

### Formula Representation

The formulas involved in the paper mainly include:

- Definitions of the contraction operation \( c(u,v,w) \) and the expansion operation \( e(u,v,w,X^-,Y^-,Z^-,X^+,Y^+,Z^+) \).
- Definition of the contraction-expansion distance \( d_{CE}(N_1, N_2) \):
  \[ d_{CE}(N_1, N_2) = \min\{ k : \text{some sequence of } k \text{ admissible contractions and expansions transforms } N_1 \text{ into } N_2 \} \]
- Definition of the MCC dissimilarity measure \( \delta_{MCC}(N_1, N_2) \):
  \[ \delta_{MCC}(N_1, N_2) = |I(N_1)| + |I(N_2)| - 2|I(M)| \]
  where \( M \) is a maximum common contraction of \( N_1 \) and \( N_2 \).

Together, these results lay the groundwork for comparing phylogenetic networks through contractions and expansions, and a maximum common contraction can outline the structure that two networks share.
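To make the contraction operation and the \( \delta_{MCC} \) formula above more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the paper's definition: a network is modelled as a plain dictionary of successor sets, `contract` merges the two endpoints of a single edge rather than implementing the three-argument \( c(u,v,w) \), and the paper's other admissibility conditions are not checked. The only rule enforced is the one stated in the abstract, that a contraction must not create a cycle. The function names, the example networks, and the integer sizes passed to `delta_mcc` are all hypothetical.

```python
def has_other_path(succ, u, v):
    """Return True if v is reachable from u without using the edge (u, v) itself."""
    stack = [w for w in succ.get(u, ()) if w != v]
    seen = set(stack)
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in succ.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def contract(succ, u, v):
    """Contract edge (u, v) of a DAG into node u and return a new successor map.

    Simplified stand-in for the paper's contraction (assumption): merging u and v
    creates a directed cycle exactly when a second u-to-v path exists, so that
    case is rejected with ValueError.
    """
    if v not in succ.get(u, ()):
        raise ValueError(f"({u}, {v}) is not an edge")
    if has_other_path(succ, u, v):
        raise ValueError(f"contracting ({u}, {v}) would create a cycle")
    merged = {}
    for x, outs in succ.items():
        src = u if x == v else x                  # v's out-edges now leave u
        targets = {u if y == v else y for y in outs}  # edges into v now enter u
        targets.discard(src)                      # drop the contracted edge itself
        merged.setdefault(src, set()).update(targets)
    return merged


def delta_mcc(size_n1, size_n2, size_m):
    """delta_MCC = |I(N1)| + |I(N2)| - 2|I(M)|, with the three sizes given as integers."""
    return size_n1 + size_n2 - 2 * size_m


# Hypothetical network: h has two parents (a and b), i.e. one reticulation.
network = {
    "r": {"a", "b"},
    "a": {"h", "l1"},
    "b": {"h", "l2"},
    "h": {"l3"},
    "l1": set(), "l2": set(), "l3": set(),
}
contracted = contract(network, "a", "h")  # allowed: no second path from a to h

# Hypothetical triangle r -> a -> h plus r -> h: contracting (r, h) is rejected,
# because the path r -> a -> h would close into a cycle.
triangle = {"r": {"a", "h"}, "a": {"h"}, "h": set()}
```

In the first example the merged node `a` inherits the leaf `l3` from `h`, and `b` now points to `a`. The cycle test is a depth-first search per contraction, which is adequate for small illustrations but says nothing about the complexity of finding a maximum common contraction, which the paper shows to be NP-hard in general.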