Defining binary phylogenetic trees using parsimony: new bounds

Mirko Wilde, Mareike Fischer
2023-07-28
Abstract: Phylogenetic trees are frequently used to model evolution. Such trees are typically reconstructed from data like DNA, RNA, or protein alignments using methods based on criteria like maximum parsimony (amongst others). Maximum parsimony has been assumed to work well for data with only few state changes. Recently, some progress has been made to formally prove this assertion. For instance, it has been shown that each binary phylogenetic tree $T$ with $n \geq 20k$ leaves is uniquely defined by the set $A_k(T)$, which consists of all characters with parsimony score $k$ on $T$. In the present manuscript, we show that the statement indeed holds for all $n \geq 4k$, thus drastically lowering the lower bound for $n$ from $20k$ to $4k$. However, it has been known that for $n \leq 2k$ and $k \geq 3$, it is not generally true that $A_k(T)$ defines $T$. We improve this result by showing that the latter statement can be extended from $n \leq 2k$ to $n \leq 2k+2$. So we drastically reduce the gap of values of $n$ for which it is unknown if trees $T$ on $n$ taxa are defined by $A_k(T)$ from the previous interval of $[2k+1, 20k-1]$ to the interval $[2k+3, 4k-1]$. Moreover, we close this gap completely for the nearest neighbor interchange (NNI) neighborhood of $T$ in the following sense: We show that as long as $n \geq 2k+3$, no tree that is one NNI move away from $T$ (and thus very similar to $T$) shares the same $A_k$-alignment.
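To make the set $A_k(T)$ concrete, here is a minimal sketch that computes parsimony scores and enumerates $A_k(T)$ by brute force on a small tree. It rests on several assumptions that are not stated in the abstract: characters are taken to be binary (states 0 and 1), the tree is given as nested 2-tuples with leaf labels at the tips (an arbitrary rooting of the unrooted binary tree), and the score is computed with the standard Fitch algorithm. The names `fitch_score`, `leaves`, and `a_k` are hypothetical and chosen only for illustration. The enumeration visits all $2^n$ binary characters, so it is only feasible for very small $n$.

```python
from itertools import product


def fitch_score(tree, character):
    """Parsimony score of `character` (dict: leaf -> state) on a rooted
    binary tree given as nested 2-tuples, via the Fitch algorithm."""
    def visit(node):
        # A leaf contributes its own state and no changes.
        if not isinstance(node, tuple):
            return {character[node]}, 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        common = left_states & right_states
        if common:
            # The children agree on at least one state: no extra change.
            return common, left_cost + right_cost
        # The children disagree: take the union and count one change.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def a_k(tree, k):
    """All binary characters with parsimony score exactly k on `tree`.
    Each character is a tuple of states ordered by sorted leaf label."""
    taxa = sorted(leaves(tree))
    result = set()
    for states in product((0, 1), repeat=len(taxa)):
        if fitch_score(tree, dict(zip(taxa, states))) == k:
            result.add(states)
    return result


if __name__ == "__main__":
    t1 = ((1, 2), (3, 4))  # quartet tree with cherries {1,2} and {3,4}
    t2 = ((1, 3), (2, 4))  # quartet tree with cherries {1,3} and {2,4}
    print(sorted(a_k(t1, 2)))
    print(a_k(t1, 2) == a_k(t2, 2))
```

Comparing `a_k(t1, k)` with `a_k(t2, k)` for two trees on the same leaf set is a direct way to check, on toy instances, whether the two trees share the same $A_k$-alignment.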
Populations and Evolution, Combinatorics