Abstract: The Neighbor-Joining (NJ) algorithm is a widely used method for constructing phylogenetic trees from genetic distances. While NJ is known to perform well on tree-like data, its behavior under admixture remains understudied. In this work, we present a geometric framework for analyzing the NJ algorithm under a linear admixture model. We focus on three key properties of the resulting five-taxon NJ trees, concerning clustering order, distance, and topological path length. Our approach leverages polyhedral geometry to define NJ cones, which correspond to distinct cherry-picking orders and partition the space of dissimilarity vectors. We project dissimilarity vectors with admixture into a lower-dimensional space without admixture, defining polyhedral regions induced by NJ cones that satisfy the specified properties. We compute the exact probabilities that these properties hold by directly calculating the volumes of the induced NJ cones, and we compare them with estimates from Monte Carlo integration and standard NJ simulations. Our results show that the clustering-order property is always satisfied, while the other two properties hold with high probability but depend on the admixture fraction. We also prove that certain induced NJ cones have zero volume, indicating that the corresponding NJ tree topologies are infeasible under admixture. We have implemented our methods in NeighborJoining, a publicly available package for Macaulay2, providing an efficient tool for analyzing NJ cones and their properties. This work provides new insights into the geometric structure inherent to the NJ algorithm in the presence of admixture and identifies the conditions under which admixture influences the resulting phylogenetic trees.
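To make the link between cherry-picking orders and Monte Carlo integration concrete, here is a minimal Python sketch. It is not the NeighborJoining Macaulay2 package, and every function name below is hypothetical. It runs the standard Saitou-Nei Q-criterion until three nodes remain and records the cherries picked. It then estimates how often each cherry-picking order (that is, each NJ cone) occurs for five taxa. The fifth taxon x is generated by an assumed linear admixture rule, d(x, k) = alpha*d(a, k) + (1 - alpha)*d(b, k), which may differ from the paper's exact parametrization. Under this rule, the ten dissimilarities are determined by the six among a, b, c, d, which mirrors the projection to a lower-dimensional space. Sampling those six values uniformly from (0, 1) and breaking ties lexicographically are also assumptions.

```python
import itertools
import random
from collections import Counter


def nj_cherry_order(d, labels):
    """Return the cherries NJ picks, in order, until three nodes remain.

    d maps frozenset({u, v}) -> dissimilarity for every pair of labels.
    Ties in the Q-criterion are broken by comparing label tuples (arbitrary).
    """
    d = dict(d)
    nodes = list(labels)
    order = []
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums r(i) = sum over the other current nodes k of d(i, k)
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q(i, j) = (n - 2) d(i, j) - r(i) - r(j); NJ joins the minimizing pair
        i, j = min(
            itertools.combinations(nodes, 2),
            key=lambda p: ((n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]], p),
        )
        order.append((i, j))
        u = f"({i},{j})"
        # Standard NJ update: d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    return tuple(order)


def admix(d4, alpha, p="a", q="b", x="x"):
    """Hypothetical linear admixture: x takes fraction alpha from p, 1 - alpha from q.

    d(x, k) = alpha*d(p, k) + (1 - alpha)*d(q, k) for every other taxon k,
    using d(p, p) = d(q, q) = 0 for the two source populations themselves.
    """
    d = dict(d4)
    others = {k for pair in d4 for k in pair} - {p, q}
    for k in others:
        d[frozenset((x, k))] = alpha * d4[frozenset((p, k))] + (1 - alpha) * d4[frozenset((q, k))]
    d[frozenset((x, p))] = (1 - alpha) * d4[frozenset((p, q))]
    d[frozenset((x, q))] = alpha * d4[frozenset((p, q))]
    return d


def estimate_order_frequencies(alpha, samples=10000, seed=0):
    """Monte Carlo estimate of how often each cherry-picking order occurs."""
    rng = random.Random(seed)
    base = ["a", "b", "c", "d"]
    counts = Counter()
    for _ in range(samples):
        # Six free dissimilarities among a, b, c, d, drawn uniformly from (0, 1)
        d4 = {frozenset(p): rng.random() for p in itertools.combinations(base, 2)}
        counts[nj_cherry_order(admix(d4, alpha), base + ["x"])] += 1
    return {order: c / samples for order, c in counts.items()}


if __name__ == "__main__":
    freqs = estimate_order_frequencies(alpha=0.3)
    for order, f in sorted(freqs.items(), key=lambda t: -t[1])[:5]:
        print(order, round(f, 3))
```

Because the Q-criterion and the distance update are both linear in d, rescaling a dissimilarity vector by a positive constant leaves the cherry-picking order unchanged. This is why the regions are cones. The sampled frequencies therefore approximate relative cone volumes, but only with respect to the chosen sampling distribution. Exact volume computation, as described above, removes the sampling error altogether.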

Admixed populations in the neighbor-joining algorithm: a geometric analysis with five taxa
