A forest is more than its trees: haplotypes and inferred ARGs

Halley Fritze, Nathaniel Pope, Jerome Kelleher, Peter L. Ralph
DOI: https://doi.org/10.1101/2024.11.30.626138
2024-12-02
Abstract: Foreshadowing haplotype-based methods of the genomics era, it is an old observation that the "junction" between two distinct haplotypes produced by recombination is inherited as a Mendelian marker. In this paper, we describe how this recombination-mediated information can in many cases be recovered from inference based solely on polymorphic markers. In a genealogical context, this information reflects the persistence of ancestral haplotypes across local genealogical trees in which they do not represent coalescences. We show how these non-coalescing haplotypes ("unary regions") may be inserted into ancestral recombination graphs (ARGs), a compact but information-rich data structure describing the genealogical relationships among recombinant sequences. The resulting ARGs are smaller, faster to compute with, and the additional ancestral information that is inserted is nearly always correct where the initial ARG is correct. We provide efficient algorithms to infer unary regions within existing ARGs and explore some consequences for ARGs inferred from real data. To do this, we introduce new metrics of agreement and disagreement between ARGs that, unlike previous methods, consider ARGs as describing relationships between haplotypes rather than just a collection of trees.
Genomics