Personalizing pangenome graphs with k-mers

Wei Li
DOI: https://doi.org/10.1038/s41588-024-01954-w
IF: 30.8
2024-10-11
Nature Genetics
Abstract: Pangenome graphs are commonly used as references for read mapping. To improve mapping accuracy, Sirén et al. proposed a k-mer-based approach for sampling haplotypes that were subsequently used to build a personalized subgraph. The original pangenome graph was partitioned into nonoverlapping blocks, and the local haplotypes were labeled with graph-unique k-mers. Based on the k-mer counts in the reads, the authors were able to classify k-mers in the matrices as present (heterozygous or homozygous) or absent, and to select relevant haplotypes in each block accordingly. The sampled haplotypes were used to construct a personalized variation graph, which is a subgraph of the original graph. The haplotype sampling approach is available as part of the vg toolkit and was applied to pangenome graphs from the Human Pangenome Reference Consortium. Compared with a frequency-filtered graph, the personalized subgraph built with k-mer-based haplotype sampling is a superior reference for read mapping. It reduces genotyping errors and improves accuracy in calling small variants and genotyping structural variants, suggesting future directions for optimizing methods for personalizing pangenome references. Original reference: Nat. Methods https://doi.org/10.1038/s41592-024-02407-2 (2024)
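The sampling idea summarized in the abstract can be illustrated with a small toy sketch. It is not the vg implementation: the function names, the coverage thresholds (0.25 and 0.75 of the expected coverage), and the +1/-1 scoring scheme are all assumptions chosen for illustration, and heterozygous k-mers are simply ignored here. Each block is represented as a mapping from a haplotype name to the set of graph-unique k-mers it carries.

```python
def classify_kmer(count, coverage):
    """Classify a k-mer from its read count (thresholds are hypothetical)."""
    if count < 0.25 * coverage:
        return "absent"
    if count < 0.75 * coverage:
        return "heterozygous"
    return "homozygous"


def score_haplotype(hap_kmers, block_kmers, calls):
    """Reward agreement with homozygous/absent calls, penalize disagreement.

    Heterozygous k-mers contribute nothing in this simplified scheme.
    """
    score = 0
    for kmer in block_kmers:
        in_hap = kmer in hap_kmers
        if calls[kmer] == "homozygous":
            score += 1 if in_hap else -1
        elif calls[kmer] == "absent":
            score += -1 if in_hap else 1
    return score


def sample_block(haplotypes, kmer_counts, coverage, n=1):
    """Return the names of the n best-scoring haplotypes in one block."""
    block_kmers = set().union(*haplotypes.values())
    calls = {
        kmer: classify_kmer(kmer_counts.get(kmer, 0), coverage)
        for kmer in block_kmers
    }
    # Sort by descending score; break ties by name for determinism.
    ranked = sorted(
        haplotypes,
        key=lambda name: (-score_haplotype(haplotypes[name], block_kmers, calls), name),
    )
    return ranked[:n]


# Toy block with three haplotypes and made-up k-mer counts at 30x coverage.
haplotypes = {
    "hapA": {"AAC", "CGT"},
    "hapB": {"AAC", "GGA"},
    "hapC": {"TTG", "GGA"},
}
kmer_counts = {"AAC": 30, "CGT": 28, "GGA": 2}
selected = sample_block(haplotypes, kmer_counts, coverage=30, n=2)
```

In this toy example, hapA agrees with every k-mer call and ranks first, followed by hapB. Repeating the selection for each block and joining the chosen haplotypes would give the paths from which a personalized subgraph could be built.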
genetics & heredity