Abstract:

Motivation: Aligning reads to a reference genome sequence is a key step in the analysis of human whole-genome sequencing data obtained with next-generation sequencing (NGS) technologies. The quality of downstream analyses, such as the clinical interpretation of genetic variants or a genome-wide association study, depends on each read being placed at its correct position during alignment. The volume of human whole-genome NGS data is growing steadily, and sequencing projects worldwide have produced large-scale databases of genetic variants found in sequenced human genomes. This information about known variants can be used at the read alignment stage to improve alignment quality when analysing sequencing data from a new individual, for example by building a genome graph. Existing methods that align reads to a linear reference genome are fast, whereas methods that align reads to a genome graph are more accurate in variable regions of the genome. A read alignment method that incorporates known genetic variants into the index of the linear reference sequence can combine the advantages of both approaches.

Results: We present minimap2_index_modifier, a tool that constructs a modified reference genome index using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. Using the modified minimap2 index improves variant calling quality without changes to the bioinformatics pipeline and without significant additional computational overhead. On the PrecisionFDA Truth Challenge V2 benchmark data (HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14), modifying the index with genetic variants from the Human Pangenome Reference Consortium reduced the number of false negative variants by more than 9500 and the number of false positive variants by more than 7000.
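The abstract does not describe how the index is modified internally, so the following is only an illustrative sketch of the general idea of adding variant-aware minimizers to a linear-reference index. It makes several simplifying assumptions that differ from minimap2: minimizers are chosen by lexicographic order on the forward strand (minimap2 uses a hash of canonical k-mers), only SNVs are handled (indels would additionally need coordinate bookkeeping), and the function names `minimizers`, `build_index` and `add_snv` are hypothetical, not part of minimap2 or minimap2_index_modifier.

```python
from collections import defaultdict


def minimizers(seq, k, w, offset=0):
    """Return the set of (k-mer, position) pairs that are the smallest k-mer
    (lexicographic order, an assumption of this sketch) in each window of
    w consecutive k-mers. Positions are shifted by `offset`."""
    found = set()
    n_kmers = len(seq) - k + 1
    for start in range(n_kmers - w + 1):
        window = [(seq[i:i + k], i) for i in range(start, start + w)]
        kmer, pos = min(window)
        found.add((kmer, pos + offset))
    return found


def build_index(ref, k, w):
    """Map each minimizer k-mer of the linear reference to its positions."""
    index = defaultdict(set)
    for kmer, pos in minimizers(ref, k, w):
        index[kmer].add(pos)
    return index


def add_snv(index, ref, pos, alt, k, w):
    """Add minimizers that carry the alternate allele of an SNV at 0-based
    reference position `pos`, keeping reference coordinates."""
    flank = k + w - 2  # enough context for every window touching the SNV
    lo = max(0, pos - flank)
    hi = min(len(ref), pos + flank + 1)
    local = ref[lo:pos] + alt + ref[pos + 1:hi]
    for kmer, p in minimizers(local, k, w, offset=lo):
        if p <= pos < p + k:  # keep only k-mers that cover the variant
            index[kmer].add(p)


if __name__ == "__main__":
    # Toy sequence and toy parameters; the paper reports k = 27 and w = 14.
    ref = "ACGTACGGTCAATGCCTAGGATCCGATTACA"
    k, w = 5, 3
    index = build_index(ref, k, w)
    add_snv(index, ref, pos=15, alt="G", k=k, w=w)
    print(len(index), "distinct minimizer k-mers after adding the SNV")
```

In this sketch a read carrying the alternate allele can seed on a k-mer that spans the variant, while the index remains a flat lookup from k-mers to positions on the linear reference, which is the property the abstract highlights as combining linear-reference speed with variant awareness.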

Personalized pangenome references

Personalizing pangenome graphs with k-mers

Minimizing Reference Bias with an Impute-First Approach

From the reference human genome to human pangenome: Premise, promise and challenge

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Building pangenome graphs

PanKmer: k-mer-based and reference-free pangenome analysis

Pangenome graph construction from genome alignments with Minigraph-Cactus

Integer programming framework for pangenome-based genome inference

Unbiased pangenome graphs

Pangenome graphs improve the analysis of structural variants in rare genetic diseases

Haplotype-aware pantranscriptome analyses using spliced pangenome graphs

Proteogenomics analysis of human tissues using pangenomes

Building a pangenome alignment index via recursive prefix-free parsing

Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis

Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index

A draft human pangenome reference

PPanG: a precision pangenome browser enabling nucleotide-level analysis of genomic variations in individual genomes and their graph-based pangenome

Haplotype-aware sequence alignment to pangenome graphs

The Human Pangenome Project: a global resource to map genomic diversity