Abstract: Ancient genomic data is becoming increasingly available thanks to recent advances in high-throughput sequencing technologies. Yet post-mortem degradation of endogenous ancient DNA often results in low depth of coverage and, consequently, high levels of genotype missingness and uncertainty. Genotype imputation is a potential strategy for increasing the information available in ancient DNA samples and thus improving the power of downstream population genetic analyses. However, the performance of genotype imputation on ancient genomes under different conditions has not yet been fully explored, as previous work has relied primarily on an empirical approach of downsampling high-coverage paleogenomes. While these studies have provided invaluable insights into best practices for imputation, they rely on a fairly limited number of existing high-coverage samples with significant temporal and geographical biases. As an alternative, we used a coalescent simulation approach to generate genomes with the characteristics of ancient DNA and to evaluate more systematically the performance of two popular imputation tools, BEAGLE and GLIMPSE, across varying divergence times between the target sample and the reference haplotypes, depths of coverage, and reference sample sizes. Our results suggest that for genomes with coverage ≤0.1x, imputation performance is poor regardless of the strategy employed. Above 0.1x coverage, imputation generally improves as the size of the reference panel increases, and imputation accuracy decreases with increasing divergence between target and reference populations. It may thus be preferable to compile a smaller set of less diverged reference samples than a larger, more highly diverged dataset. In addition, imputation accuracy may plateau beyond some level of divergence between the reference and target populations.
While accuracy at common variants is similar regardless of divergence time, rarer variants are better imputed in less diverged target samples. Furthermore, both tools, but particularly GLIMPSE, overestimate the genotype probabilities of high-confidence calls, especially at low coverage. Our results provide insight into optimal strategies for ancient genotype imputation under a wide set of scenarios, complementing previous empirical studies based on imputing downsampled high-coverage ancient genomes.
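Two of the claims above can be made concrete with a short sketch. It assumes, as an idealization that is not the simulation model used in this study, that read depth at each site is Poisson-distributed with mean equal to the genome-wide coverage, so the expected fraction of sites with no reads at coverage c is e^(-c); at 0.1x this leaves roughly 90% of sites without a single read. It also shows one common way to check genotype probability (GP) calibration: bin imputed calls by their maximum GP and compare the mean reported GP in each bin with the fraction of calls that match the true genotype. The function names, bin edges and toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of sites with zero reads, assuming per-site depth
    ~ Poisson(coverage). Idealization: real aDNA depth is overdispersed."""
    return math.exp(-coverage)


def calibration_table(max_gp, correct, edges=(0.5, 0.7, 0.9, 0.99, 1.0)):
    """Bin imputed calls by maximum genotype probability (GP) and compare
    the mean reported GP with the empirical accuracy in each bin.

    max_gp  -- highest of the genotype probabilities for each call
    correct -- True if the most likely genotype equals the true genotype
    edges   -- hypothetical upper bin edges; a call with max GP p goes to the
               first bin whose edge is >= p (GPs above the last edge are ignored)
    Returns a list of (upper_edge, n_calls, mean_gp, accuracy) tuples.
    """
    bins = defaultdict(list)
    for p, ok in zip(max_gp, correct):
        for edge in edges:
            if p <= edge:
                bins[edge].append((p, ok))
                break
    table = []
    for edge in edges:
        calls = bins.get(edge)  # .get does not create empty bins
        if not calls:
            continue
        n = len(calls)
        mean_gp = sum(p for p, _ in calls) / n
        accuracy = sum(1 for _, ok in calls if ok) / n
        table.append((edge, n, mean_gp, accuracy))
    return table


if __name__ == "__main__":
    for c in (0.1, 0.5, 1.0):
        print(f"{c}x: {fraction_uncovered(c):.1%} of sites without reads")

    # Toy data invented for illustration; these are not results from the study.
    gp = [0.95, 0.97, 0.99, 0.995, 0.999, 0.6, 0.65]
    ok = [True, False, True, True, False, True, False]
    for edge, n, mean_gp, acc in calibration_table(gp, ok):
        # Naive comparison: a bin is flagged when reported GP exceeds accuracy.
        flag = "overconfident" if mean_gp > acc else "ok"
        print(f"GP<={edge}: n={n} mean GP={mean_gp:.3f} accuracy={acc:.2f} {flag}")
```

Running the script prints 90.5%, 60.7% and 36.8% of sites without reads at 0.1x, 0.5x and 1x. In the calibration table, a bin whose mean GP exceeds its empirical accuracy is overconfident, which is the kind of overestimation described above; with real data, each bin should contain enough calls for the comparison to be meaningful.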

Evaluation of ancient DNA imputation: a simulation study
