Abstract: Whole-genome sequencing data allow variation to be surveyed across the entire genome, reducing the need to balance genome sub-sampling against estimating recombination rates and linkage between sampled markers and target loci. As sequencing costs decrease, low-coverage whole-genome sequencing of pooled or indexed-individual samples is commonly used to identify loci associated with phenotypes or environmental axes in non-model organisms. There are, however, relatively few publicly available bioinformatic pipelines designed explicitly to analyse these types of data, and fewer still that process the raw sequencing data, provide useful quality-control metrics and then execute analyses. Here, we present an updated version of a bioinformatics pipeline called PoolParty2 that can effectively handle either pooled or indexed DNA samples and includes new features to improve computational efficiency. Using simulated data, we demonstrate the ability of our pipeline to recover segregating variants, estimate their allele frequencies accurately, and identify genomic regions harbouring loci under selection. Using the same simulated data set, we benchmark the efficacy of our pipeline against another bioinformatic suite, angsd, and illustrate the compatibility and complementarity of the two suites by using angsd to generate genotype likelihoods, from alignment files and variants provided by PoolParty2, as input for identifying linkage outlier regions. Finally, we apply our updated pipeline to an empirical data set of low-coverage whole-genome data from population samples of Columbia River steelhead trout (Oncorhynchus mykiss), the results of which demonstrate the genomic impacts of decades of artificial selection in a prominent hatchery stock. Thus, we not only demonstrate the utility of PoolParty2 for genomic studies that combine sequencing data from multiple individuals, but also illustrate how it complements other bioinformatics resources such as angsd.

PoolParty2: An integrated pipeline for analysing pooled or indexed low-coverage whole-genome sequencing data to discover the genetic basis of diversity
