Abstract: Despite the dramatic decrease in high-throughput sequencing costs over time, sequencing the ideal number of individuals for population genetic inference remains prohibitively expensive. When research questions require only population-level resolution, pooling individual samples before sequencing (pool-seq) can substantially reduce costs while still providing allele frequencies of single nucleotide polymorphisms (SNPs). However, analyzing pooled data is more difficult and less standardized than individual-based analysis. Although several programs have been developed to handle pool-seq data, most require extensive formatting or programming skills to operate. Here we introduce assessPool, an open-source R and Bash pipeline for pool-seq analyses with a focus on population structure. AssessPool accepts a Variant Call Format (VCF) file and a FASTA-formatted reference, providing a straightforward transition from commonly used pipelines such as Stacks or dDocent. AssessPool handles varying numbers of pools and uses PoPoolation2 to generate locus-by-locus pairwise FST values and associated Fisher's exact test values as measures of population structure. Starting with a VCF file containing all identified SNPs, assessPool provides several key functions for population genetic analyses: i) filtering SNPs based on adjustable criteria, with parameter suggestions for pool-seq data, ii) organizing data structures for analysis based on input pools, iii) creating customizable run scripts for FST calculations using PoPoolation2 and/or the {poolfstat} R package for all pairwise comparisons, iv) calculating locus-specific FST values using PoPoolation2 and/or {poolfstat}, v) importing FST output into a format compatible with R, vi) producing population genomic summary statistics, and vii) generating interactive plots to visualize and explore data. A pooled dataset generated from wild populations is used here to showcase the features of the assessPool pipeline for population genomic analyses.
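
Step iii above depends on enumerating every pairwise comparison among the input pools. As a minimal sketch of that enumeration only (this is not assessPool's own code; the function name `pairwise_pools` and the pool names are hypothetical), the following Bash function prints each unordered pair exactly once, so n pools yield n(n-1)/2 comparisons.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of assessPool: print every unordered pair
# of pool names, one tab-separated pair per line.
pairwise_pools() {
    # Take the first remaining pool, pair it with each pool after it,
    # then repeat with the shortened argument list.
    while [ "$#" -gt 1 ]; do
        first=$1
        shift
        for other in "$@"; do
            printf '%s\t%s\n' "$first" "$other"
        done
    done
}

# Example with four made-up pool names; four pools give 4*3/2 = 6 comparisons.
pairwise_pools poolA poolB poolC poolD
```

Each output line could then be passed to whatever per-comparison step a pipeline runs, such as one FST calculation per pair; how assessPool itself builds its run scripts is not shown here.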

PoolParty2: An integrated pipeline for analysing pooled or indexed low-coverage whole-genome sequencing data to discover the genetic basis of diversity

assessPool: a flexible pipeline for population genomic analyses of pooled sequencing data

Integrating Pool-seq uncertainties into demographic inference

Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding

SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets

grenedalf: population genetic statistics for the next generation of pool sequencing

PAPipe: a pipeline for comprehensive population genetic analysis

Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples.

Needles in the Haystack: Identifying Individuals Present in Pooled Genomic Data

Genotyping single nucleotide polymorphisms and inferring ploidy by amplicon sequencing for polyploid, ploidy‐variable organisms

Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

diverse-seq: an application for alignment-free selecting and clustering biological sequences

Robust and cost-efficient single-cell sequencing through combinatorial pooling

Estimating hierarchical F–statistics from Pool–Seq data

A Novel Multi-Alignment Pipeline for High-Throughput Sequencing Data.

V-pipe 3.0: a sustainable pipeline for within-sample viral genetic diversity estimation

Stacks: an analysis tool set for population genomics

Ingap: an Integrated Next-Generation Genome Analysis Pipeline

Population assignment from genotype likelihoods for low‐coverage whole‐genome sequencing data

Metapipeline-DNA: A Comprehensive Germline & Somatic Genomics Nextflow Pipeline

DNA Hash Pooling and its Applications