Gfa2bin enables graph-based GWAS by converting genome graphs to pan-genomic genotypes

Sebastian Vorbrugg, Ilja Bezrukov, Zhigui Bao, Wenfei Xian, Detlef Weigel
DOI: https://doi.org/10.1101/2024.12.05.626966
2024-12-09
Abstract: Variation graphs offer a superior representation of genomic diversity compared to traditional linear reference genomes, capturing complex features that are otherwise inaccessible to analysis. It seems self-evident that integrating these graphs with genome-wide association studies (GWAS) should enable a more comprehensive understanding of genetic landscapes, potentially uncovering novel associations between genetic variation and traits. This approach takes full advantage of rich genomic information, thereby providing deeper insights into the genetic basis of complex traits. Our tool, gfa2bin, offers multiple methods to (i) genotype variation graphs and (ii) convert the genotypes to well-established data formats for GWAS. We demonstrate that variation graphs are feasible alternatives to traditional linear references for GWAS. Our case study using Arabidopsis thaliana and 1,695 traits shows that our approach complements SNP-based approaches, often identifying additional associations, with all associations having on average higher significance compared to SNP-based approaches. gfa2bin is implemented in Rust. Commented source code is available under the MIT license at https://github.com/MoinSebi/gfa2bin. Examples of how to run gfa2bin are provided in the documentation. We added several Python scripts and a Snakemake pipeline to simplify running our tool on larger data sets. In addition, we recommend using packing (https://github.com/MoinSebi/packing) for reduced storage and preprocessing (normalization) of sequence-to-graph alignment coverage.
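To make the idea of a "pan-genomic genotype" concrete, here is a minimal Python sketch that derives a node presence/absence matrix from the path (P) lines of a GFA 1.0 graph. It is an illustration only and is not gfa2bin's implementation or interface: the function name `node_presence`, the toy graph, and the choice of node presence/absence as the genotype are all assumptions made for this example.

```python
def node_presence(gfa_text):
    """Return (node_ids, path_names, matrix) for a GFA 1.0 graph.

    matrix[i][j] is 1 if node i is traversed by path j, else 0.
    Hypothetical helper for illustration; not part of gfa2bin.
    """
    nodes = []
    paths = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # Segment line: S <node id> <sequence>
            nodes.append(fields[1])
        elif fields[0] == "P":
            # Path line: P <name> <comma-separated steps> <overlaps>
            # Each step is a node id followed by an orientation sign (+ or -).
            paths[fields[1]] = {step[:-1] for step in fields[2].split(",")}
    names = list(paths)
    matrix = [[1 if node in paths[name] else 0 for name in names] for node in nodes]
    return nodes, names, matrix


# Toy graph (assumed data): two samples that differ at a single bubble.
EXAMPLE_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tCCA",
    "P\tsampleA\t1+,2+,4+\t*",
    "P\tsampleB\t1+,3+,4+\t*",
])

nodes, names, matrix = node_presence(EXAMPLE_GFA)
for node, row in zip(nodes, matrix):
    print(node, row)
```

In this toy graph, nodes 2 and 3 are each traversed by only one of the two paths, so they are the only rows that vary between samples and could serve as markers in an association test. Nodes 1 and 4 are shared by both paths and carry no signal.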
Bioinformatics