wgatools: an ultrafast toolkit for manipulating whole genome alignments

Wenjie Wei, Songtao Gui, Jian Yang, Erik Garrison, Jianbing Yan, Hai-Jun Liu
2024-09-13
Abstract: Summary: With the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching. We have developed wgatools, a cross-platform, ultrafast toolkit that supports a range of whole genome alignment (WGA) formats, offering practical tools for conversion, processing, statistical evaluation, and visualization of alignments, thereby facilitating population-level genome analysis and advancing functional and evolutionary genomics. Availability and Implementation: wgatools supports diverse formats and can process, filter, and statistically evaluate alignments, perform alignment-based variant calling, and visualize alignments both locally and genome-wide. Built with Rust for efficiency and safe memory usage, it ensures fast performance and can handle large datasets consisting of hundreds of genomes. wgatools is published as free software under the MIT open-source license, and its source code is freely available at https://github.com/wjwei-handsome/wgatools. Contact: weiwenjie@westlake.edu.cn (W.W.) or liuhaijun@yzwlab.cn (H.-J.L.).
Genomics
What problem does this paper attempt to address?
The paper addresses the incompatibility between whole genome alignment (WGA) formats, a problem made more pressing by the rapid development of long-read sequencing technology. Specifically:

- **Main issue**: As long-read sequencing advances, complete individual genome assemblies are becoming increasingly common. However, different WGA tools produce output in different formats (such as MAF, PAF, and Chain), which makes it difficult to integrate and compare alignments across studies or platforms.
- **Solution**: The authors developed `wgatools`, a cross-platform, ultrafast toolkit that converts between multiple WGA formats and also provides processing, filtering, statistical evaluation, alignment-based variant calling, and visualization. This lets researchers analyze and compare alignments from different sources more flexibly, supporting research in functional and evolutionary genomics.

By providing an efficient, versatile tool, `wgatools` aims to overcome the limitations of existing tools and to improve the interoperability and accessibility of genomic alignment data.
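Because the core problem is moving alignments between formats, a small sketch of one such conversion may make the task concrete. The Rust code below turns a single PAF line into a UCSC chain record. It is a minimal illustration under stated assumptions, not the wgatools implementation: the function `paf_to_chain` is hypothetical, it assumes every PAF line carries a minimap2-style `cg:Z:` CIGAR tag, and it uses the PAF residue-match count (column 10) as a placeholder chain score.

```rust
// Hypothetical PAF -> UCSC chain converter: a sketch, not the wgatools implementation.
// Assumes each PAF line carries a minimap2-style `cg:Z:` CIGAR tag.

/// Convert one PAF line into one chain record; `id` is chosen by the caller.
fn paf_to_chain(line: &str, id: usize) -> Result<String, String> {
    let f: Vec<&str> = line.trim_end().split('\t').collect();
    if f.len() < 12 {
        return Err(format!("expected at least 12 columns, got {}", f.len()));
    }
    let num = |i: usize| f[i].parse::<u64>().map_err(|e| format!("column {}: {}", i + 1, e));
    let (q_len, q_start, q_end) = (num(1)?, num(2)?, num(3)?);
    let (t_len, t_start, t_end, matches) = (num(6)?, num(7)?, num(8)?, num(9)?);
    if q_start > q_end || q_end > q_len || t_start > t_end || t_end > t_len {
        return Err("PAF coordinates out of range".into());
    }
    // PAF query coordinates are always forward-strand; chain expects them on the
    // query's own strand, so '-' intervals are flipped into reverse-complement space.
    let (qs, qe) = match f[4] {
        "+" => (q_start, q_end),
        "-" => (q_len - q_end, q_len - q_start),
        s => return Err(format!("invalid strand '{}'", s)),
    };
    // Optional SAM-style tags start at column 13; take the CIGAR from `cg:Z:`.
    let cigar = f[12..]
        .iter()
        .find_map(|tag| tag.strip_prefix("cg:Z:"))
        .ok_or("missing cg:Z: CIGAR tag")?;
    // Placeholder score: PAF column 10 (residue matches), not a real chain score.
    let mut out = format!(
        "chain {} {} {} + {} {} {} {} {} {} {} {}\n",
        matches, f[5], t_len, t_start, t_end, f[0], q_len, f[4], qs, qe, id
    );
    let (mut block, mut dt, mut dq) = (0u64, 0u64, 0u64);
    let (mut n, mut t_used, mut q_used) = (0u64, 0u64, 0u64);
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            n = n * 10 + u64::from(d);
            continue;
        }
        match c {
            // Aligned ops extend the current ungapped block; a pending gap first
            // closes the previous block as a "size dt dq" line.
            'M' | '=' | 'X' => {
                if dt > 0 || dq > 0 {
                    out.push_str(&format!("{}\t{}\t{}\n", block, dt, dq));
                    (block, dt, dq) = (0, 0, 0);
                }
                block += n;
                t_used += n;
                q_used += n;
            }
            // A chain cannot open with a gap; D consumes target only, I query only.
            'D' | 'I' if block == 0 => return Err("CIGAR starts with a gap".into()),
            'D' => {
                dt += n;
                t_used += n;
            }
            'I' => {
                dq += n;
                q_used += n;
            }
            op => return Err(format!("unsupported CIGAR op '{}'", op)),
        }
        n = 0;
    }
    if dt > 0 || dq > 0 {
        return Err("CIGAR ends with a gap".into());
    }
    // Sanity check: the CIGAR must span exactly the PAF target and query intervals.
    if t_used != t_end - t_start || q_used != q_end - q_start {
        return Err("CIGAR length disagrees with PAF coordinates".into());
    }
    out.push_str(&format!("{}\n", block));
    Ok(out)
}

fn main() {
    let line = "q1\t100\t10\t40\t+\tt1\t200\t50\t80\t27\t31\t60\tcg:Z:10M1I9M1D10M";
    match paf_to_chain(line, 1) {
        Ok(chain) => print!("{}", chain),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

For the sample line, the CIGAR `10M1I9M1D10M` becomes three ungapped blocks of 10, 9, and 10 bases, separated first by a 1-base gap on the query side and then by a 1-base gap on the target side. PAF and chain both describe alignments by coordinates alone, which is why this direction needs no sequence data; producing MAF would additionally require the genome sequences, because MAF stores the aligned bases themselves.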