KCOSS: an ultra-fast k-mer counter for assembled genome analysis
Deyou Tang,Yucheng Li,Daqiang Tan,Juan Fu,Yelei Tang,Jiabin Lin,Rong Zhao,Hongli Du,Zhongming Zhao
DOI: https://doi.org/10.1093/bioinformatics/btab797
IF: 5.8
2021-11-26
Bioinformatics
Abstract:Abstract Motivation The k-mer frequency in whole genome sequences provides researchers with an insightful perspective on genomic complexity, comparative genomics, metagenomics and phylogeny. The current k-mer counting tools are typically slow, and they require large memory and hard disk for assembled genome analysis. Results We propose a novel and ultra-fast k-mer counting algorithm, KCOSS, to fulfill k-mer counting mainly for assembled genomes with segmented Bloom filter, lock-free queue, lock-free thread pool and cuckoo hash table. We optimize running time and memory consumption by recycling memory blocks, merging multiple consecutive first-occurrence k-mers into C-read, and writing a set of C-reads to disk asynchronously. KCOSS was comparatively tested with Jellyfish2, CHTKC and KMC3 on seven assembled genomes and three sequencing datasets in running time, memory consumption, and hard disk occupation. The experimental results show that KCOSS counts k-mer with less memory and disk while having a shorter running time on assembled genomes. KCOSS can be used to calculate the k-mer frequency not only for assembled genomes but also for sequencing data. Availabilityand implementation The KCOSS software is implemented in C++. It is freely available on GitHub: https://github.com/kcoss-2021/KCOSS. Supplementary information Supplementary data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology