Meta-Prism 2.0: Enabling Algorithm and Web Server for Ultra-Fast, Memory-Efficient, and Accurate Analysis among Millions of Microbial Community Samples

Kai Kang,Hui Chong,Kang Ning
DOI: https://doi.org/10.1093/gigascience/giac073
IF: 7.658
2022-01-01
GigaScience
Abstract:Background Microbial community samples have been accumulating at a speed faster than ever, with hundreds of thousands of samples been sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate compare and search of samples is in urgent need: faced with millions of samples in the data repository, traditional sample comparison and search approaches fall short in speed and accuracy. Findings Here we proposed Meta-Prism 2.0, a microbial community sample analysis method that has pushed the time and memory efficiency to a new limit without compromising accuracy. Based on sparse data structure, time-saving instruction pipeline, and SIMD optimization, Meta-Prism 2.0 has enabled ultra-fast, memory-efficient, flexible, and accurate search among millions of samples. Meta-Prism 2.0 was put to test on several data sets, with the largest containing 1 million samples. Results show that Meta-Prism 2.0's 0.00001-s per sample pair compare speed and 8-GB memory needs for searching against 1 million samples have made it one of the most efficient sample analysis methods. Additionally, Meta-Prism 2.0 can achieve accuracy comparable with or better than other contemporary methods. Third, Meta-Prism 2.0 can precisely identify the original biome for samples, thus enabling sample source tracking. Finally, we have provided a web server for fast search of microbial community samples online. Conclusions In summary, Meta-Prism 2.0 has changed the resource-intensive sample search scheme to an effective procedure, which could be conducted by researchers every day even on a laptop, for insightful sample search, similarity analysis, and knowledge discovery. Meta-Prism 2.0 can be accessed at https://github.com/HUST-NingKang-Lab/Meta-Prism-2.0, and the web server can be accessed at https://hust-ningkang-lab.github.io/Meta-Prism-2.0/.
What problem does this paper attempt to address?