SemiBin: Incorporating Information from Reference Genomes with Semi-Supervised Deep Learning Leads to Better Metagenomic Assembled Genomes (MAGs)

Shaojun Pan, Chengkai Zhu, Xing-Ming Zhao, Luis Pedro Coelho
DOI: https://doi.org/10.1101/2021.08.16.456517
IF: 16.6
2021-01-01
Nature Communications
Abstract: Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) in which sequences predicted to originate from the same genome are automatically grouped together. The most widely used binning methods are reference-independent: they operate de novo and allow the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we propose SemiBin, an open source tool that uses neural networks to implement a semi-supervised approach: SemiBin exploits the information in reference genomes while retaining the capability of binning genomes that are outside the reference dataset. SemiBin outperforms existing state-of-the-art binning methods on simulated and real microbiome datasets across three different environments (human gut, dog gut, and marine microbiomes). SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species. SemiBin is available as open source software.
Competing Interest Statement: The authors have declared no competing interest.