Abstract: Rapidly developing next-generation sequencing technologies have significantly advanced metagenomics research, yet they also pose serious challenges for the analysis of metagenomic data. Metagenomic samples can contain thousands of microbial species, so sequencing datasets can contain fragments from thousands of different genomes. Clustering the sequencing reads by their genomes of origin, known as binning, is therefore usually done to expedite further studies. Currently, binning methods are divided into two categories: supervised methods (which require reference genomes) and unsupervised methods (which do not). We present an unsupervised binning method that combines a novel sequence feature recognition method with a spectral clustering algorithm. The sequence feature is a hybrid of sequence correlation and sequence composition analyses. Experiments on simulated and actual metagenomic datasets suggest that the combination of sequence composition and an intrinsic correlation of oligonucleotides, both extracted from tetranucleotide analyses, performs better than any single feature. A spectral clustering algorithm, which is a high-performance unsupervised clustering method, is also applied in our binning method. The method is available as an open source package called HSS-bin (Hybrid Sequence feature and Spectral clustering unsupervised metagenomic binning) at http://bioinfo.seu.edu.cn/HSS-bin/. We evaluated HSS-bin's performance using both simulated and actual metagenomic datasets. Experimental results indicate that HSS-bin can handle metagenomic sequencing data with non-uniform species abundance, short sequences, and complex phylogenetic diversity with high accuracy. Our method performs well on actual metagenomic datasets and on datasets simulated from a complex metagenomic community.
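
The sequence composition feature mentioned in the abstract can be illustrated with a small sketch. This is not HSS-bin's implementation: it covers only a plain tetranucleotide frequency vector, assumes simple overlapping 4-mer counting, and leaves out both the oligonucleotide correlation feature and the spectral clustering step. The names `tetra_freq` and `euclidean` and the toy fragments are hypothetical, chosen for illustration only.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so every vector shares one layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tetra_freq(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of seq."""
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        # Windows containing N or any other non-ACGT symbol are skipped.
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy fragments (made up for illustration): two AT-only, one GC-only.
frag_a = "ATAT" * 8
frag_b = "TATA" * 8
frag_c = "GCGC" * 8

va, vb, vc = (tetra_freq(s) for s in (frag_a, frag_b, frag_c))

# Fragments with similar composition lie closer together in feature space.
print(euclidean(va, vb) < euclidean(va, vc))
```

In a binning setting, vectors like these would be one input to a clustering algorithm. The abstract pairs its hybrid feature with spectral clustering, which this sketch does not attempt.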

SMeta, a Binning Tool Using Single-Cell Sequences to Aid Reconstructing Metagenome Species Accurately

SMeta, a binning tool using single-cell sequences to aid in reconstructing species from metagenome accurately

MetaBAT 2: an Adaptive Binning Algorithm for Robust and Efficient Genome Reconstruction from Metagenome Assemblies

Single Cell Mass Spectrometry with a Robotic Micromanipulation System for Cell Metabolite Analysis

MetaBinner: a High-Performance and Stand-Alone Ensemble Binning Method to Recover Individual Genomes from Complex Microbial Communities

Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning

MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms

SimpleMetaPipeline: Breaking the bioinformatics bottleneck in metabarcoding

A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments

Binette: a fast and accurate bin refinement tool to construct high quality Metagenome Assembled Genomes

A New Unsupervised Binning Approach for Metagenomic Sequences Based on N-grams and Automatic Feature Weighting

SemiBin: Incorporating Information from Reference Genomes with Semi-Supervised Deep Learning Leads to Better Metagenomic Assembled Genomes (MAGs)

Metagenomic DNA Sequence Binning Based on Affinity Propagation

Microbiome Single Cell Atlases Generated with a Commercial Instrument

A new metagenome binning method based on gene uniqueness

HSS-bin: an Unsupervised Metagenomic Binning Method Based on Hybrid Sequence Feature Recognition and Spectral Clustering

Accurate Annotation of Metagenomic Data Without Species-Level References

Application of Relevance Characteristics for the Assignment of Genomic Fragments

CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads

BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation

Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy