Binning DNA Fragment of Metagenome Using a Novel Model

Hou Tao, Liu Yun, Liu Fu, Wang Ke, Xie Jian
DOI: https://doi.org/10.1109/ccdc.2015.7162767
2015-01-01
Abstract: An essential task in metagenomic data analysis is to predict the source organism of each DNA fragment in a sequenced metagenome, which can help link gene functions to members of the community or estimate the microbial abundance of the studied sample. Several classifiers have been developed to identify the source organism of DNA fragments from a metagenome. However, most existing classifiers suffer from low classification accuracy at the genus level. One major reason is that they cannot accurately discriminate training data from different taxonomic classes when the training data contain outliers. The training genomic data (bacterial and archaeal genomes) usually do contain a portion of outliers, which arise from sequencing errors, phage invasions, highly expressed genes, and other sources. These outliers act as noise and hinder the development of better-performing classifiers. To overcome this difficulty, we present a strategy based on the support vector data description (SVDD) model, which enhances the discriminating ability of the classifier by discarding some outliers in the training genomic data. Experiments were performed on simulated and real metagenomes. The results demonstrate that our classifier has high classification sensitivity, specificity, and accuracy, as well as a low false negative rate.
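The idea of fitting one enclosing boundary per taxonomic class while giving up some training outliers can be illustrated with a minimal sketch. This is not the authors' implementation and not a true SVDD: SVDD solves a kernelized quadratic program, whereas the sketch below swaps in a much simpler stand-in, a centroid-based hypersphere that drops the farthest fraction of training points before setting its radius. The dinucleotide features, the `reject_fraction` parameter, the toy sequences, and the class labels `genus_a` and `genus_b` are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of seq over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_sphere(vectors, reject_fraction=0.25):
    """Fit (center, radius) after discarding the farthest training points.

    Simplified stand-in for SVDD: points are ranked by distance to the
    initial centroid, the farthest reject_fraction of them are treated as
    outliers, and the sphere is refit on the remaining points.
    """
    first_center = centroid(vectors)
    ranked = sorted(vectors, key=lambda v: distance(v, first_center))
    n_keep = max(1, len(ranked) - int(len(ranked) * reject_fraction))
    kept = ranked[:n_keep]
    center = centroid(kept)
    radius = max(distance(v, center) for v in kept)
    return center, radius


def classify(fragment, models, k=2):
    """Return the label whose sphere contains the fragment, else None."""
    vec = kmer_profile(fragment, k)
    best_label, best_score = None, None
    for label, (center, radius) in models.items():
        score = distance(vec, center) - radius  # negative means inside
        if score <= 0 and (best_score is None or score < best_score):
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data; the last sequence of each class is an outlier.
training = {
    "genus_a": ["ATATATATATAT", "TATATATATATA", "AATTAATTAATT", "GCGCGCGCGCGC"],
    "genus_b": ["GCGCGCGCGCGC", "CGCGCGCGCGCG", "GGCCGGCCGGCC", "ATATATATATAT"],
}
models = {
    label: fit_sphere([kmer_profile(s) for s in seqs])
    for label, seqs in training.items()
}

print(classify("ATATATATAT", models))
print(classify("ACGTACGTACGT", models))
```

Dropping the outlier shrinks each sphere, so the two class boundaries no longer overlap and a fragment that fits neither class is left unassigned rather than forced into one. A real SVDD would obtain the same effect through its slack penalty and a kernel, which this sketch does not attempt.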