MEC: Misassembly Error Correction in Contigs Using a Combination of Paired-End Reads and GC-contents

Binbin Wu, Jianxin Wang, Junwei Luo, Min Li, Fangxiang Wu, Yi Pan
DOI: https://doi.org/10.1109/bibm.2017.8217652
2017-01-01
Abstract: De novo assembly aims to reconstruct the genome of an unknown species, and many algorithms have been proposed for it. Because of repetitive regions and sequencing errors, contigs usually contain a large number of misassemblies. Correcting misassemblies in contigs is therefore a challenging and significant task that has received considerable attention from researchers. In this study, we propose a novel method, called MEC, to identify and correct misassemblies in contigs. First, MEC uses fragment coverage as the feature to detect candidate misassemblies. It then filters out a large number of false positives from the candidates based on the distribution of paired-end reads and a statistical analysis of GC content. We apply MEC to four real contig datasets and carry out experiments to analyze its influence on scaffolding results. The results show that MEC reduces misassemblies effectively and yields quantitative improvements in scaffolding quality. MEC is publicly available for download at https://github.com/bioinfomaticsCSU/MEC.
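The detection idea in the abstract (low fragment coverage marks a candidate misassembly, and GC content is one of the signals used to filter candidates) can be illustrated with a minimal Python sketch. This is not MEC's implementation: the function names, the half-open interval convention, and the fixed coverage threshold are assumptions made for illustration, and the abstract does not specify the statistical test or the paired-end distribution check that MEC applies.

```python
from typing import List, Tuple


def fragment_coverage(contig_len: int, fragments: List[Tuple[int, int]]) -> List[int]:
    """Count, per contig position, how many paired-end fragments span it.

    Each fragment is a half-open interval (start, end) running from the
    leftmost base of one read to the rightmost base of its mate
    (an assumed convention, not taken from the paper).
    """
    diff = [0] * (contig_len + 1)
    for start, end in fragments:
        start = max(start, 0)
        end = min(end, contig_len)
        if start < end:
            diff[start] += 1   # coverage rises where the fragment begins
            diff[end] -= 1     # and falls just past where it ends
    coverage = []
    running = 0
    for delta in diff[:contig_len]:
        running += delta
        coverage.append(running)
    return coverage


def low_coverage_regions(coverage: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return half-open intervals where coverage is below the threshold.

    The fixed threshold is a hypothetical stand-in for whatever cutoff
    a real tool derives from the data.
    """
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth < threshold:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Toy example: two groups of fragments with no fragment bridging positions 7-8.
fragments = [(0, 6), (1, 7), (9, 14), (10, 15)]
coverage = fragment_coverage(15, fragments)
candidates = low_coverage_regions(coverage, threshold=1)
```

In this toy input no fragment spans positions 7 and 8, so that interval is reported as a candidate. A real pipeline would then examine the paired-end read distribution and the GC content around each candidate before deciding whether to break the contig, since low coverage can also arise from sequencing bias rather than a misassembly.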