Current Challenges and Solutions of De Novo Assembly

Xingyu Liao, Min Li, You Zou, Fang‐Xiang Wu, Yi Pan, Jianxin Wang
DOI: https://doi.org/10.1007/s40484-019-0166-9
2019-01-01
Quantitative Biology
Abstract:
Background: Next‐generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high‐throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical and computational challenges in de novo assembly remain, although many new ideas and solutions have been proposed to tackle them in both experimental and computational settings.
Results: In this review, we first briefly introduce some of the major challenges faced by NGS sequence assembly. We then analyze the characteristics of various sequencing platforms and their impact on assembly results. Next, we classify de novo assemblers according to their frameworks (overlap graph‐based, de Bruijn graph‐based and string graph‐based) and describe the characteristics of each assembly tool and the scenarios to which it is suited. We then describe in detail the solutions to the main challenges of de novo assembly of next‐generation sequencing data, single‐cell sequencing data and single‐molecule sequencing (SMS) data. Finally, we discuss the application of SMS long reads in solving problems encountered in NGS assembly.
Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines for determining the optimal assembly algorithm for a given type of input sequencing data.