A Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing

Nian-Shyang Chang, Chao-Hsi Lee, Chung-Hsuan Yang, Yi-Chung Wu, Wen-Ching Chen, Liang-Yi Lin, Yen-Lung Chen, Chun-Pin Lin, Chi-Shi Chen, Jui-Hung Hung, Chia-Hsiang Yang
DOI: https://doi.org/10.1109/ISSCC42615.2023.10067532
2023-02-19
Abstract: Next-generation sequencing (NGS) has revolutionized biological sciences and clinical practice. It has become an essential technology for various emerging applications, such as cancer screening and inherited disease diagnosis. Fig. 2.4.1 shows an overview of an NGS pipeline. An NGS pipeline includes sample preparation, sequencing, data analysis and tertiary analysis. A sequencer first generates a massive number of DNA segments (short-reads) from samples. These short-reads are the inputs to data analysis. The outputs of the data analysis (genetic variants) can then be sent to facilities for tertiary analysis. Data analysis is very time-consuming and has become the bottleneck of the entire NGS pipeline [1]. Its high computational complexity stems from processing hundreds of millions of short-reads to reconstruct a DNA sequence of three billion nucleotides. A complete data analysis workflow includes four steps: short-read mapping, haplotype calling, variant calling and genotyping. Data analysis accelerators have been proposed to reduce the processing time [2], [3]. They support the first three steps of the workflow, but not genotyping, which is the dominant step [4]. Additionally, previous works adopt only single-end short-read mapping, which limits the achievable analysis accuracy. This work presents a fully integrated data analysis accelerator that handles the complete analysis workflow. Paired-end short-read mapping with rescue is used to enhance the analysis accuracy.
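The abstract does not describe how paired-end mapping with rescue is carried out, so the following is only a minimal, generic software sketch of the idea, not the accelerator's hardware method. It assumes exact non-overlapping k-mer seeding, ungapped (Hamming-distance) verification, the common forward/reverse mate orientation and a fixed insert-size range; real aligners typically use gapped alignment (e.g. Smith-Waterman) during rescue. All function names, thresholds and the toy data are hypothetical. When one mate fails single-end mapping, rescue searches only the small window implied by the other mate and the expected insert size, so a looser mismatch budget can be tolerated there.

```python
import random
from typing import Dict, List, Optional, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement: mate 2 is sequenced from the opposite strand."""
    return seq.translate(_COMP)[::-1]


def hamming(a: str, b: str) -> int:
    """Mismatch count of an ungapped alignment between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Hypothetical seed index: every reference k-mer -> its start positions."""
    index: Dict[str, List[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def map_single(read, ref, index, k, max_mm) -> List[Tuple[int, int]]:
    """Single-end mapping: exact non-overlapping k-mer seeds, then verification.

    Returns (position, mismatches) for every verified candidate.
    """
    cands = set()
    for off in range(0, len(read) - k + 1, k):
        for hit in index.get(read[off:off + k], []):
            cands.add(hit - off)  # shift seed hit back to the read start
    hits = []
    for pos in sorted(cands):
        window = ref[pos:pos + len(read)] if pos >= 0 else ""
        if len(window) == len(read):
            d = hamming(read, window)
            if d <= max_mm:
                hits.append((pos, d))
    return hits


def map_pair(r1, r2, ref, index, k, max_mm, ins_lo, ins_hi, rescue_mm):
    """Paired-end mapping; rescue is one-directional here (mate 1 anchors mate 2)."""
    m2 = revcomp(r2)  # forward-strand image of mate 2
    h1 = map_single(r1, ref, index, k, max_mm)
    h2 = map_single(m2, ref, index, k, max_mm)
    # 1) Proper pairs: both mates mapped and the insert size is plausible.
    pairs = [(d1 + d2, p1, p2) for p1, d1 in h1 for p2, d2 in h2
             if ins_lo <= p2 + len(m2) - p1 <= ins_hi]
    if pairs:
        _, p1, p2 = min(pairs)  # fewest total mismatches wins
        return (p1, p2), "paired"
    # 2) Rescue: scan only the window implied by mate 1 and the insert-size
    #    range, accepting up to rescue_mm mismatches (looser than max_mm).
    for p1, _ in h1:
        start = max(p1 + ins_lo - len(m2), 0)
        stop = min(p1 + ins_hi - len(m2), len(ref) - len(m2))
        best: Optional[Tuple[int, int]] = None
        for p2 in range(start, stop + 1):
            d = hamming(m2, ref[p2:p2 + len(m2)])
            if d <= rescue_mm and (best is None or d < best[0]):
                best = (d, p2)
        if best is not None:
            return (p1, best[1]), "rescued"
    return None, "unpaired"


# Toy demo (hypothetical data): random 2 kb reference, 50 bp mates, 300 bp fragment.
rng = random.Random(7)
REF = "".join(rng.choice("ACGT") for _ in range(2000))
K = 11
IDX = build_index(REF, K)
read1 = REF[500:550]               # mate 1, forward strand
read2 = revcomp(REF[750:800])      # mate 2, reverse strand
# Put one mismatch in each of mate 2's four seeds so seeding alone misses it.
flip = {"A": "C", "C": "G", "G": "T", "T": "A"}
noisy = list(REF[750:800])
for i in (5, 16, 27, 38):
    noisy[i] = flip[noisy[i]]
read2_noisy = revcomp("".join(noisy))

args = dict(ref=REF, index=IDX, k=K, max_mm=2, ins_lo=200, ins_hi=400, rescue_mm=5)
print(map_pair(read1, read2, **args))        # expected: ((500, 750), 'paired')
print(map_pair(read1, read2_noisy, **args))  # expected: ((500, 750), 'rescued')
```

In this toy setting the noisy mate carries four mismatches, more than the single-end budget of two, so it is unmappable on its own; the pair constraint narrows the search to about 200 positions, where a five-mismatch budget is still unlikely to produce a false hit.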
Computer Science, Engineering, Biology
What problem does this paper attempt to address?