Next-generation sequencing analysis with a population-specific reference genome

Tomohisa Suzuki, Kota Ninomiya, Takamitsu Funayama, Yasunobu Okamura, Shu Tadaka, the Tohoku Medical Megabank Project Study Group, Kengo Kinoshita, Masayuki Yamamoto, Shigeo Kure, Atsuo Kikuchi, Gen Tamiya, Jun Takayama
DOI: https://doi.org/10.1101/2024.03.07.584017
2024-03-10
Abstract: Next-generation sequencing (NGS) has become widely available and is routinely used in basic research and clinical practice. The reference genome sequence is an essential resource for NGS analysis, and several population-specific reference genomes have recently been constructed to provide a choice to deal with the vast genetic diversity of human samples. However, resources supporting population-specific references are insufficient, and it is burdensome to perform analysis using these reference genomes. Here, we constructed a set of resources to support NGS analysis using the Japanese reference genome sequence, JG. We created resources for variant calling, gene and repeat element annotations, variant-effect prediction, read mappability, and RNA-seq analysis. We also provide a resource for reference coordinate conversion for further annotation enrichment. We then provide a variant calling protocol using JG-based resources. Our resources provide a guide to prepare sufficient resources for the use of population-specific reference genomes and can facilitate the migration of reference genomes.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses a practical gap: standard reference genomes (such as GRCh37 and GRCh38) capture human genetic diversity only partially, and population-specific references that could help are hard to use because the supporting resources are missing. To close this gap for the Japanese reference genome JG, the researchers built a set of resources for next-generation sequencing (NGS) analysis covering variant calling, gene and repeat element annotation, variant-effect prediction, read mappability, and RNA-seq analysis, plus a resource for converting coordinates between references. They also provide a variant calling protocol based on these JG resources, which is intended to serve as a guide for preparing comparable resources for other population-specific reference genomes.
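To make the idea of reference coordinate conversion concrete, here is a minimal Python sketch of how a position on one reference can be mapped to another through a list of ungapped alignment blocks. This is not the authors' resource or tool. The `Block` type, the `lift` function, and the block coordinates are hypothetical and chosen only for illustration. Real workflows typically rely on chain files with tools such as UCSC liftOver or CrossMap, and also handle strand and multiple chromosomes, which this sketch ignores.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block between two references (hypothetical type)."""
    src_start: int  # 0-based start on the source reference
    dst_start: int  # 0-based start on the target reference
    size: int       # length of the block in bases


def lift(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based source position to the target reference.

    Assumes `blocks` is sorted by src_start and the blocks do not overlap.
    Returns None when the position falls in a gap between blocks.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.size:
        return None  # pos lies past the end of the block, i.e. unaligned
    return block.dst_start + offset


# Made-up blocks: a 5-base insertion in the target after position 100,
# then a 10-base stretch of the source that has no counterpart.
example_blocks = [
    Block(src_start=0, dst_start=0, size=100),
    Block(src_start=100, dst_start=105, size=50),
    Block(src_start=160, dst_start=155, size=40),
]

if __name__ == "__main__":
    for position in (10, 120, 155, 170):
        print(position, "->", lift(example_blocks, position))
```

Positions that fall between blocks return `None`. This mirrors a practical consequence of converting coordinates: annotations in regions without an alignment between the two references cannot be carried over.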