Analysis of Five Deep-sequenced Trio-genomes of the Peninsular Malaysia Orang Asli and North Borneo Populations
Lian Deng, Haiyi Lou, Xiaoxi Zhang, Bhooma Thiruvahindrapuram, Dongsheng Lu, Christian R. Marshall, Chang Liu, Bo Xie, Wanxing Xu, Lai-Ping Wong, Chee-Wei Yew, Aghakhanian Farhang, Rick Twee-Hee Ong, Mohammad Zahirul Hoque, Abdul Rahman Thuhairah, Bhak Jong, Maude E. Phipps, Stephen W. Scherer, Yik-Ying Teo, Subbiah Vijay Kumar, Boon-Peng Hoh, Shuhua Xu
DOI: https://doi.org/10.1186/s12864-019-6226-8
IF: 4.547
2019-01-01
BMC Genomics
Abstract: Background: Recent advances in genomic technologies have facilitated genome-wide investigation of human genetic variation. However, most efforts have focused on major populations, and trio genomes of indigenous populations from Southeast Asia remain under-investigated. Results: We analyzed whole-genome deep sequencing data (~30×) of five native trios from Malaysia and discovered approximately 6.9 million single nucleotide variants (SNVs), 1.2 million small insertions and deletions (indels), and 9,000 copy number variants (CNVs) in the 15 samples. We found that 2.7% of SNVs, 2.3% of indels, and 22% of CNVs were novel, implying that existing databases insufficiently cover population diversity. We identified a higher proportion of novel variants in the Orang Asli (OA) samples, i.e., the indigenous people of Peninsular Malaysia, than in the North Bornean (NB) samples, likely due to the more complex demographic history and long-term isolation of the OA groups. We used the pedigree information to identify autosomal de novo variants and estimated the mutation rates to be 0.81×10⁻⁸ to 1.33×10⁻⁸, 1.0×10⁻⁹ to 2.9×10⁻⁹, and ~0.001 per site per generation for SNVs, indels, and CNVs, respectively. The trio genomes also allowed highly accurate haplotype phasing, providing a reference for future genomic studies of OA and NB populations. In addition, high-frequency inherited CNVs specific to OA or NB were identified. One example was a 50-kb duplication in DEFA1B detected only in the Negrito trios, suggesting a plausible effect on host defense against the diverse microbes these hunter-gatherers are exposed to in the tropical rainforest environment. The CNVs shared between the OA and NB groups were much fewer than those specific to each group. Nevertheless, we identified a 142-kb duplication in AMY1A, a gene associated with high-starch diets, in all 15 samples.
Moreover, novel insertions shared with archaic hominids were identified in our samples. Conclusion: Our study presents a full catalogue of the genome variants of native Malaysian populations, which complements the known genome diversity of Southeast Asians. It points to a distinct population history of the native inhabitants and demonstrates the need for more genome sequencing of the multi-ethnic native groups of Malaysia and Southeast Asia.
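The abstract reports per-site, per-generation mutation rates estimated from de novo variants in trios but does not state the formula or the callable genome size used. As a rough illustration only, a common convention divides the number of de novo variants found in a child by twice the number of callable sites (one copy from each parent). The function name and the input numbers below are hypothetical and are not taken from the paper.

```python
def per_site_mutation_rate(n_de_novo: int, callable_sites: int) -> float:
    """Estimate a per-site, per-generation mutation rate from one trio.

    Assumes a diploid child, so each callable site is counted on two
    transmitted haplotypes (hence the factor of 2).
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    if n_de_novo < 0:
        raise ValueError("n_de_novo must be non-negative")
    return n_de_novo / (2 * callable_sites)


# Hypothetical inputs: 60 de novo SNVs over 2.7 billion callable sites.
rate = per_site_mutation_rate(n_de_novo=60, callable_sites=2_700_000_000)
print(f"{rate:.2e}")  # prints 1.11e-08
```

With these illustrative inputs the estimate falls inside the SNV range quoted in the abstract, which shows only that a few dozen de novo SNVs per child is the right order of magnitude for such rates.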