Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing

Lai-Ping Wong, Jason Kuan-Han Lai, Woei-Yuh Saw, Rick Twee-Hee Ong, Anthony Youzhi Cheng, Nisha Esakimuthu Pillai, Xuanyao Liu, Wenting Xu, Peng Chen, Jia-Nee Foo, Linda Wei-Lin Tan, Seok-Hwee Koo, Richie Soong, Markus Rene Wenk, Wei-Yen Lim, Chiea-Chuen Khor, Peter Little, Kee-Seng Chia, Yik-Ying Teo
DOI: https://doi.org/10.1371/journal.pgen.1004377
2014-05-15
Abstract: South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed a higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to that of South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP and two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1KGP.