AI-SNPs Screening Based on the Whole Genome Data and Research on Genetic Structure Differences of Subcontinent Populations.

Hao-Yu Wang, Yu-Han Hu, Yue-Yan Cao, Qiang Zhu, Yu-Guo Huang, Xi Li, Ji Zhang
DOI: https://doi.org/10.16288/j.yczz.21-185
2021-01-01
Abstract: Differences in population genetic structure are a key consideration in medical research involving multi-population samples. A set of ancestry-informative single nucleotide polymorphisms (AI-SNPs) can be used to analyze the genetic components of a population, infer the ancestral origin of individuals, and pre-filter samples to reduce the impact of population genetic structure differences on medical research. However, most published studies have focused on differences between continental populations or between regions of a continent. In this paper, to analyze differences among subcontinent populations, AI-SNPs were screened by calculating the FST value for each pair of five East Asian populations from the 1000 Genomes Project phase 3 (GRCh37.p13): Japanese in Tokyo (JPT), Han Chinese in Beijing (CHB), Southern Han Chinese (CHS), Chinese Dai in Xishuangbanna (CDX) and Kinh in Ho Chi Minh City (KHV). The results show that the five East Asian populations fall into three clusters: JPT; CHB and CHS; and CDX and KHV. A set of AI-SNPs can be used to analyze individual genetic composition and to select representative individuals. Individuals whose population-representative genetic component exceeds 80% are good representatives of that population. This paper demonstrates the practical value of using a panel of AI-SNPs screened by FST value to verify ancestral composition and select representative samples, thereby reducing the influence of genetic structure differences among subcontinent populations on population-related medical research.
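The pairwise screening step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which FST estimator the authors used, so the code assumes the textbook Wright definition for a biallelic SNP, FST = (HT - HS) / HT, with the two populations weighted equally. The function names, the `top_snps_per_pair` helper, and the allele frequencies are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def wright_fst(p1, p2):
    """Wright's FST for one biallelic SNP, given the frequency of the same
    allele in two populations (equal population weights assumed)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    # mean expected heterozygosity within the two populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def top_snps_per_pair(freqs, k):
    """Rank SNPs by FST for every pair of populations and keep the top k.

    freqs maps SNP id -> {population label: allele frequency}.
    Returns {(pop_a, pop_b): [SNP ids, highest FST first]}.
    """
    pops = sorted({pop for per_pop in freqs.values() for pop in per_pop})
    result = {}
    for a, b in combinations(pops, 2):
        ranked = sorted(
            freqs,
            key=lambda snp: wright_fst(freqs[snp][a], freqs[snp][b]),
            reverse=True,
        )
        result[(a, b)] = ranked[:k]
    return result


# Hypothetical allele frequencies, not values from the paper.
example_freqs = {
    "snp1": {"JPT": 0.9, "CHB": 0.1, "CDX": 0.1},
    "snp2": {"JPT": 0.5, "CHB": 0.5, "CDX": 0.1},
    "snp3": {"JPT": 0.3, "CHB": 0.3, "CDX": 0.3},
}
candidates = top_snps_per_pair(example_freqs, k=1)
```

In this sketch, a SNP becomes a candidate AI-SNP for a population pair when its FST ranks among the highest for that pair. A real analysis would start from genotype data and would likely use a sample-size-aware estimator.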