Genome-wide association studies combined with <i>k</i>-fold cross-validation identify rs17822931 as an ancestry-informative marker in Han Chinese population

Zheng Li, Jiayi Wu, Jiawen Yang, Kai Li, Ji Chen, Shuainan Huang, Qiang Ji, Xiaochao Kong, Sumei Xie, Wenxuan Zhan, Beilei Zhang, Ke Ye, Qingfan Liu, Zhengsheng Mao, Yue Cao, Huijie Huang, Youjia Yu, Kang Wang, Yanfang Yu, Ding Li, Feng Chen, Peng Chen
DOI: https://doi.org/10.1002/elps.202200227
2023-01-01
Electrophoresis
Abstract: DNA-based ancestry inference has long been a research hot spot in forensic science. Differentiating substructure within the Han Chinese population, such as the northern-to-southern substructure, would benefit forensic practice. In the present study, we enrolled participants from northern and southern China. Each participant was genotyped at ~400 K single-nucleotide polymorphisms (SNPs), and data of CHB and CHS from the 1000 Genomes Project were used to perform genome-wide association analyses. Meanwhile, a new method combining genome-wide association study (GWAS) analyses with k-fold cross-validation in a small sample size was introduced. As a result, one SNP, rs17822931, emerged with a p-value of 7.51 × 10⁻⁶. We also simulated a huge dataset to verify whether k-fold cross-validation could reduce the false-negative rate of GWAS. The identified ABCC11 rs17822931 has been reported to have allele frequencies that vary along a geographical gradient in humans. We also found marked differences in the allele frequency distributions of rs17822931 among five different cohorts of the Chinese population. In conclusion, our study demonstrated that even a small-scale GWAS has the potential to identify effective loci when k-fold cross-validation is implemented, and it sheds light on rs17822931 as a potential marker for differentiating the north-to-south substructure of the Han Chinese population.
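The abstract does not spell out how the association tests and the k-fold splits are combined, so the following is only a minimal sketch of one plausible reading, not the authors' pipeline. It assumes a per-SNP 2×2 allelic chi-square test, repeated on each set of k-1 training folds, and counts in how many folds a SNP passes a significance threshold. The function names (`allelic_p_value`, `kfold_gwas`, `simulate`), the threshold `alpha`, and the simulated allele frequencies are all hypothetical choices made for illustration.

```python
import math
import random


def allelic_p_value(genotypes, labels):
    """2x2 allelic chi-square test (1 degree of freedom) for a single SNP.

    genotypes: minor-allele counts per sample (0, 1 or 2)
    labels:    group per sample (0 = north, 1 = south; an assumed coding)
    """
    alt = [0, 0]    # minor-allele count per group
    total = [0, 0]  # total allele count per group (2 per sample)
    for g, y in zip(genotypes, labels):
        alt[y] += g
        total[y] += 2
    ref = [total[0] - alt[0], total[1] - alt[1]]
    n = total[0] + total[1]
    row_alt = alt[0] + alt[1]
    row_ref = ref[0] + ref[1]
    # A monomorphic SNP or an empty group carries no information.
    if 0 in (row_alt, row_ref, total[0], total[1]):
        return 1.0
    chi2 = n * (alt[0] * ref[1] - alt[1] * ref[0]) ** 2 / (
        row_alt * row_ref * total[0] * total[1]
    )
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def kfold_gwas(genotype_matrix, labels, k=5, alpha=1e-3, seed=0):
    """Count, per SNP, the folds in which it passes `alpha` on the training part."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    n_snps = len(genotype_matrix[0])
    hits = [0] * n_snps
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        y = [labels[i] for i in train]
        for j in range(n_snps):
            g = [genotype_matrix[i][j] for i in train]
            if allelic_p_value(g, y) < alpha:
                hits[j] += 1
    return hits


def simulate(n_per_group=150, n_snps=50, seed=1):
    """Toy data: SNP 0 differs strongly between groups, the rest do not."""
    rng = random.Random(seed)
    matrix, labels = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = []
            for j in range(n_snps):
                # Assumed frequencies: 0.1 vs 0.7 for SNP 0, 0.3 elsewhere.
                freq = (0.1 if group == 0 else 0.7) if j == 0 else 0.3
                row.append(sum(rng.random() < freq for _ in range(2)))
            matrix.append(row)
            labels.append(group)
    return matrix, labels


if __name__ == "__main__":
    matrix, labels = simulate()
    hits = kfold_gwas(matrix, labels, k=5)
    stable = [j for j, h in enumerate(hits) if h == 5]
    print("SNP indices significant in all 5 folds:", stable)
```

Requiring a SNP to pass in every fold is a conservative choice; a looser rule (for example, a majority of folds) would trade stability for sensitivity, and which rule matches the paper cannot be determined from the abstract alone.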