Abstract:
Background: The underrepresentation of Southern Chinese populations in human genomic resources limits their health equality in the precision medicine era and hinders a complete understanding of their genetic formation, admixture, and adaptive features. In addition, linguistic and genetic evidence has supported conflicting hypotheses about their origins. One much-debated case is the Guangxi Pinghua Han people (GPH) of China, whose language closely resembles Southern Chinese dialects but whose uniparental gene pool is phylogenetically associated with the indigenous Tai-Kadai (TK) people. Here, we analyzed genome-wide SNP data from 619 people representing four language families and 56 geographically different populations, including 261 people from 21 geographically distinct populations reported here for the first time.
Results: We identified significant population stratification among ethnolinguistically diverse Guangxi populations, suggesting differentiated genetic origins and admixture processes. GPH shared more alleles with Zhuang than with Southern Han Chinese but received more northern ancestry than Zhuang. Admixture models and estimates of genetic distances showed that GPH had a closer genetic relationship with geographically close TK populations than with Northern Han Chinese, supporting the admixture origin hypothesis. Estimates of admixture time and reconstruction of demographic history further supported the formation of GPH via admixture between Northern Han Chinese and Southern TK people. We identified robust signatures associated with lipid metabolism, such as the fatty acid desaturase (FADS) genes, and with medically relevant loci linked to a Mendelian disorder (GJB2) and complex diseases. We also explored the shared and unique selection signatures of ethnically different but linguistically related Guangxi lineages and found some shared signals related to immune function and malaria resistance.
Conclusions: Our genetic analysis illuminated the language-related fine-scale genetic structure and provided robust genetic evidence for the admixture hypothesis, which can explain the observed pattern of genetic diversity and the formation of GPH. This work presents a comprehensive analysis of the population history and of the demographic and adaptive processes, providing genetic evidence for personal health management and disease risk prediction models for Guangxi people. Further large-scale whole-genome sequencing projects would reveal the full landscape of southern Chinese genomic diversity and its contributions to human health and disease traits.

Differentiated adaptative genetic architecture and language-related demographical history in South China inferred from 619 genomes from 56 populations
