A Genome-Wide Search for Signals of High-Altitude Adaptation in Tibetans.

Shuhua Xu,Shilin Li,Yajun Yang,Jingze Tan,Haiyi Lou,Wenfei Jin,Ling Yang,Xuedong Pan,Jiucun Wang,Yiping Shen,Bailin Wu,Hongyan Wang,Li Jin
DOI: https://doi.org/10.1093/molbev/msq277
IF: 10.7
2010-01-01
Molecular Biology and Evolution
Abstract:Genetic studies of Tibetans, an ethnic group with a long-lasting presence on the Tibetan Plateau which is known as the highest plateau in the world, may offer a unique opportunity to understand the biological adaptations of human beings to high-altitude environments. We conducted a genome-wide study of 1,000,000 genetic variants in 46 Tibetans (TBN) and 92 Han Chinese (HAN) for identifying the signals of high-altitude adaptations (HAAs) in Tibetan genomes. We discovered the most differentiated variants between TBN and HAN at chromosome 1q42.2 and 2p21. EGLN1 (or HIFPH2, MIM 606425) and EPAS1 (or HIF2A, MIM 603349), both related to hypoxia-inducible factor, were found most differentiated in the two regions, respectively. Strong positive correlations were also observed between the frequency of TBN-dominant haplotypes in the two gene regions and altitude in East Asian populations. Linkage disequilibrium and further haplotype network analyses of world-wide populations suggested the antiquity of the TBN-dominant haplotypes and long-term persistence of the natural selection. Finally, a "dominant haplotype carrier" hypothesis could describe the role of the two genes in HAA. All of our population genomic and statistical analyses indicate that EPAS1 and EGLN1 are most likely responsible for HAA of Tibetans. Interestingly, one each but not both of the two genes were also identified by three recent studies. We reanalyzed the available data and found the escaped top signal (EPAS1) could be recaptured with data quality control and our approaches. Based on this experience, we call for more attention to be paid to controlling data quality and batch effects introduced in public data integration. Our results also suggest limitations of extended haplotype homozygosity-based method due to its compromised power in case the natural selection initiated long time ago and particularly in genomic regions with recombination hotspots.
What problem does this paper attempt to address?