Abstract 789: GrafGen: Distance-Based Inference of Population Ancestry for Helicobacter pylori Genomes

William Wheeler, Difei Wang, Isaac Zhao, Yumi Jin, Charles S. Rabkin
DOI: https://doi.org/10.1158/1538-7445.am2024-789
IF: 11.2
2024-03-22
Cancer Research
Abstract:
BACKGROUND: The 1.67 megabase H. pylori genome contains ~143,000 biallelic single-nucleotide polymorphisms (SNPs) with minor allele frequency > 1%. Confounding by population stratification is a major source of bias requiring adjustment in genome-wide association studies (GWAS). Previous model- and distance-based methods for bacterial genomes yield varying results depending on which strains are included for comparison in analyses. We therefore developed a robust classification of H. pylori ancestry to facilitate generalizable inferences about disease associations.
METHODS: GrafGen is an R software package adapted from the GrafPop tool for human ancestry (Jin et al., 2017; Jin et al., 2019). The underlying classification algorithm compares the SNPs of an individual genome with allele frequencies in reference populations to estimate subject ancestry and ancestral proportions based on calculated genetic distances. Outputs incorporate visualization tools that provide a natural geometric interpretation of population structure.
RESULTS: Training data were obtained from the H. pylori Genome Project (HpGP), a global survey of 1,011 H. pylori genomes collected across 51 countries (Thorell et al., 2023). Based on genetic distances, HpGP sequences clustered into nine mutually exclusive populations designated by their predominant geographic source. Previously published population assignments for a test set of 255 sequences obtained from GenBank mapped to specific GrafGen classifications (Table). GrafGen assignments based on randomly selected sets of 14,300 and 1,430 SNPs were >97% and >90% identical, respectively, to those based on all 143,000 SNPs.
CONCLUSIONS: GrafGen's universal categorization of H. pylori ancestry has utility for bacterial GWAS. The software code is publicly available for research on this important pathogen. Theoretically, the same algorithm can be implemented to infer ancestry of any single-chromosome haploid species that has sufficient sequence data for references.
Citation Format: William Wheeler, Difei Wang, Isaac Zhao, Yumi Jin, Charles S. Rabkin. GrafGen: Distance-Based Inference of Population Ancestry for Helicobacter pylori Genomes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 789.
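
The distance-based classification described under METHODS can be illustrated with a minimal Python sketch. This is not GrafGen's code or API: the abstract does not state which distance metric the package uses, so the mean squared difference between a genome's alleles and each population's allele frequencies is an assumption made here for illustration. The function names (`genetic_distance`, `classify`), the population labels, and the toy frequencies are all hypothetical. The sketch covers only nearest-population assignment, not the estimation of ancestral proportions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def genetic_distance(genome: Sequence[Optional[int]],
                     ref_freqs: Sequence[float]) -> float:
    """Mean squared difference between a haploid genome's alleles
    (0 = reference, 1 = alternate, None = missing call) and one
    population's alternate-allele frequencies.
    Assumed metric for illustration only."""
    total = 0.0
    n_used = 0
    for allele, freq in zip(genome, ref_freqs):
        if allele is None:  # skip SNPs with no call in this genome
            continue
        total += (allele - freq) ** 2
        n_used += 1
    if n_used == 0:
        raise ValueError("no genotyped SNPs to compare")
    return total / n_used


def classify(genome: Sequence[Optional[int]],
             references: Dict[str, List[float]],
             snp_indices: Optional[Sequence[int]] = None
             ) -> Tuple[str, Dict[str, float]]:
    """Assign the genome to the reference population at the smallest
    distance. If snp_indices is given, only those SNPs are compared."""
    if snp_indices is not None:
        genome = [genome[i] for i in snp_indices]
        references = {pop: [freqs[i] for i in snp_indices]
                      for pop, freqs in references.items()}
    distances = {pop: genetic_distance(genome, freqs)
                 for pop, freqs in references.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Toy reference panel: hypothetical populations, six SNPs each.
references = {
    "popA": [0.9, 0.8, 0.1, 0.2, 0.9, 0.1],
    "popB": [0.1, 0.2, 0.9, 0.8, 0.1, 0.9],
    "popC": [0.5, 0.9, 0.5, 0.1, 0.2, 0.8],
}
genome = [1, 1, 0, 0, 1, None]  # last SNP is a missing call

best, distances = classify(genome, references)
subset_best, _ = classify(genome, references, snp_indices=[0, 2, 4])
print(best, subset_best)  # popA popA
```

The `snp_indices` argument mirrors, in miniature, the subsampling comparison reported under RESULTS: when a genome lies close to one reference population, the assignment from a subset of SNPs tends to agree with the assignment from the full set.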
oncology