Abstract B008: A computational workflow for determining genetic ancestry using cell lines: Challenges and solutions

Matthew S. Chang, Kalyanee Shirlekar, Katherine A. Martinez, Chayil C. Lattimore, Kimberly J. Newsom, Jason O. Brant, Kristianna M. Fredenburg
DOI: https://doi.org/10.1158/1538-7755.disp24-b008
2024-09-23
Cancer Epidemiology Biomarkers & Prevention
Abstract:
Background: Self-reported race has been described as an imperfect variable for understanding the genetic underpinnings of cancer-related disease processes. In contrast, characterizing genetic ancestry more reliably aligns ancestry-related disease markers with disease outcomes in cancer health disparities. Here, we outline our computational workflow for inferring genetic ancestry from whole genome sequencing (WGS) and RNA sequencing (RNA-seq) data derived from cancer cell lines, and we describe the challenges we encountered during workflow development and how we resolved them.
Methods: Total DNA and RNA extracted from four laryngeal squamous cell carcinoma cell lines (two derived from self-reported Black patients and two derived from self-reported White patients) were used to generate WGS and RNA-seq libraries, respectively. The libraries were sequenced, and variants were called with Illumina DRAGEN pipelines. Data from phase III of the 1000 Genomes (1KG) Project were used as the reference for ancestry inference, with the following population designations: African (AFR), African American in Southwest USA (ASW), East Asian (EAS), European (EUR), and South Asian (SAS). Genotype data for the 1KG samples and the cell lines were merged with BCFtools, and the merged variants were then filtered for minor allele frequency and linkage disequilibrium. Principal component analyses of 305 ancestry-informative markers (AIMs) were performed with PLINK to visualize how samples cluster by shared genetic ancestry. Genetic ancestry proportions for all samples were estimated with ADMIXTURE using all filtered variants.
Results: Of the 305 AIMs, 264 were identified in the merged WGS dataset and 139 in the merged RNA-seq dataset. AIMs from both the WGS and RNA-seq data produced ancestral clustering that correctly aligned with self-reported race: cell lines derived from Black patients clustered closest to ASW and AFR individuals, and cell lines derived from White patients clustered closest to EUR individuals.
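The minor allele frequency and linkage disequilibrium filtering step in the Methods can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' pipeline and does not reproduce PLINK's `--indep-pairwise` algorithm exactly. It assumes genotypes are already coded as 0/1/2 alternate-allele dosages (`None` for a missing call) and that variants from one chromosome are listed in genomic order, and it uses a simplified greedy rule: a variant is kept only if its squared correlation (r²) with each of the most recently retained variants is at or below a threshold. The helper names and thresholds (`min_maf=0.05`, `window=50`, `max_r2=0.2`) are hypothetical illustrative values; the abstract does not report the parameters used.

```python
"""Sketch: MAF filtering followed by greedy pairwise-r^2 LD pruning.

Hypothetical helpers for illustration only; a simplified stand-in for tools
such as PLINK, not the workflow described in the abstract.
"""
from typing import List, Optional, Sequence, Tuple

Dosages = Sequence[Optional[int]]  # 0/1/2 alt-allele counts per sample; None = missing


def minor_allele_frequency(dosages: Dosages) -> float:
    """Return the minor allele frequency among called (non-missing) genotypes."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def r_squared(x: Dosages, y: Dosages) -> float:
    """Squared Pearson correlation of dosages over samples called at both variants."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant has no defined correlation
    return (sxy * sxy) / (sxx * syy)


def filter_variants(
    variants: List[Tuple[str, Dosages]],
    min_maf: float = 0.05,  # hypothetical MAF threshold
    window: int = 50,       # how many most recently kept variants to compare against
    max_r2: float = 0.2,    # hypothetical LD (r^2) threshold
) -> List[str]:
    """Return IDs of variants that pass the MAF filter and the greedy LD rule.

    `variants` must be in genomic order on a single chromosome.
    """
    kept: List[Tuple[str, Dosages]] = []
    for variant_id, dosages in variants:
        if minor_allele_frequency(dosages) < min_maf:
            continue  # fails the MAF filter
        recent = kept[-window:]
        if all(r_squared(dosages, prev) <= max_r2 for _, prev in recent):
            kept.append((variant_id, dosages))
    return [variant_id for variant_id, _ in kept]


if __name__ == "__main__":
    toy = [
        ("rs1", [0, 1, 2, 1, 0, 2]),
        ("rs2", [0, 1, 2, 1, 0, 2]),  # identical to rs1, so r^2 = 1
        ("rs3", [1, 0, 1, 0, 1, 0]),  # r^2 with rs1 is about 0.17
        ("rs4", [0, 0, 0, 0, 0, 0]),  # monomorphic, MAF = 0
    ]
    print(filter_variants(toy))  # ['rs1', 'rs3']
```

In this toy example, `rs2` is pruned because it is perfectly correlated with `rs1`, and `rs4` is removed because it is monomorphic. Lowering `max_r2` or raising `min_maf` discards more variants, which is the stringency trade-off the authors report adjusting so that enough variants remain for superpopulation-level inference.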
AIMs from the WGS dataset separated superpopulations better than AIMs from the RNA-seq dataset, particularly individuals of EUR and SAS ancestry. Merging sequencing data from the cell lines with the publicly available 1KG data was challenging; we resolved this by using the Illumina DRAGEN pipelines to produce genotype assignments for variants. Determining appropriate filtering parameters for minor allele frequency and linkage disequilibrium was a second challenge; we resolved it by adjusting the stringency of the filtering parameters, which allowed us to retain enough variants for ancestry inference at the superpopulation level.
Conclusions: We developed a computational workflow that enables inference of genetic ancestry in patient-derived cell lines. During workflow development, we observed that WGS data outperformed RNA-seq data in clustering superpopulations. This may be related to the larger number of AIMs identified in the WGS dataset, enabling more distinct clustering.
Citation Format: Matthew S. Chang, Kalyanee Shirlekar, Katherine A. Martinez, Chayil C. Lattimore, Kimberly J. Newsom, Jason O. Brant, Kristianna M. Fredenburg. A computational workflow for determining genetic ancestry using cell lines: Challenges and solutions [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl) nr B008.
Oncology; Public, Environmental & Occupational Health