Identifying somatic fingerprints of cancers defined by germline and environmental risk factors

Saptarshi Chakraborty, Zoe Guan, Caroline E. Kostrzewa, Ronglai Shen, Colin B. Begg
DOI: https://doi.org/10.1002/gepi.22565
2024-05-02
Genetic Epidemiology
Abstract: Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is made difficult by the large number of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying the biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has previously used this technique in a multilevel regression model to diagnose tumor site of origin with high accuracy. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.
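To make the meta-feature idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy data set in which every variant is observed only once, and it uses two hypothetical meta-features (gene and substitution type) chosen purely for illustration; the tumor names, variant labels, and the helper `meta_feature_counts` are likewise invented. The point is only that alterations which are individually rare can still share meta-features, so counting meta-features per tumor yields a denser tumor-by-feature table of the kind a topic model could take as input, with tumors playing the role of documents and meta-features the role of words.

```python
from collections import Counter

# Hypothetical toy data: each tumor is a list of somatic alterations, and
# each alteration carries illustrative meta-features (assumed, not from the paper).
tumors = {
    "tumor_A": [
        {"variant": "v1", "gene": "TP53", "substitution": "C>T"},
        {"variant": "v2", "gene": "TP53", "substitution": "C>A"},
        {"variant": "v3", "gene": "PIK3CA", "substitution": "C>T"},
    ],
    "tumor_B": [
        {"variant": "v4", "gene": "TP53", "substitution": "C>T"},
        {"variant": "v5", "gene": "BRCA2", "substitution": "T>C"},
    ],
}


def meta_feature_counts(alterations, keys=("gene", "substitution")):
    """Count how often each meta-feature value occurs among a tumor's alterations."""
    counts = Counter()
    for alteration in alterations:
        for key in keys:
            counts[f"{key}={alteration[key]}"] += 1
    return counts


# One sparse count profile per tumor.
profiles = {tumor_id: meta_feature_counts(alts) for tumor_id, alts in tumors.items()}

# Shared, sorted vocabulary of meta-features across all tumors.
vocabulary = sorted({feature for counts in profiles.values() for feature in counts})

# Dense tumor-by-meta-feature count rows (a Counter returns 0 for absent features).
matrix = {
    tumor_id: [counts[feature] for feature in vocabulary]
    for tumor_id, counts in profiles.items()
}

for tumor_id, row in matrix.items():
    print(tumor_id, dict(zip(vocabulary, row)))
```

In this toy example no variant recurs across tumors, yet both tumors share the meta-features `gene=TP53` and `substitution=C>T`, which is the sense in which meta-features implicitly group rare mutations.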
genetics & heredity, mathematical & computational biology