Machine learning predictions improve identification of real-world cancer driver mutations
Thinh N. Tran, Chris Fong, Karl Pichotta, Anisha Luthra, Ronglai Shen, Yuan Chen, Michele Waters, Susie Kim, Michael F Berger, Gregory Riely, Marc Ladanyi, Debyani Chakravarty, Nikolaus Schultz, Justin Jee
DOI: https://doi.org/10.1101/2024.03.31.587410
2024-04-05
Abstract: Characterizing and validating which mutations influence the development of cancer is challenging. Machine learning has delivered significant advances in protein structure prediction, but its utility for identifying cancer drivers is less explored. We evaluated multiple computational methods for identifying cancer driver alterations. For identifying known drivers, methods incorporating protein structure or functional genomic data outperformed methods trained only on evolutionary data. We further validated variants of unknown significance (VUSs) annotated as pathogenic by testing their association with overall survival in two cohorts of patients with non-small cell lung cancer (N=7,965 and 977). “Pathogenic” VUSs in and identified by several methods were associated with worse survival, unlike “benign” VUSs. “Pathogenic” VUSs exhibited mutual exclusivity with known oncogenic alterations at the pathway level, further suggesting biological validity. Although the methods were trained primarily on germline, rather than somatic, mutation data, their predictions contribute to a more comprehensive understanding of tumor genetics, as validated by real-world data.
Genomics