Somatic mutation and selection at epidemiological scale

Andrew R.J. Lawson, Federico Abascal, Pantelis A. Nicola, Stefanie V. Lensing, Amy L. Roberts, Georgios Kalantzis, Adrian Baez-Ortega, Natalia Brzozowska, Julia S. El-Sayed Moustafa, Dovile Vaitkute, Belma Jakupovic, Ayrun Nessa, Samuel Wadge, Anna L. Paterson, Doris M. Rassl, Raul E. Alcantara, Laura O'Neill, Sara Widaa, Siobhan Austin-Guest, Matthew D.C. Neville, Moritz J. Przybilla, Wei Cheng, Maria Morra, Lucy Sykes, Matthew Mayho, Nicole Muller-Sienerth, Nick Williams, Diana Alexander, Luke M.R. Harvey, Thomas Clarke, Alex Byrne, Jamie R. Blundell, Matthew D. Young, Krishnaa T.A. Mahbubani, Kourosh Saeb-Parsy, Hilary C. Martin, Michael R. Stratton, Peter J. Campbell, Raheleh Rahbari, Kerrin S. Small, Inigo Martincorena
DOI: https://doi.org/10.1101/2024.10.30.24316422
2024-11-01
Abstract: As we age, many tissues become colonised by microscopic clones carrying somatic driver mutations. Some of these clones represent a first step towards cancer whereas others may contribute to ageing and other diseases. However, our understanding of the clonal landscapes of human tissues, and their impact on cancer risk, ageing and disease, remains limited due to the challenge of detecting somatic mutations present in small numbers of cells. Here, we introduce a new version of nanorate sequencing (NanoSeq), a duplex sequencing method with error rates <5 errors per billion base pairs, which is compatible with whole-exome and targeted gene sequencing. Deep sequencing of polyclonal samples with single-molecule sensitivity enables the simultaneous detection of mutations in large numbers of clones, yielding accurate somatic mutation rates, mutational signatures and driver mutation frequencies in any tissue. Applying targeted NanoSeq to 1,042 non-invasive samples of oral epithelium and 371 samples of blood from a twin cohort, we found an unprecedentedly rich landscape of selection, with 49 genes under positive selection driving clonal expansions in the oral epithelium, over 62,000 driver mutations, and evidence of negative selection in some genes. The high number of positively selected mutations in multiple genes provides high-resolution maps of selection across coding and non-coding sites, a form of in vivo saturation mutagenesis. Multivariate regression models enable mutational epidemiology studies on how carcinogenic exposures and cancer risk factors, such as age, tobacco or alcohol, alter the acquisition and selection of somatic mutations. Accurate single-molecule sequencing has the potential to unveil the polyclonal landscape of any tissue, providing a powerful tool to study early carcinogenesis, cancer prevention and the role of somatic mutations in ageing and disease.
Genetic and Genomic Medicine