A Dirichlet-multinomial mixed model for determining differential abundance of mutational signatures

Lena Morrill Gavarró, Dominique-Laurent Couturier, Florian Markowetz
DOI: https://doi.org/10.1101/2024.03.07.583837
2024-03-08
Abstract: Mutational processes of diverse origin leave their imprints in the genome during tumour evolution. These imprints are called mutational signatures and they have been characterised for point mutations, structural variants and copy number changes. Each signature has an exposure, or abundance, per sample, which indicates how much a process has contributed to the overall genomic change. Mutational processes are not static, and a better understanding of their dynamics is key to characterise tumour evolution and identify cancer weaknesses that can be exploited during treatment. However, the structure of the data typically collected in this context makes it difficult to test whether signature exposures differ between samples or time-points. In general, the data consist of (1) patient-dependent vectors of counts for each sample and clonality group, (2) generated from a covariate-dependent and compositional vector of probabilities with (3) a possibly group-dependent over-dispersion level. To model these data, we build on the Dirichlet-multinomial model to be able to model multivariate overdispersed vectors of counts as well as within-sample dependence and positive correlations between signatures. To estimate the model parameters, we implement a maximum likelihood estimator with a Laplace approximation of the random effect high-dimensional integrals and assess its bias and coverage by means of Monte Carlo simulations. We apply our approach to characterise differences of mutational processes between clonal and subclonal mutations across 23 cancer types of the PCAWG cohort. We find ubiquitous differential abundance of clonal and subclonal signatures across cancer types, and higher dispersion of signatures in the subclonal group, indicating higher variability between patients at subclonal level, possibly due to the presence of different clones with distinct active mutational processes.
Mutational signature analysis is an expanding field and we envision our framework to be used widely to detect global changes in mutational process activity.
Cancer Biology
What problem does this paper attempt to address?
This paper addresses how to test for differential abundance of mutational signatures, in particular between clonal and subclonal mutations during cancer evolution. The authors develop a Dirichlet-multinomial mixed model to detect whether signature exposures differ between samples or time points. The method is designed for the structure of the data typically collected in this setting: patient-dependent count vectors for each sample and clonality group, generated from a covariate-dependent, compositional vector of probabilities with a possibly group-dependent over-dispersion level. With it, the authors aim to better understand the dynamics of mutational processes, which in turn helps to characterise tumour evolution and to identify cancer weaknesses that can be exploited during treatment. In short, the core question is whether the relative abundance of mutational signatures differs significantly between clonal and subclonal mutations, tested across 23 cancer types of the PCAWG cohort.
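The data structure described above can be illustrated with a small simulation. The sketch below is not the authors' implementation and does not perform the maximum likelihood estimation with a Laplace approximation; it only generates counts with the three stated ingredients. The softmax link, all parameter values, and the names `softmax`, `dirichlet_multinomial` and `simulate_counts` are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def softmax(scores):
    """Map real-valued scores to a compositional probability vector."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_multinomial(rng, probs, precision, n_draws):
    """Draw one Dirichlet-multinomial count vector.

    A Dirichlet draw is obtained by normalising Gamma variates; a lower
    `precision` means more over-dispersion around `probs`.
    """
    gammas = [rng.gammavariate(precision * p, 1.0) for p in probs]
    total = sum(gammas)
    theta = [g / total for g in gammas]
    draws = rng.choices(range(len(probs)), weights=theta, k=n_draws)
    tally = Counter(draws)
    return [tally.get(k, 0) for k in range(len(probs))]


def simulate_counts(n_patients=20, n_mutations=300, seed=1):
    """Simulate signature counts for clonal (0) and subclonal (1) groups."""
    rng = random.Random(seed)
    # Made-up fixed effects for four hypothetical signatures.
    beta_intercept = [0.0, 0.5, -0.5, 1.0]   # clonal baseline
    beta_slope = [0.0, 0.8, -0.6, 0.0]       # shift for the subclonal group
    # Assumed group-dependent precision: subclonal is more dispersed.
    precision = {0: 50.0, 1: 10.0}

    counts = []
    for _ in range(n_patients):
        # Patient random intercepts, shared by both groups of one patient,
        # which induces within-patient dependence.
        u = [rng.gauss(0.0, 0.3) for _ in beta_intercept]
        per_patient = []
        for group in (0, 1):
            scores = [b0 + group * b1 + ui
                      for b0, b1, ui in zip(beta_intercept, beta_slope, u)]
            probs = softmax(scores)
            per_patient.append(
                dirichlet_multinomial(rng, probs, precision[group], n_mutations)
            )
        counts.append(per_patient)
    return counts


if __name__ == "__main__":
    simulated = simulate_counts()
    print("patient 0, clonal:   ", simulated[0][0])
    print("patient 0, subclonal:", simulated[0][1])
```

In this sketch, testing for differential abundance would correspond to asking whether the `beta_slope` entries differ from zero, and the group-dependent `precision` plays the role of the over-dispersion level.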