Comparing the predictors of mutability among healthy human tissues inferred from mutations in single cell genome data

Madeleine Oman, Rob W. Ness
DOI: https://doi.org/10.1101/2023.11.28.569048
2024-05-01
Abstract: Studying mutation in healthy somatic tissues is key for understanding the genesis of cancer and other genetic diseases. Mutation rate varies from site to site in the human genome by up to 100-fold and is influenced by numerous epigenetic and genetic factors including GC content, trinucleotide sequence context, and DNase accessibility. These factors influence mutation at both local and regional scales and are often interrelated with one another, meaning that predicting mutability or uncovering its drivers requires modelling multiple factors and scales simultaneously. Historically, most investigations have focused either on analyzing the local sequence scale through triplet signatures or on examining the impact of epigenetic processes at larger scales, but not both concurrently. Additionally, sequencing technology limitations have restricted analyses of healthy mutations to coding regions (RNA-seq) or to those that have been influenced by selection (e.g. bulk samples from cancer tissue). Here we leverage single cell mutations and present a comprehensive analysis of epigenetic and genetic factors at multiple scales in the germline and three healthy somatic tissues. We create models that predict mutability with on average 2% error, and find up to 63-fold variation among sites within the same tissue. We observe varying degrees of similarity between tissues: the mutability of genomic positions was 93.4% similar between liver and germline tissues, but sites in germline and skin were only 85.9% similar. We observe both universal and tissue-specific mutagenic processes in healthy tissues, with implications for understanding the maintenance of germline versus soma and the mechanisms underlying early tumorigenesis.
Genetics
What problem does this paper attempt to address?
This paper examines what predicts variation in mutation rate across the genome in healthy human tissues. Using mutations identified in single-cell genome data, the authors analyze the effects of genetic and epigenetic factors in the germline and three healthy somatic tissues (blood, liver, and skin). Their models predict mutability with an average error of 2% and reveal up to 63-fold variation among sites within the same tissue. The study finds both shared and tissue-specific mutational patterns, which matters for understanding the origins of cancer and other genetic diseases, as well as how the germline is maintained relative to the soma.
The study notes that CpG dinucleotides are highly prone to mutation, while regions near replication origins have lower mutation rates. Transcription levels, chromatin conformation, various histone marks, and recombination are also associated with variation in mutation rate. Although these factors are interrelated, most previous studies focused either on the local sequence scale or on larger-scale epigenetic processes, but not both. Using a multivariate regression model, the authors analyze multiple genetic and epigenetic predictors simultaneously, across tissues and at fine resolution.
The model accurately predicted variation in mutation rate in all four tissues (blood, liver, germline, and skin). CpG sites were the main driver of mutation in every tissue, yet the model maintained low error when these sites were excluded. The factors influencing mutation also differed among tissues: for example, repetitive sequences had a significant effect on mutation in skin that was not evident in the other tissues.
In conclusion, this study provides new insights into the variability of genetic mutations in healthy tissues, emphasizing the need to consider multiple genetic and epigenetic factors comprehensively to understand tissue-specific and common mutation patterns. This has important implications for studying early cancer development and other genetic diseases.
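As a rough illustration of what modelling several predictors at once means, the sketch below fits an ordinary least squares regression of log mutation rate on three site-level predictors. It is not the authors' model: the data are synthetic, the predictor set (GC content, a CpG indicator, accessibility) is a small subset chosen for illustration, and every coefficient, function name, and variable name is a made-up assumption.

```python
import numpy as np

def fit_mutability_model(features, log_rate):
    """Least squares fit of log mutation rate on site-level predictors."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, log_rate, rcond=None)
    return coef

def predict_log_rate(coef, features):
    """Predicted log mutation rate for each site."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

# Synthetic sites (hypothetical values, not data from the paper).
rng = np.random.default_rng(0)
n_sites = 5000
gc_content = rng.uniform(0.3, 0.7, n_sites)          # regional GC fraction
is_cpg = rng.integers(0, 2, n_sites).astype(float)   # 1 if the site is in a CpG
accessibility = rng.uniform(0.0, 1.0, n_sites)       # accessibility score
features = np.column_stack([gc_content, is_cpg, accessibility])

# Made-up "true" effects: intercept, GC, CpG, accessibility.
true_coef = np.array([-18.0, 1.0, 2.3, -0.5])
log_rate = true_coef[0] + features @ true_coef[1:] + rng.normal(0.0, 0.1, n_sites)

coef = fit_mutability_model(features, log_rate)
predicted = predict_log_rate(coef, features)

# Fold variation between the most and least mutable predicted sites.
fold_variation = float(np.exp(predicted.max() - predicted.min()))
# Mean absolute error of the fit, on the log scale.
mean_abs_error = float(np.mean(np.abs(predicted - log_rate)))

print("fitted coefficients:", np.round(coef, 2))
print("fold variation among sites:", round(fold_variation, 1))
print("mean absolute error (log scale):", round(mean_abs_error, 3))
```

Because all predictors enter the same fit, each coefficient is estimated while holding the others fixed, which is how a model of this kind can separate correlated factors such as GC content and CpG status.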