Systematically developing a registry of splice-site creating variants utilizing massive publicly available transcriptome sequence data

Naoko Iida, Ai Okada, Yoshihisa Kobayashi, Kenichi Chiba, Yasushi Yatabe, Yuichi Shiraishi
DOI: https://doi.org/10.1101/2024.02.21.581470
2024-02-23
Abstract: Genomic variants causing abnormal splicing play an important role in genetic disorders and cancer development. Among them, variants that cause formations of novel splice-sites (splice-site creating variants, SSCVs) are particularly difficult to identify and often overlooked in genomic studies. Additionally, these SSCVs, especially those found in deep intronic regions, are frequently considered promising candidates for treatment with splice-switching antisense oligonucleotides (ASOs), offering therapeutic potential for rare disease patients. To leverage massive transcriptome sequence data such as those available from the Sequence Read Archive, we developed a novel framework to screen for SSCVs solely using transcriptome data. We have applied it to 322,072 publicly available transcriptomes and identified 30,130 SSCVs. Utilizing this extensive collection of SSCVs, we have revealed the characteristics of Alu exonization via SSCVs, especially the hotspots of SSCVs within Alu sequences and their evolutionary relationships. Many of the SSCVs affecting disease-causing variants were predicted to generate premature termination codons and are degraded by nonsense-mediated decay. On the other hand, several genes, such as CREBBP and TP53, showed characteristic SSCV profiles indicative of heterogeneous mutational functions beyond simple loss-of-function. Finally, we discovered novel gain-of-function SSCVs in the deep intronic region of the NOTCH1 gene and demonstrated that their activation can be suppressed using splice-switching ASOs. Collectively, we provide a systematic approach for automatically acquiring a registry of SSCVs, which can be used for elucidating novel biological mechanisms for splicing and genetic variation, and become a valuable resource for pinpointing critical targets in drug discovery. Catalogs of SSCVs identified in this study are accessible on SSCV DB ( ).
Bioinformatics
What problem does this paper attempt to address?
This paper aims to identify and catalog splice-site creating variants (SSCVs): genomic variants that form new splice donor or acceptor sites and thereby cause abnormal splicing. Such variants play an important role in genetic diseases and cancer development, but they are difficult to identify in conventional genomic studies and are often overlooked, especially when they lie in deep intronic regions. The paper's main contribution is a new framework that screens for SSCVs using transcriptome data alone. Drawing on publicly available transcriptome data from the Sequence Read Archive (SRA), the authors analyzed 322,072 transcriptomes and identified 30,130 SSCVs. With this extensive collection, they characterized Alu exonization via SSCVs, in particular the SSCV hotspots within Alu sequences and their evolutionary relationships. Many disease-related SSCVs are predicted to generate premature termination codons, yielding transcripts that are degraded by nonsense-mediated decay (NMD). In contrast, certain genes, such as CREBBP and TP53, show characteristic SSCV profiles, indicating heterogeneous mutational functions that go beyond simple loss-of-function. Finally, the authors discovered novel gain-of-function SSCVs in the deep intronic region of the NOTCH1 gene and demonstrated that their activation can be suppressed with splice-switching antisense oligonucleotides (ASOs). Overall, the study provides a systematic method for automatically building a registry of SSCVs, which can help elucidate new biological mechanisms of splicing and genetic variation and serve as a valuable resource for pinpointing critical targets in drug discovery.
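To make the notion of a splice-site creating variant concrete, the following minimal Python sketch checks whether a single-nucleotide substitution introduces a canonical splice dinucleotide (GT for a donor, AG for an acceptor) that was absent from the reference sequence. It is only an illustration of the concept, not the authors' framework, which screens for SSCVs from transcriptome data. The function name `created_motifs`, the 0-based coordinate convention, and the forward-strand-only view are assumptions made for this example. A real splice site also depends on much wider sequence context than two bases.

```python
# Illustrative sketch only: NOT the paper's transcriptome-based method.
# Assumptions: forward strand, 0-based positions, single-nucleotide substitutions.

CANONICAL_MOTIFS = {"donor": "GT", "acceptor": "AG"}


def created_motifs(ref_seq: str, pos: int, alt: str) -> list[tuple[str, int]]:
    """Return (site_type, start_index) for each canonical dinucleotide that
    the substitution creates, i.e. present in the mutated sequence but not
    in the reference at the same position."""
    ref_seq = ref_seq.upper()
    alt = alt.upper()
    if not 0 <= pos < len(ref_seq):
        raise ValueError("pos is outside the reference sequence")
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base (A, C, G, or T)")

    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]

    created = []
    # Only the two dinucleotide windows overlapping the variant can change.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref_seq):
            continue
        ref_di = ref_seq[start:start + 2]
        alt_di = alt_seq[start:start + 2]
        for site_type, motif in CANONICAL_MOTIFS.items():
            if alt_di == motif and ref_di != motif:
                created.append((site_type, start))
    return created


if __name__ == "__main__":
    # C>G at index 3 turns "CT" into "GT", a candidate donor dinucleotide.
    print(created_motifs("ACGCTAAC", 3, "G"))
    # C>G at index 5 turns "AC" into "AG", a candidate acceptor dinucleotide.
    print(created_motifs("TTTCACGG", 5, "G"))
```

A hit from such a check only marks a candidate. Whether the new site is actually used has to be confirmed from splicing evidence, which is why the study relies on transcriptome data.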