Revealing diverse alternative splicing variants of the highly homologous SMN1 and SMN2 genes by targeted long-read sequencing

Mengyao Dai, Yan Xu, Yu Sun, Bing Xiao, Xiaomin Ying, Yu Liu, Wenting Jiang, Jingmin Zhang, Xiaoqing Liu, Xing Ji
DOI: https://doi.org/10.1007/s00438-022-01874-6
Abstract: The survival of motor neuron (SMN) genes, SMN1 and SMN2, are two highly homologous genes related to spinal muscular atrophy (SMA). Different patterns of alternative splicing have been observed in the SMN genes. In this study, a targeted long-read sequencing technique that distinguishes SMN1 from SMN2 without any assembly was developed and applied to reveal multiple alternative splicing patterns and to comprehensively identify transcript variants of the SMN genes. In total, 36 types of transcript variants were identified, with equal numbers generated from SMN1 and SMN2. Of these, 18 were novel SMN transcripts that had never been reported. The structures of SMN transcripts proved to be much more complicated and diverse than previously described. The novel transcripts were derived from diverse splicing events, including skipping of one or more exons, intron retention, and exon shortening or addition. SMN1 mainly produced FL-SMN1, SMN1Δ7, SMN1Δ5 and SMN1Δ3. The distribution of SMN2 transcripts was significantly different from that of SMN1, with the majority of transcripts being SMN2Δ7, followed by FL-SMN2, SMN2Δ3,5 and SMN2Δ5,7. The targeted long-read sequencing approach could accurately distinguish sequences of SMN1 from those of SMN2. Our study comprehensively addressed naturally occurring SMN1 and SMN2 transcript variants and splicing patterns in peripheral blood mononuclear cells (PBMCs). The novel transcripts identified in our study expand knowledge of the diversity of transcript variants generated from the SMN genes and give a much more comprehensive profile of the SMN splicing spectrum. The results of our study will provide valuable information for studying the low expression level of SMN proteins and SMA pathogenesis at the transcript level.
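The transcript names used in the abstract (FL-SMN1, SMN2Δ7, SMN2Δ5,7) follow a simple pattern: the gene name, then Δ and the list of skipped exons, with "FL" marking the full-length form. The sketch below illustrates that naming pattern only. It is not the authors' pipeline; the function name `transcript_name` and the exon list `SMN_EXONS` are hypothetical, the exon order (1, 2a, 2b, 3, 4, 5, 6, 7, 8) is the commonly used SMN nomenclature rather than something stated in the abstract, and events such as intron retention or exon shortening are not handled.

```python
# Illustrative sketch only: names a transcript from the exons observed in a read.
# Assumption: the commonly used SMN exon order, which the abstract does not list.
SMN_EXONS = ["1", "2a", "2b", "3", "4", "5", "6", "7", "8"]


def transcript_name(gene, exons_present):
    """Return a name such as 'FL-SMN1' or 'SMN2Δ5,7' (hypothetical helper).

    gene          -- 'SMN1' or 'SMN2', assumed to be already assigned to the read
    exons_present -- iterable of exon labels found in the read
    """
    present = set(exons_present)
    unknown = present - set(SMN_EXONS)
    if unknown:
        raise ValueError(f"unknown exon labels: {sorted(unknown)}")

    # Skipped exons are reported in canonical order, not input order.
    skipped = [exon for exon in SMN_EXONS if exon not in present]
    if not skipped:
        return f"FL-{gene}"
    return f"{gene}Δ" + ",".join(skipped)


if __name__ == "__main__":
    all_exons = list(SMN_EXONS)
    without_7 = [e for e in SMN_EXONS if e != "7"]
    print(transcript_name("SMN1", all_exons))
    print(transcript_name("SMN2", without_7))
```

Only exon skipping is covered here; the other splicing events listed in the abstract would need additional information, such as alignment coordinates, to be represented.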