Impact of U2-type introns on splice site prediction in Arabidopsis thaliana using deep learning

Espoir Kabanga, Soeun Yun, Arnout Van Messem, Wesley De Neve
DOI: https://doi.org/10.1101/2024.05.13.593811
2024-05-14
Abstract: In this study, we investigate the impact of introns on the effectiveness of splice site prediction using deep learning models, focusing on Arabidopsis thaliana. We specifically utilize U2-type introns due to their ubiquity in plant genomes and the rich datasets available. We formulate two hypotheses: first, that short introns lead to more effective splice site prediction than long introns because of reduced spatial complexity; and second, that sequences containing multiple introns improve prediction effectiveness by providing a richer context for splicing events. Our findings indicate that (1) models trained on datasets with shorter introns consistently outperform those trained on datasets with longer introns, highlighting the importance of intron length in splice site prediction, and (2) models trained on datasets containing multiple introns per sequence outperform those trained on datasets containing a single intron per sequence. These findings align with both hypotheses and also agree with existing observations from wet lab experiments on how intron length and the number of introns in a sequence affect splicing, suggesting that our computational insights are biologically relevant.
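The two dataset splits described in the abstract (short versus long introns, and single versus multiple introns per sequence) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `AnnotatedSequence` record, the function names, and the `max_short_len` cutoff are all hypothetical, and the abstract does not state the length threshold actually used.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedSequence:
    """Hypothetical record: a sequence ID plus its intron coordinates."""
    seq_id: str
    introns: list[tuple[int, int]]  # half-open (start, end) positions


def intron_lengths(seq: AnnotatedSequence) -> list[int]:
    """Return the length in nucleotides of every intron in the sequence."""
    return [end - start for start, end in seq.introns]


def length_group(seq: AnnotatedSequence, max_short_len: int) -> str:
    """Label a sequence 'short' if all its introns fit under the cutoff.

    max_short_len is an assumed, caller-supplied threshold; the source
    abstract does not specify one.
    """
    lengths = intron_lengths(seq)
    if not lengths:
        return "none"
    return "short" if max(lengths) <= max_short_len else "long"


def count_group(seq: AnnotatedSequence) -> str:
    """Label a sequence by how many introns it contains."""
    n = len(seq.introns)
    if n == 0:
        return "none"
    return "single" if n == 1 else "multiple"


if __name__ == "__main__":
    # Toy coordinates for illustration only, not real annotations.
    examples = [
        AnnotatedSequence("seq_a", [(10, 90)]),
        AnnotatedSequence("seq_b", [(10, 90), (200, 650)]),
    ]
    for seq in examples:
        print(seq.seq_id, length_group(seq, max_short_len=100), count_group(seq))
```

Labelling a sequence "long" as soon as any one intron exceeds the cutoff is one possible design choice; a mean or median intron length per sequence would be an equally plausible criterion.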
Genomics