TSS-Captur: A User-Friendly Characterization Pipeline for Transcribed but Unclassified RNA transcripts

Mathias Alexander Witte Paz, Thomas Vogel, Kay Nieselt
DOI: https://doi.org/10.1101/2024.07.05.602221
2024-07-09
Abstract: RNA-seq and its 5'-enrichment-based variants for prokaryotes have enabled the base-exact identification of transcription start sites (TSSs) and have improved gene expression analysis. Computational methods analyze these experimental data to identify TSSs and classify them based on proximal annotated genes. Some TSSs cannot be classified at all (orphan TSSs). Others lie on the reverse strand of known genes (antisense TSSs) but are not associated with the direct transcription of any known gene. Here, we introduce TSS-Captur, a novel pipeline that uses computational approaches to characterize genomic regions starting from experimentally confirmed but unclassified TSSs. By analyzing experimental TSS data, TSS-Captur characterizes unclassified signals, thereby complementing prokaryotic genome annotation tools and improving our understanding of bacterial transcriptomes. TSS-Captur classifies extracted transcripts as coding or non-coding genes and predicts the transcription termination site of each putative transcript. For non-coding genes, the secondary structure is computed. Furthermore, putative promoter regions are analyzed to identify enriched motifs. An interactive report enables seamless data exploration. We validated TSS-Captur with a Campylobacter jejuni dataset and characterized unlabeled non-coding RNAs in Streptomyces coelicolor. In addition to its command-line interface, TSS-Captur is available as a web application to improve user accessibility and support exploratory analysis.
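To make "classify them based on proximal annotated genes" concrete, here is a minimal sketch of position-based TSS classification. It assumes TSSpredator-style rules: a TSS is gene-associated if it lies at most `utr_len` nt upstream of a gene start on the same strand, internal if it falls inside a gene on the same strand, antisense if it lies inside or within `antisense_len` nt of a gene on the opposite strand, and orphan otherwise. The names `Gene`, `classify_tss` and `is_unclassified`, the thresholds 300 and 100, and the example annotation are all illustrative assumptions, not TSS-Captur's actual code or parameters. Primary and secondary TSSs are merged into one "gene-associated" class here because separating them would require expression signal data.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Gene:
    start: int   # leftmost coordinate, 1-based inclusive (GFF-style)
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def classify_tss(pos: int, strand: str, genes: List[Gene],
                 utr_len: int = 300, antisense_len: int = 100) -> Set[str]:
    """Return every class a TSS at `pos` on `strand` receives (hypothetical rules)."""
    classes: Set[str] = set()
    for g in genes:
        if g.strand == strand:
            # Distance upstream of the gene start, measured along the gene's strand;
            # on the minus strand the gene starts at its rightmost coordinate.
            upstream = g.start - pos if strand == "+" else pos - g.end
            if 0 <= upstream <= utr_len:
                classes.add("gene-associated")
            elif g.start <= pos <= g.end:
                classes.add("internal")
        elif g.start - antisense_len <= pos <= g.end + antisense_len:
            # Opposite strand: inside the gene or close to either of its ends.
            classes.add("antisense")
    return classes or {"orphan"}


def is_unclassified(classes: Set[str]) -> bool:
    """True for TSSs not linked to direct transcription of a known gene."""
    return classes <= {"orphan", "antisense"}


# Hypothetical two-gene annotation used for illustration only.
genes = [Gene(1000, 1900, "+"), Gene(3000, 3600, "-")]

if __name__ == "__main__":
    for pos, strand in [(800, "+"), (1500, "-"), (5000, "+")]:
        cls = classify_tss(pos, strand, genes)
        print(pos, strand, sorted(cls), is_unclassified(cls))
```

A TSS can receive several classes at once, which is why the function returns a set. Under this sketch, only TSSs whose classes are limited to orphan or antisense would be handed on to a downstream characterization step such as the one described in the abstract.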
Bioinformatics