TKSM: highly modular, user-customizable, and scalable transcriptomic sequencing long-read simulator

Fatih Karaoğlanoğlu,Baraa Orabi,Ryan Flannigan,Cedric Chauve,Faraz Hach
DOI: https://doi.org/10.1093/bioinformatics/btae051
IF: 5.8
2024-01-25
Bioinformatics
Abstract:Abstract Motivation Transcriptomic long-read (LR) sequencing is an increasingly cost-effective technology for probing various RNA features. Numerous tools have been developed to tackle various transcriptomic sequencing tasks (e.g. isoform and gene fusion detection). However, the lack of abundant gold-standard datasets hinders the benchmarking of such tools. Therefore, the simulation of LR sequencing is an important and practical alternative. While the existing LR simulators aim to imitate the sequencing machine noise and to target specific library protocols, they lack some important library preparation steps (e.g. PCR) and are difficult to modify to new and changing library preparation techniques (e.g. single-cell LRs). Results We present TKSM, a modular and scalable LR simulator, designed so that each RNA modification step is targeted explicitly by a specific module. This allows the user to assemble a simulation pipeline as a combination of TKSM modules to emulate a specific sequencing design. Additionally, the input/output of all the core modules of TKSM follows the same simple format (Molecule Description Format) allowing the user to easily extend TKSM with new modules targeting new library preparation steps. Availability and implementation TKSM is available as an open source software at https://github.com/vpc-ccg/tksm.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?