RealSeq2: a Software Integrated with UMI Identification, Error Correction, and Methylation Modifications Storing
Ke Wang, Mengmeng Song, Min Li, Tianyu Cui, Zhentian Liu, Edward Yu, Huan Fang, Xuan Gao, Xuefeng Xia, Jiayin Wang, Yanfang Guan, Tao Liu, Xin Yi
DOI: https://doi.org/10.1101/2023.05.16.539668
2023-01-01
Abstract: High-throughput sequencing with unique molecular identifiers (UMIs) is widely used in early tumor screening, detection, and recurrence monitoring. Detecting extremely low-frequency mutations is especially important for monitoring tumor recurrence, and this requires data that are both high-precision and high-quality. We developed RealSeq2, a new integrated data-preprocessing software based on fastp and gencore. Using multithreading, it performs adapter removal, quality control, and UMI identification on high-throughput next-generation sequencing data, and generates consensus reads through clustering and error correction. RealSeq2 also supports 5-methylcytosine (5mC) data from bisulfite-free sequencing. RealSeq2 defines a new SAM tag, XM, for storing methylation information, which makes it easier to identify methylation sites and mutation sites jointly in downstream analysis. RealSeq2 comprises three submodules: ReadsProfiler, ReadsCleaner, and ReadsRecycler. Its output files (BAM or SAM) use standard formats that downstream analysis tools can read directly. RealSeq2 is the preferred upstream analysis software for the co-detection of ultra-low-frequency mutations and bisulfite-free methylation data. The error profile produced by RealSeq2 provides data support for downstream analysis. Additionally, we expect the XM tag to become a standard protocol for recording methylation signals.
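The UMI-based consensus step mentioned above can be illustrated with a minimal sketch. It is a generic illustration, not RealSeq2's or gencore's actual algorithm. It assumes equal-length reads that have already been assigned UMIs and groups them by UMI alone; real pipelines typically also group by alignment position and weight bases by quality. Each position is then called by simple majority vote. The function names and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs into read families keyed by UMI.

    Simplification (assumption): real tools also cluster by mapping position.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    return dict(families)


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote over equal-length reads in one family.

    A base is kept only if at least `min_fraction` of the reads support it;
    otherwise the position is masked as 'N'. Base qualities are ignored here.
    """
    if not seqs:
        raise ValueError("empty family")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("reads in a family must have equal length")
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


# Toy input: (UMI, read sequence) pairs.
reads = [
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGTACGT"),
    ("AACGT", "ACGAACGT"),  # simulated sequencing error at position 3
    ("TTGCA", "GGGTTTAA"),
    ("TTGCA", "GGCTTTAA"),  # 1-vs-1 disagreement at position 2
]
families = group_by_umi(reads)
consensus_reads = {umi: consensus(seqs) for umi, seqs in families.items()}
print(consensus_reads)  # {'AACGT': 'ACGTACGT', 'TTGCA': 'GGNTTTAA'}
```

In the first family, the isolated error is outvoted (2 of 3 reads support `T`). In the second family, the two reads disagree and neither base reaches the threshold, so the position is masked rather than guessed. This is why larger read families give more reliable error correction for ultra-low-frequency variants.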