Abstract:

Motivation: The rapid development of next-generation sequencing technology provides an opportunity to study genome-wide DNA methylation at single-base resolution. However, depletion of unmethylated cytosines brings challenges for aligning bisulfite-converted sequencing reads to a large reference. Software tools for aligning methylation reads have not yet been comprehensively evaluated, especially for the widely used reduced representation bisulfite sequencing (RRBS), which involves enrichment for CpG islands (CGIs).

Results: We developed a simulator, RRBSsim, for benchmarking analysis of RRBS data. We performed an extensive comparison of seven mapping algorithms for methylation analysis in both real and simulated RRBS data. Eighteen lung tumors and matched adjacent tissues were sequenced using the RRBS protocol. Our empirical evaluation found that methylation results were less consistent between software tools for CpG sites with low sequencing depth, with medium methylation levels, or located on CGI shores or in gene bodies. These observations were further confirmed by simulations, which indicated that software tools generally had lower recall in detecting these vulnerable CpG sites and lower precision in estimating their methylation levels. Among the software tools tested, bwa-meth and BS-Seeker2 (bowtie2) are currently our preferred aligners for RRBS data in terms of recall, precision and speed. Existing aligners cannot efficiently handle moderately methylated CpG sites or CpG sites on CGI shores or in gene bodies. Methylation results from these vulnerable CpG sites should be interpreted with caution. Our study reveals several important features inherent in methylation data, and RRBSsim provides guidance to advance sequence-based methylation data analysis and methodological development.

Availability and implementation: RRBSsim is a simulator for benchmarking analysis of RRBS data and its source code is available at https://github.com/xwBio/RRBSsim or https://github.com/xwBio/Docker-RRBSsim.

Supplementary information: Supplementary data are available at Bioinformatics online.
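To make the recall and precision measures mentioned in the Results more concrete, the following is a minimal Python sketch of how such metrics could be computed against simulated ground truth. The definitions used here are assumptions for illustration only, since the abstract does not spell them out: recall is taken as the fraction of simulated CpG sites that appear in an aligner's output, and precision as the fraction of detected sites whose estimated methylation level lies within a hypothetical `tolerance` of the simulated level. All function names and the example data are hypothetical and are not part of RRBSsim.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position) of a CpG site


def methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of reads at one CpG site that support methylation."""
    depth = methylated + unmethylated
    if depth == 0:
        raise ValueError("site has no coverage")
    return methylated / depth


def site_recall(truth: Dict[Site, float], called: Dict[Site, float]) -> float:
    """Assumed definition: share of simulated sites present in the aligner output."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(truth.keys() & called.keys()) / len(truth)


def level_precision(truth: Dict[Site, float], called: Dict[Site, float],
                    tolerance: float = 0.1) -> float:
    """Assumed definition: share of detected sites whose estimated level
    is within `tolerance` of the simulated level."""
    shared = truth.keys() & called.keys()
    if not shared:
        return 0.0
    close = sum(1 for site in shared if abs(truth[site] - called[site]) <= tolerance)
    return close / len(shared)


# Hypothetical simulated truth: four CpG sites with known methylation levels.
truth = {("chr1", 100): 0.8, ("chr1", 200): 0.5,
         ("chr1", 300): 0.0, ("chr1", 400): 1.0}

# Hypothetical aligner output built from (methylated, unmethylated) read counts;
# the site at position 400 was not recovered.
called = {("chr1", 100): methylation_level(8, 2),
          ("chr1", 200): methylation_level(2, 8),
          ("chr1", 300): methylation_level(0, 10)}

recall = site_recall(truth, called)
precision = level_precision(truth, called)
```

In this toy example three of the four simulated sites are recovered, and the moderately methylated site at position 200 is the one whose estimated level falls outside the tolerance, which loosely mirrors the kind of vulnerable site the Results describe.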

Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures

Evaluating long-read RNA-sequencing analysis tools with in silico mixtures

Comprehensive Assessment of Isoform Detection Methods for Third-Generation Sequencing Data

Comprehensive assessment of mRNA isoform detection methods for long-read sequencing data

Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments

A comprehensive benchmarking of differential splicing tools for RNA-seq analysis at the event level

scRNA-seq mixology: towards better benchmarking of single cell RNA-seq analysis methods

Assembly Arena: Benchmarking RNA isoform reconstruction algorithms for nanopore sequencing

A Real-World Multi-Center RNA-seq Benchmarking Study Using the Quartet and MAQC Reference Materials

Toward Best Practice in Identifying Subtle Differential Expression with RNA-seq: A Real-World Multi-Center Benchmarking Study Using Quartet and MAQC Reference Materials

Comprehensive benchmark of differential transcript usage analysis for static and dynamic conditions

Evaluation and comparison of computational tools for RNA-seq isoform quantification

Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data

Quartet RNA Reference Materials and Ratio-Based Reference Datasets for Reliable Transcriptomic Profiling

A Comprehensive Evaluation of Alignment Software for Reduced Representation Bisulfite Sequencing Data

Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing

Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches

Benchmarking integration of single-cell differential expression

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

The shaky foundations of simulating single-cell RNA sequencing data

A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples