Abstract:

Motivation: The rapid development of next-generation sequencing technology provides an opportunity to study genome-wide DNA methylation at single-base resolution. However, depletion of unmethylated cytosines brings challenges for aligning bisulfite-converted sequencing reads to a large reference. Software tools for aligning methylation reads have not yet been comprehensively evaluated, especially for the widely used reduced representation bisulfite sequencing (RRBS), which involves enrichment for CpG islands (CGIs).

Results: We developed a dedicated simulator, RRBSsim, for benchmarking analysis of RRBS data. We performed an extensive comparison of seven mapping algorithms for methylation analysis in both real and simulated RRBS data. Eighteen lung tumors and matched adjacent tissues were sequenced using the RRBS protocol. Our empirical evaluation found that methylation results were less consistent between software tools for CpG sites with low sequencing depth, with a medium methylation level, or located on CGI shores or in gene bodies. These observations were further confirmed by simulations, which indicated that software tools generally had lower recall in detecting these vulnerable CpG sites and lower precision in estimating their methylation levels. Among the software tools tested, bwa-meth and BS-Seeker2 (bowtie2) are currently our preferred aligners for RRBS data in terms of recall, precision and speed. Existing aligners cannot efficiently handle moderately methylated CpG sites or CpG sites on CGI shores or in gene bodies. Methylation results from these vulnerable CpG sites should be interpreted with caution. Our study reveals several important features inherent in methylation data, and RRBSsim provides guidance to advance sequence-based methylation data analysis and methodological development.

Availability and implementation: RRBSsim is a simulator for benchmarking analysis of RRBS data, and its source code is available at https://github.com/xwBio/RRBSsim or https://github.com/xwBio/Docker-RRBSsim.

Supplementary information: Supplementary data are available at Bioinformatics online.
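To make the two benchmarking metrics named in the Results concrete, here is a minimal sketch of how recall and precision could be computed against a simulated ground truth. The definitions are assumptions made for illustration and may differ from those used in the study: recall is taken as the fraction of simulated CpG sites that a pipeline reports with at least `min_depth` reads, and precision as the fraction of those detected sites whose estimated methylation level falls within `tolerance` of the simulated level. The function name `evaluate_calls`, its parameters and the input layout are hypothetical and are not part of RRBSsim.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); hypothetical layout


def evaluate_calls(
    truth: Dict[Site, float],
    calls: Dict[Site, Tuple[int, int]],
    min_depth: int = 5,       # assumed depth cutoff, not taken from the paper
    tolerance: float = 0.1,   # assumed allowed error in methylation level
) -> Tuple[float, float]:
    """Return (recall, precision) under the assumed definitions.

    truth maps each simulated CpG site to its true methylation level (0..1).
    calls maps each reported site to (methylated_reads, total_reads).
    """
    # A site counts as detected if it is a true site covered deeply enough.
    detected = {
        site: methylated / total
        for site, (methylated, total) in calls.items()
        if site in truth and total >= min_depth
    }
    recall = len(detected) / len(truth) if truth else 0.0

    # A detected site counts as accurate if its estimated level is close
    # to the simulated level.
    accurate = sum(
        1 for site, level in detected.items()
        if abs(level - truth[site]) <= tolerance
    )
    precision = accurate / len(detected) if detected else 0.0
    return recall, precision
```

Under these assumed definitions, a moderately methylated site covered by only a few reads can lower both metrics, either by being filtered out at the depth cutoff or by yielding a noisy level estimate.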

methyLImp2: faster missing value estimation for DNA methylation data

Fast matrix completion in epigenetic methylation studies with informative covariates

R Methylcipher: A Methylation Clock Investigational Package for Hypothesis-Driven Evaluation & Research

BayMeth: improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach

Imputing not available values in single-cell DNA methylation data using the median is straightforward and effective

Pipeline Olympics: continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold-standard

A Comprehensive Evaluation of Alignment Software for Reduced Representation Bisulfite Sequencing Data

MethParquet: an R package for rapid and efficient DNA methylation association analysis adopting Apache Parquet

RnBeads 2.0: comprehensive analysis of DNA methylation data

BatMeth: Improved Mapper for Bisulfite Sequencing Reads on DNA Methylation

Imputation for Lipidomics and Metabolomics (ImpLiMet): Online application for optimization and method selection for missing data imputation

MethyLasso: a segmentation approach to analyze DNA methylation patterns and identify differentially methylated regions from whole-genome datasets

Estimation of the methylation pattern distribution from deep sequencing data

GSimp: A Gibbs Sampler Based Left-Censored Missing Value Imputation Approach for Metabolomics Studies

A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing

Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays

P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool

MethylCallR: a comprehensive analysis framework for Illumina Methylation Beadchip

PoreMeth2: decoding the evolution of methylome alterations with Nanopore sequencing

mLiftOver: harmonizing data across Infinium DNA methylation platforms

Functional DNA methylation differences between tissues, cell types, and across individuals discovered using the M&M algorithm