A Classification Approach for DNA Methylation Profiling with Bisulfite Next-Generation Sequencing Data

Longjie Cheng, Yu Zhu
DOI: https://doi.org/10.1093/bioinformatics/btt674
IF: 5.8
2013-01-01
Bioinformatics
Abstract:

Motivation: With the advent of high-throughput sequencing technology, bisulfite-sequencing-based DNA methylation profiling methods have emerged as the most promising approaches because of their single-base resolution and genome-wide coverage. However, statistical methods for analyzing this type of methylation data are not well developed. Although the most widely used proportion-based estimation method is simple and intuitive, it is not statistically adequate for dealing with the various sources of noise in bisulfite-sequencing data. Furthermore, it is not biologically satisfactory in applications that require binary methylation status calls.

Results: In this article, we use a binomial mixture model to characterize bisulfite-sequencing data and, based on the model, propose a classification-based procedure, called the methylation status calling (MSC) procedure, to make binary methylation status calls. The MSC procedure is optimal in the sense that it maximizes the overall correct allocation rate, and its false discovery rate (FDR) and false non-discovery rate (FNDR) can be estimated. To control FDR at any given level, we further develop an FDR-controlled MSC procedure, which combines a local FDR-based adaptive procedure with the MSC procedure. Both a simulation study and a real data application are carried out to examine the performance of the proposed procedures. The simulation study shows that the estimates of FDR and FNDR of the MSC procedure are appropriate. It also demonstrates that the FDR-controlled MSC procedure is valid in controlling FDR at a prespecified level and is more powerful than the individual binomial testing procedure. In the real data application, the MSC procedure exhibits an estimated FDR of 0.1426 and an estimated FNDR of 0.0067. The overall correct allocation rate is >0.97. These results suggest the effectiveness of our proposed procedures.

Availability and implementation: The proposed procedures are implemented in R and are available at http://www.stat.purdue.edu/*cheng70/code.html.
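
To make the ideas in the abstract concrete, the following is a minimal Python sketch of classification under a binomial mixture. It is not the authors' R implementation, and it rests on several assumptions that the abstract does not state: the mixture has exactly two components (unmethylated and methylated), the parameters pi1, p0 and p1 are already known rather than estimated, a site is called methylated when its posterior probability exceeds 0.5, and the FDR-controlled variant uses a generic step-up rule on sorted local FDR values. All function and parameter names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of observing k methylated reads out of n."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_methylated(k, n, pi1, p0, p1):
    """Posterior probability that a site is methylated.

    Assumed two-component mixture (hypothetical parameterization):
      pi1 = prior proportion of methylated sites
      p0  = per-read methylated-call probability at an unmethylated site
      p1  = per-read methylated-call probability at a methylated site
    """
    a = pi1 * binom_pmf(k, n, p1)
    b = (1 - pi1) * binom_pmf(k, n, p0)
    return a / (a + b)

def msc_calls(sites, pi1, p0, p1):
    """Classify each (k, n) site by the Bayes rule: posterior > 0.5."""
    post = [posterior_methylated(k, n, pi1, p0, p1) for k, n in sites]
    calls = [q > 0.5 for q in post]
    return post, calls

def estimated_error_rates(post, calls):
    """Estimate FDR and FNDR from the posteriors.

    FDR  ~ mean local FDR (1 - posterior) over sites called methylated.
    FNDR ~ mean posterior over sites called unmethylated.
    """
    called = [1 - q for q, c in zip(post, calls) if c]
    not_called = [q for q, c in zip(post, calls) if not c]
    fdr = sum(called) / len(called) if called else 0.0
    fndr = sum(not_called) / len(not_called) if not_called else 0.0
    return fdr, fndr

def fdr_controlled_calls(post, alpha):
    """Generic adaptive rule: sort sites by local FDR and call the largest
    leading set whose running mean local FDR stays at or below alpha."""
    order = sorted(range(len(post)), key=lambda i: 1 - post[i])
    running, k_max = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running += 1 - post[i]
        if running / rank <= alpha:
            k_max = rank
    calls = [False] * len(post)
    for i in order[:k_max]:
        calls[i] = True
    return calls

# Example: (methylated reads, total reads) at three sites; values are illustrative.
sites = [(9, 10), (1, 10), (7, 10)]
post, calls = msc_calls(sites, pi1=0.5, p0=0.05, p1=0.95)
fdr, fndr = estimated_error_rates(post, calls)
strict_calls = fdr_controlled_calls(post, alpha=0.05)
```

Thresholding the posterior at 0.5 is the standard Bayes rule for minimizing the overall misclassification rate, which matches the abstract's description of maximizing the overall correct allocation rate. In practice the mixture parameters would have to be estimated from the data, for example with an EM algorithm, which this sketch omits.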