HLMethy: a machine learning-based model to identify the hidden labels of m6A candidates

Ze Liu, Wei Dong, WenJie Luo, Wei Jiang, QuanWu Li, ZiLi He
DOI: https://doi.org/10.1007/s11103-019-00930-x
2019-01-01
Plant Molecular Biology
Key message: We developed a machine learning-based model to identify the hidden labels of m6A candidates from noisy m6A-seq data.
Abstract: Peak-calling approaches, such as MeRIP-seq or m6A-seq, are commonly used to map m6A modifications. However, these technologies can only map m6A sites at 100–200 nt resolution and cannot reveal the precise location or the number of modified residues in a transcript. To address this challenge, we developed a novel machine learning-based approach, named HLMethy, to assign labels to m6A candidates from noisy m6A-seq data. The multiple instance learning framework was adopted, and two different training strategies were used to generate the classification model. We evaluated the model on m6A sites known at single-base resolution, where it achieved performance comparable to that of existing instance-level predictors, which suggests that it has the potential to improve the data quality of m6A-seq at reduced cost. Moreover, our generic framework can be extended to other newly discovered modifications that are mapped by peak-calling approaches. The source code of HLMethy is available at https://github.com/liuze-nwafu/HLMethy.
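The abstract names the multiple instance learning framework but does not describe the two training strategies, so the following is only a generic sketch of the idea and not HLMethy's method. It assumes each m6A-seq peak is a "bag" of candidate sites, that a positive peak contains at least one truly modified candidate, and that a non-peak region contains none. It uses an mi-SVM-style relabeling heuristic with a hand-written logistic regression swapped in for the SVM. The function names, the single toy feature per candidate, and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, epochs=300, lr=0.5):
    # Plain batch gradient descent for logistic regression (illustrative only).
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def score(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def assign_hidden_labels(bags, bag_labels, rounds=5):
    # Start by letting every candidate inherit the label of its bag (peak).
    labels = [[bl] * len(bag) for bag, bl in zip(bags, bag_labels)]
    model = None
    for _ in range(rounds):
        X = [x for bag in bags for x in bag]
        y = [l for ls in labels for l in ls]
        model = fit_logistic(X, y)
        new_labels = []
        for bag, bl in zip(bags, bag_labels):
            if bl == 0:
                # Negative bags keep all-negative instance labels.
                new_labels.append([0] * len(bag))
                continue
            scores = [score(model, x) for x in bag]
            ls = [1 if s >= 0.5 else 0 for s in scores]
            if sum(ls) == 0:
                # MIL constraint: a positive bag needs at least one positive.
                ls[scores.index(max(scores))] = 1
            new_labels.append(ls)
        if new_labels == labels:
            break  # instance labels are stable
        labels = new_labels
    return labels, model

# Toy data: one hypothetical sequence-derived feature per candidate site.
bags = [
    [[2.0], [-2.0], [-2.3]],   # peak (positive bag)
    [[-2.1], [1.7]],           # peak (positive bag)
    [[-1.9], [-1.6]],          # non-peak region (negative bag)
    [[-1.8], [-2.2], [-1.5]],  # non-peak region (negative bag)
]
bag_labels = [1, 1, 0, 0]
labels, model = assign_hidden_labels(bags, bag_labels)
print(labels)
```

Only the bag-level labels are given as input; the per-candidate labels are inferred, which is the sense in which they are "hidden". A real application would replace the toy feature with sequence-derived encodings of each candidate site.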