PAPET: a collection of performant algorithms to identify 5-methyl cytosine from PacBio SequelII data

Romain Groux, Ioannis Xenarios, Emanuel Schmid-Siegert
DOI: https://doi.org/10.1101/2023.03.17.533149
2024-09-23
Abstract: CpG methylation is an important feature for the regulation of gene expression in vertebrate genomes. In this paper, we present the PAcBio Predicting Epigenetics Toolkit (PAPET) algorithms. PAPET is a collection of general algorithms and tools to train predictive models and predict epigenetic marks from SequelII data. This set of tools helps the PacBio user community keep up with the fast-evolving pace of PacBio sequencing technology. We apply this framework to predict CpG methylation from SequelII data and demonstrate that the resulting classifiers perform on par with their best-in-class counterparts. PAPET is implemented in C++ to ensure resource efficiency and easy scalability to large datasets. Moreover, PAPET is fully multi-threaded. The source code is available at https://github.com/ngs-ai-org/papet.
Bioinformatics