Improving allele-specific epigenomic signal coverage by 10-fold using Hidden Markov Modeling and Machine Learning

Emmanuel LP Dumont, Ali Janati, Moumita Bhattacharya, Jean-Baptiste Jeannin, Catherine Do
DOI: https://doi.org/10.1101/2024.05.23.595536
2024-05-24
Abstract: Allele-specific epigenomic signals refer to differences in epigenomic patterns between the two copies, or "alleles," of a DNA region, one inherited from each parent. Epigenomic patterns are defined as modifications of the DNA (e.g., chemical modifications) that do not alter the underlying DNA sequence (changes to the sequence itself would be referred to as "mutations"). Mapping allele-specific epigenomic signals across a genome is crucial, as some of these signals can influence gene expression, disease susceptibility, and developmental processes. However, genome-wide identification of allele-specific epigenomic patterns is limited by two factors: the short average read length (50-150 nucleotides) of short-read sequencing technologies, which are the most widely used and affordable whole-genome sequencing methods, and the 99.9% similarity between the DNA sequences inherited from each parent. Together, these limitations restrict the assessment of allele-specific signals to approximately 10% of the genome, potentially overlooking critical regulatory regions. In this paper, we present a highly effective machine-learning approach based on variational hidden Markov modeling that enables the detection of allele-specific epigenomic signals across the entire genome, resulting in a 10-fold improvement in genomic coverage compared to state-of-the-art methods. We demonstrate our method on DNA methylation, a critical epigenomic regulatory signal.
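The read-length constraint can be made concrete. A short read can only be attributed to one parental allele if it overlaps a heterozygous site, that is, a position where the two inherited sequences differ. Because the sequences are 99.9% identical, such sites are sparse. A methylation site (CpG) is therefore directly assessable only when it lies within one read length of a heterozygous SNP. The sketch below counts which CpGs meet that condition. It is a simplification, not the paper's method, and rests on several assumptions: single-end reads of fixed length, uniform coverage, and perfect mappability. The function name `cpgs_phaseable` and the toy positions are hypothetical.

```python
from bisect import bisect_left


def cpgs_phaseable(cpg_positions, het_snp_positions, read_length):
    """Return the CpG positions that a single read could link to a heterozygous SNP.

    Simplifying assumptions (hypothetical, not from the paper): single-end reads
    of a fixed length, every position sequenced, perfect mappability.
    A read of length `read_length` can cover both a CpG at c and a SNP at s
    only if |c - s| <= read_length - 1.
    """
    snps = sorted(het_snp_positions)
    reach = read_length - 1
    phaseable = []
    for c in cpg_positions:
        # Index of the first SNP at or after the left edge of the window around c.
        i = bisect_left(snps, c - reach)
        # The CpG qualifies if that SNP also falls before the window's right edge.
        if i < len(snps) and snps[i] <= c + reach:
            phaseable.append(c)
    return phaseable


# Toy example: four CpGs, two heterozygous SNPs, 100-nt reads.
cpgs = [100, 500, 1200, 3000]
snps = [150, 2950]
covered = cpgs_phaseable(cpgs, snps, read_length=100)
print(covered, len(covered) / len(cpgs))
```

In this toy example, only the CpGs at 100 and 3000 lie within reach of a SNP, so half of the sites can be assessed per allele. Sites such as 500 and 1200 fall outside every SNP window. Those are the kind of sites that a model propagating allele-specific signal along the genome, as the abstract describes, would aim to recover.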
Genomics