Position-Defined CpG Islands Provide Complete Co-methylation Indexing for Human Genes

Ming Xiao, Ruiying Yin, Pengbo Gao, Jun Yu, Fubo Ma, Zichun Dai, Le Zhang
DOI: https://doi.org/10.1007/978-3-031-13829-4_27
2022-01-01
Abstract: DNA methylation, especially position-sensitive co-methylation of CpG islands (CGIs), is one of the key epigenomic mechanisms regulating gene expression and maintaining chromosomal integrity. Therefore, thoroughly mapping the precise position of all CpG sequences within CGIs and non-island clusters, as well as their methylation status at the single-cell level under different physiological and pathological conditions, has become one of the ultimate goals of epigenomics. Toward this end, we compare our previously categorized position-defined CpG and methylation sites with the complementary sites of density-defined CpG islands to investigate the patterns of these two categories of methylation sites relative to human gene expression regulation. Based on our previous analysis of LAUPs (lineage-associated under-represented permutations) and the discovery that CpG-containing sequences are underrepresented when the distance between CpG sequences ranges from 10 bp to 14 bp, we define such distances as discrete intervals at base-pair precision and compute three groups of position-defined CGIs according to interval length (12 bp, 25 bp, and 50 bp), which cover 1.85 times as many CpG sites (14.98%) as density-defined CGIs (8.08%). This novel scheme reveals: (1) There are three partially overlapping yet distinct position-defined CGI subgroups in the human genome. (2) The 12-bp CGIs appear unique to low-density CGIs (LCGIs), whereas the other two groups, the 25-bp and 50-bp CGIs, are found in all three classes of density-defined CGIs. (3) The largest fractions of unmethylated (75.99%) and moderately methylated (12.91%) core promoter-associated CGIs are found among the 12-bp CGIs, with smaller fractions among the 50-bp CGIs (41.77% for HCGIs and 20.03% for ICGIs) of the same sequence region. (4) We conclude that in the Precision Medicine Era all CpG sites and their clusters are to be mapped, annotated, and modelled for gene expression regulation at single-base-pair precision.
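The idea of grouping CpG sites by the interval between neighbours, rather than by local CpG density, can be illustrated with a minimal sketch. The abstract does not give the exact grouping rule, so the following is an assumption made purely for illustration: a cluster is taken to be a maximal run of CpG sites in which each consecutive pair of sites lies no more than max_gap bp apart, with distance measured between the start positions of the CG dinucleotides. The function names and the min_sites parameter are hypothetical; only the 12 bp, 25 bp, and 50 bp interval lengths come from the abstract.

```python
import re
from typing import List, Tuple


def cpg_positions(seq: str) -> List[int]:
    """Return the 0-based start position of every CG dinucleotide in seq."""
    return [m.start() for m in re.finditer("CG", seq.upper())]


def position_defined_clusters(
    positions: List[int], max_gap: int, min_sites: int = 2
) -> List[Tuple[int, int, int]]:
    """Group sorted CpG positions into clusters by inter-site distance.

    Assumption (not taken from the paper): consecutive CpG sites belong to
    the same cluster when their start positions differ by at most max_gap.
    Returns (start, end_exclusive, n_sites) for each cluster that contains
    at least min_sites CpG sites (min_sites is a hypothetical parameter).
    """
    clusters: List[Tuple[int, int, int]] = []
    if not positions:
        return clusters
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_gap:
            count += 1
        else:
            # Gap too large: close the current run and start a new one.
            if count >= min_sites:
                clusters.append((start, prev + 2, count))
            start = pos
            count = 1
        prev = pos
    if count >= min_sites:
        clusters.append((start, prev + 2, count))
    return clusters


# Toy sequence: two CpG pairs separated by a 20-bp A-run.
seq = "ACGTCG" + "A" * 20 + "CGCGT"
sites = cpg_positions(seq)                      # [1, 4, 26, 28]
for gap in (12, 25, 50):                        # interval lengths from the abstract
    print(gap, position_defined_clusters(sites, gap))
```

On this toy sequence the 12-bp threshold yields two separate clusters, while the 25-bp and 50-bp thresholds merge all four sites into one. This mirrors, in miniature only, how groups defined by different interval lengths can overlap partially yet remain distinct.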