Non-canonical RNA-DNA differences and other human genomic features are enriched within very short tandem repeats.

Hui Yu, Shilin Zhao, Scott Ness, Huining Kang, Quanhu Sheng, David C Samuels, Olufunmilola Oyebamiji, Ying-Yong Zhao, Yan Guo
DOI: https://doi.org/10.1371/journal.pcbi.1007968
2020-01-01
PLoS Computational Biology
Abstract: Very short tandem repeats bear substantial genetic, evolutionary, and pathological significance in genome analyses. Here, we compiled a census of tandem mono-nucleotide, di-nucleotide, and tri-nucleotide repeats (MNRs/DNRs/TNRs) in GRCh38, which we collectively term "polytracts". Polytracts occupy 144.4 million nucleotides (4.7%) of the human genome, and 0.47 million single nucleotides are identified as polytract hinges, i.e., break-points of tandem polytracts. Preliminary exploration of the census suggested that polytract hinge sites and the boundaries of AAC polytracts may bear a higher mapping error rate than other polytract regions. Further, we revealed landscapes of polytract enrichment with respect to nearly a hundred genomic features. MNRs, DNRs, and TNRs displayed noticeable differences in locational enrichment for miscellaneous genomic features, especially RNA-editing events. Non-canonical and C-to-U RNA-editing events are enriched inside and/or adjacent to MNRs, while all categories of RNA-editing events are under-represented in DNRs. A-to-I RNA-editing events are generally under-represented in polytracts. The selective enrichment of non-canonical RNA-editing events in the vicinity of MNRs provides evidence against their authenticity. To enable similar locational enrichment analyses in relation to polytracts, we developed a software tool, Polytrap, which can handle 11 reference genomes. Additionally, we compiled polytracts of four model organisms into a Track Hub that can be integrated into the UCSC Genome Browser as an official track for convenient visualization of polytracts.
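To make the notion of a polytract concrete, here is a minimal Python sketch that scans a sequence for tandem mono-, di-, and tri-nucleotide repeats and reports the fraction of bases they cover. It assumes a polytract of unit size k is a maximal stretch in which every base equals the base k positions downstream, and it uses hypothetical minimum-length thresholds (`DEFAULT_MIN_LEN`), because the abstract does not state the census criteria. The function names `find_polytracts` and `covered_fraction` are illustrative only. This is not Polytrap's implementation, and polytract hinges are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum tract lengths (bp) per repeat-unit size k.
# The abstract does not give the thresholds used for the GRCh38 census.
DEFAULT_MIN_LEN: Dict[int, int] = {1: 6, 2: 8, 3: 9}


def find_polytracts(seq: str,
                    min_len: Dict[int, int] = DEFAULT_MIN_LEN
                    ) -> List[Tuple[int, int, str]]:
    """Report maximal tandem repeats of 1-, 2- and 3-nt units.

    Returns (start, end, unit) tuples with 0-based, half-open coordinates,
    sorted by start. A period-k region is a maximal stretch where
    seq[p] == seq[p + k] holds for every p in [start, end - k).
    """
    seq = seq.upper()
    n = len(seq)
    tracts = []
    for k, threshold in min_len.items():
        p = 0
        while p + k < n:
            if seq[p] != seq[p + k]:
                p += 1
                continue
            # Extend the run of positions that match the base k steps ahead.
            start = p
            while p + k < n and seq[p] == seq[p + k]:
                p += 1
            end = p + k  # last matching position is p - 1, covering up to p - 1 + k
            unit = seq[start:start + k]
            # Skip non-ACGT units (e.g. N runs) and k > 1 units that are homopolymers.
            if set(unit) <= set("ACGT") and (k == 1 or len(set(unit)) > 1):
                if end - start >= threshold:
                    tracts.append((start, end, unit))
    return sorted(tracts)


def covered_fraction(tracts: List[Tuple[int, int, str]], seq_len: int) -> float:
    """Fraction of positions covered by at least one polytract (overlaps counted once)."""
    covered = set()
    for start, end, _ in tracts:
        covered.update(range(start, end))
    return len(covered) / seq_len


if __name__ == "__main__":
    demo = "GAAAAAAGTCACACACAGCAGCAGCAGT"
    hits = find_polytracts(demo)
    print(hits)  # [(1, 7, 'A'), (9, 17, 'CA'), (15, 27, 'CAG')]
    print(round(covered_fraction(hits, len(demo)), 3))  # 0.857
```

Tracts of different unit sizes may overlap (here the CA and CAG tracts share position 15), so coverage is computed over the union of intervals. The set-based union is only suitable for short toy sequences; a genome-scale census would merge sorted intervals instead.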
What problem does this paper attempt to address?