Identification of Bona Fide RNA Editing Sites: History, Challenges, and Opportunities
Meng How Tan
DOI: https://doi.org/10.1021/acs.accounts.3c00462
2023-11-07
Abstract: Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by the adenosine deaminase acting on RNA (ADAR) family of enzymes, which has three members (ADAR1, ADAR2, and ADAR3), is a major gene regulatory mechanism that diversifies the transcriptome. It is widespread in many metazoans, including humans. As inosine is interpreted by cellular machineries mainly as guanosine, A-to-I editing effectively produces A-to-G nucleotide changes. Depending on its location, an editing event can generate new protein isoforms or influence other RNA processing pathways. Researchers have found that ADAR-mediated editing performs diverse functions. For example, it enables organisms such as cephalopods to adapt rapidly to fluctuating environmental conditions such as water temperature. In development, the loss of ADAR1 is embryonically lethal partly because endogenous double-stranded RNAs (dsRNAs) are no longer marked by inosines, which signal "self", and thus cause the melanoma differentiation-associated protein 5 (MDA5) sensor to trigger a deleterious interferon response. Hence, ADAR1 plays a key role in preventing aberrant activation of the innate immune system. Furthermore, ADAR enzymes have been implicated in myriad human diseases. Intriguingly, some cancer cells are known to exploit ADAR1 activity to dodge immune responses. However, the exact identities of immunogenic RNAs in different biological contexts have remained elusive. Consequently, there is tremendous interest in identifying inosine-containing RNAs in the cell.
The identification of A-to-I RNA editing sites depends on the sequencing of nucleic acids. Technological and algorithmic advancements over the past decades have revolutionized the way editing events are detected. Initially, the discovery of editing sites relied on Sanger sequencing, a first-generation technology. Both RNA, which was reverse transcribed into complementary DNA (cDNA), and genomic DNA (gDNA) from the same source were analyzed. After sequence alignment, one would require an adenosine to be present in the genome but a guanosine to be detected in the RNA sample for a position to be declared an editing site. However, an issue with Sanger sequencing is its low throughput. Subsequently, Illumina sequencing, a second-generation technology, was invented. By permitting the simultaneous interrogation of millions of molecules, it enables many editing sites to be identified rapidly. However, a key challenge is that the Illumina platform produces short sequencing reads that can be difficult to map accurately. To tackle this challenge, we and others developed computational workflows with a series of filters to discard sites that are likely to be false positives. When Illumina sequencing data sets are properly analyzed, A-to-G variants should emerge as the dominant mismatch type. Moreover, the quantitative nature of the data allows us to build a comprehensive atlas of editing-level measurements across different biological contexts, providing deep insights into the spatiotemporal dynamics of RNA editing. However, difficulties remain in identifying true A-to-I editing sites in short protein-coding exons or in organisms and diseases where DNA mutations and genomic polymorphisms are prevalent and mostly unknown. Nanopore sequencing, a third-generation technology, promises to address these difficulties, as it allows native RNAs to be sequenced without conversion to cDNA, preserving base modifications that can be directly detected through machine learning. We recently demonstrated that nanopore sequencing could be used to identify A-to-I editing sites in native RNA directly. Although further work is needed to enhance the detection accuracy in single molecules from fewer cells, the nanopore technology holds the potential to revolutionize epitranscriptomic studies.
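The gDNA-versus-cDNA comparison and the mismatch-type check described above can be illustrated with a minimal Python sketch. It is not the workflow used in the work summarized here: the function names, the input layout (per-position base counts already oriented to the transcribed strand), and every threshold are assumptions chosen for illustration, and a real pipeline would add filters for mapping artifacts, known polymorphisms, and strand bias.

```python
from collections import Counter


def call_mismatch(gdna, rna, min_depth=10, min_alt_reads=2,
                  max_gdna_alt_frac=0.05):
    """Return (ref, alt, level) for a candidate RNA-DNA difference, or None.

    gdna and rna map a base ("A", "C", "G", "T") to its read count at one
    position. All thresholds are illustrative assumptions.
    """
    gdna_depth = sum(gdna.values())
    rna_depth = sum(rna.values())
    # Require enough coverage in both the genome and the transcriptome.
    if gdna_depth < min_depth or rna_depth < min_depth:
        return None

    # The genomic base is the most frequent base in the gDNA reads.
    ref, ref_reads = max(gdna.items(), key=lambda kv: kv[1])
    # Discard positions where the genome itself looks polymorphic.
    if (gdna_depth - ref_reads) / gdna_depth > max_gdna_alt_frac:
        return None

    # Find the most frequent non-genomic base among the RNA reads.
    alt_counts = {b: n for b, n in rna.items() if b != ref and n > 0}
    if not alt_counts:
        return None
    alt, alt_reads = max(alt_counts.items(), key=lambda kv: kv[1])
    if alt_reads < min_alt_reads:
        return None

    # Editing level: variant reads over variant plus genomic-base reads.
    level = alt_reads / (alt_reads + rna.get(ref, 0))
    return ref, alt, level


def mismatch_spectrum(sites):
    """Count mismatch types (e.g. "A>G") across (gdna, rna) count pairs."""
    spectrum = Counter()
    for gdna, rna in sites:
        call = call_mismatch(gdna, rna)
        if call is not None:
            ref, alt, _ = call
            spectrum[f"{ref}>{alt}"] += 1
    return spectrum


# Toy input, invented purely to exercise the filters.
toy_sites = [
    ({"A": 30}, {"A": 12, "G": 8}),            # passes: A>G
    ({"A": 25}, {"A": 18, "G": 2}),            # passes: A>G
    ({"C": 40}, {"C": 15, "T": 5}),            # passes: C>T
    ({"A": 15, "G": 14}, {"A": 10, "G": 10}),  # rejected: polymorphic genome
    ({"C": 40}, {"C": 19, "T": 1}),            # rejected: one variant read
    ({"T": 20}, {"T": 5, "C": 3}),             # rejected: low RNA depth
]
spectrum = mismatch_spectrum(toy_sites)
print(spectrum.most_common())
```

In a sketch like this, the share of A>G calls in `spectrum` serves as a rough quality check: if other mismatch types are as common as A>G, the filters are probably letting through sequencing or mapping errors rather than editing events.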