CFC-seq: identification of full-length capped RNAs unveils enhancer-derived transcription
Chi Wai Yip, Callum Parr, Hazuki Takahashi, Kayoko Yasuzawa, Matthew Valentine, Hiromi Nishiyori-Sueki, Camilla Ugolini, Valeria Ranzani, Mitsuyoshi Murata, Masaki Kato, Wenjing Kang, Wing Hin Yip, Youtaro Shibayama, Andre Darah Sim, Ying Chen, Xufeng Shu, Jonathan Darah Moody, Ramzan Umarov, Jen-Chien Chang, Luca Pandolfini, Tsugumi Kawashima, Michihira Tagami, Tomoe Nobusada, Tsukasa Kouno, Carlos Alfonso Gonzalez, Roberto Albanese, Francesco Dossena, Nejc Haberman, Kokoro Ozaki, Takeya Kasukawa, Boris Lenhard, Martin Frith, Beatrice Bodega, Francesco Nicassio, Lorenzo Calviello, Magda Bienko, Ivano Legnini, Valerie Hilgers, Stefano Gustincich, Jonathan Goeke, Charles Henri Lecellier, Jay W Shin, Chung-Chau Hon, Piero Carninci
DOI: https://doi.org/10.1101/2024.10.31.620483
2024-11-01
Abstract: Long-read sequencing has emerged as a powerful tool for uncovering novel transcripts and genes. However, existing protocols often cannot confidently identify the transcription start site (TSS) and fail to capture non-poly(A) RNA, thereby limiting the discovery of novel genes, particularly long non-coding RNAs (lncRNAs). In this study, we introduce Cap-trap full-length cDNA sequencing (CFC-seq), a comprehensive protocol that combines Cap-trapping and poly(A)-tailing with Oxford Nanopore sequencing. This protocol enables precise identification of TSSs and full-length transcripts. Applying CFC-seq to two in vitro differentiation time courses yielded approximately 236 million mappable reads. The transcript Start-site Aware Long-read Assembler (SALA) was developed for de novo assembly of transcript models, leading to the identification of 39,425 confident novel genes. Using this dataset, enhancer-derived ncRNAs were redefined as longer and more extensively spliced, properties that correlated with enhancer structure. Compared to enhancers with CpG islands, TATA box enhancers were shown to be more cell-type specific, with fewer chromatin interactions, but produced longer and more stable polyadenylated RNA. A significant proportion of these TATA box-derived eRNAs originated from LTR transposable elements. Overall, this study systematically annotated ~24,000 novel eRNA genes and correlated their transcription properties with enhancer structure.
Genomics