A pan-tissue, pan-disease compendium of human orphan genes

Urminder Singh,Jeffrey A. Haltom,Joseph W. Guarnieri,Jing Li,Arun Seetharam,Afshin Beheshti,Bruce Aronow,Eve Syrkin Wurtele
DOI: https://doi.org/10.1101/2024.02.21.581488
2024-02-26
Abstract:Species-specific genes are ubiquitous in evolution, with functions ranging from prey paralysis to survival in subzero temperatures. Because they are typically expressed under limited conditions and lack canonical features, such genes may be vastly under-identified, even in humans. Here, we leverage terabytes of human RNA-Seq data to identify thousands of highly-expressed transcripts that do not correspond to any Gencode-annotated gene. Many may be novel ncRNAs although 80% of them contain ORFs that have the potential of encoding proteins unique to (orphan genes). We validate our findings with independent strand-specific and single-cell RNA-seq datasets. Hundreds of these novel transcripts overlap with deleterious genomic variants; thousands show significant association with disease-specific patient survival. Most are dynamically regulated and accumulate selectively in particular tissues, cell-types, developmental stages, tumors, COVID-19, sex, and ancestries. As such, these transcripts hold potential as diagnostic biomarkers or therapeutic targets. To empower future discovery, we provide a compendium of these huge RNA-Seq expression data, and RiboSeq data, with associated metadata. Further, we supply the gene models for the novel genes as UCSC Genome Browser tracks.
Evolutionary Biology
What problem does this paper attempt to address?