A comprehensive catalog of predicted functional upstream open reading frames in humans

Patrick McGillivray, Russell Ault, Mayur Pawashe, Robert Kitchen, Suganthi Balasubramanian, Mark Gerstein
DOI: https://doi.org/10.1093/nar/gky188
IF: 14.9
2018-03-19
Nucleic Acids Research
Abstract: Upstream open reading frames (uORFs) latent in mRNA transcripts are thought to modify translation of coding sequences by altering ribosome activity. Not all uORFs are thought to be active in such a process. To estimate the impact of uORFs on the regulation of translation in humans, we first circumscribed the universe of all possible uORFs based on coding gene sequence motifs and identified 1.3 million unique uORFs. To determine which of these are likely to be biologically relevant, we built a simple Bayesian classifier using 89 attributes of uORFs labeled as active in ribosome profiling experiments. This allowed us to extrapolate to a comprehensive catalog of likely functional uORFs. We validated our predictions using in vivo protein levels and ribosome occupancy from 46 individuals. This is a substantially larger catalog of functional uORFs than has previously been reported. Our ranked list of likely active uORFs allows researchers to test their hypotheses regarding the role of uORFs in health and disease. We demonstrate several examples of biological interest through the application of our catalog to somatic mutations in cancer and disease-associated germline variants in humans.
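The enumeration step described in the abstract (listing every possible uORF from sequence motifs) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a uORF is a start codon lying entirely within the 5' UTR followed by the first in-frame stop codon, considers only ATG starts by default, and uses a hypothetical function name `find_uorfs` with 0-based coordinates.

```python
# Canonical stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript, cds_start, start_codons=("ATG",)):
    """List candidate uORFs in a transcript (illustrative assumption, not the paper's method).

    transcript   : mRNA sequence written in the DNA alphabet (A, C, G, T)
    cds_start    : 0-based index of the first base of the main CDS start codon
    start_codons : codons accepted as uORF starts (assumed: ATG only)

    Returns (start, end) pairs, where end is exclusive and includes the stop codon.
    """
    seq = transcript.upper()
    uorfs = []
    # The start codon must lie entirely upstream of the main CDS start.
    for i in range(0, cds_start - 2):
        if seq[i:i + 3] not in start_codons:
            continue
        # Walk codon by codon in the same frame until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3))
                break
    return uorfs


if __name__ == "__main__":
    # Toy transcript: uORF ATG-AAA-TAG at positions 2..10, main CDS starts at 13.
    print(find_uorfs("GGATGAAATAGCCATGGCCTAA", 13))
```

Because the stop-codon search runs to the end of the transcript, this sketch also reports uORFs that overlap the main coding sequence; restricting the inner loop to `cds_start` would keep only uORFs contained in the 5' UTR.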
biochemistry & molecular biology