Primary transcripts: From the discovery of RNA processing to current concepts of gene expression - Review
Klaus Scherrer
DOI: https://doi.org/10.1016/j.yexcr.2018.09.011
2018-12-15
Abstract: The main purpose of this review is to recall for investigators, and in particular for students, some of the early data and concepts in molecular genetics and biology that are rarely cited in the current literature and are thus invariably overlooked. There is a growing tendency among editors and reviewers to consider that only data produced in the last 10-20 years or so are pertinent. However, this is not the case. In exact science, sound data and lucid interpretation never become obsolete and, even if forgotten, will resurface sooner or later. In the field of gene expression, covered in the present review, recent post-genomic data have indeed confirmed many of the earlier results and concepts developed in the mid-seventies, well before the start of the recombinant DNA revolution. Human brains, and even the most powerful computers, have difficulty in handling and making sense of the overwhelming flow of data generated by recent high-throughput technologies. This was easier when low-throughput, more integrative methods based on biochemistry and microscopy dominated biological research. Nowadays, the need for organising concepts is ever more important; otherwise the mass of available data can generate only "building ruins" - the bricks without an architect. Concepts such as pervasive transcription of genomes, large genomic domains, full domain transcripts (FDTs) up to 100 kb long, the prevalence of post-transcriptional events in regulating eukaryotic gene expression, and the 3D-genome architecture were all developed and discussed before 1990, and are only now coming back into vogue. Thus, to review the impact of earlier concepts on later developments in the field, I will confront former and current data and ideas, including a discussion of old and new methods. Whenever useful, I shall first briefly report post-genomic developments before addressing former results and interpretations.
Equally important, some of the terms often used sloppily in scientific discussions will be clearly defined. As a basis for the ensuing discussion, some of the issues and facts related to eukaryotic gene expression will first be introduced. In chapter 2, the evolution in the perception of biology over the last 60 years and the impact of the recombinant DNA revolution will be considered. Then, in chapter 3, data and theory concerning the genome, gene expression and genetics will be reviewed. The experimental and theoretical definition of the gene will be discussed before considering the three different types of genetic information - the "Triad" - and the importance of post-transcriptional regulation of gene expression in the light of the recent finding that 90% of genomic DNA seems to be transcribed. Some previous attempts to provide a conceptual framework for these observations will be recalled, in particular the "Cascade Regulation Hypothesis" (CRH) developed in 1967-85, and the "Gene and Genon" concept proposed in 2007. Knowledge of the size of primary transcripts is of prime importance, for both experimental and theoretical reasons, since these molecules represent the primary units of the "RNA genome" on which most of the post-transcriptional regulation of gene expression occurs. In chapter 4, I will first discuss some current post-genomic topics before summarising the discovery of the high Mr-RNA transcripts and the investigation of their processing, spanning the last 50 years. Since, even today, no consensus has been reached concerning the real form of primary transcripts in eukaryotic cells, I will refer to the viral and specialized cellular models which helped early on to understand the mechanisms of RNA processing and differential splicing that operate in cells and tissues.
As a well-studied example of the expression and regulation of a specific cellular gene in relation to differentiation and pathology, I will discuss the early and recent work on expression of the globin genes in nucleated avian erythroblasts. An important concept is that the primary transcript embodies not only protein-coding information and the regulation of its expression, but also the 3D-structure of the genomic DNA from which it was derived. The wealth of recent post-genomic data published in this field emphasises the importance of a fundamental principle of genome organisation and expression that has been overlooked for years, even though it was already discussed in the 1970s and 1980s. These issues are addressed in chapter 5, which focuses on the involvement of the nuclear matrix and nuclear architecture in DNA and RNA biology. This section will make reference to the Unified Matrix Hypothesis (UMH), which was the first molecular model of the 3D organisation of DNA and RNA. The chapter on the "RNA-genome and peripheral memories" discusses experimental data on the ribonucleoprotein complexes containing pre-mRNA (pre-mRNPs) and mRNA (mRNPs), which are organised in nuclear and cytoplasmic spaces respectively. Finally, "Outlook" will enumerate currently unresolved questions in the field and will propose some ideas that may encourage further investigation and comprehension of available experimental data still in need of interpretation. In chapter 8, some propositions and paradigms basic to the author's own analysis are discussed. "In conclusion", the raison d'être of this review is recalled and positioned within the overall framework of scientific endeavour.