Information Theory of Composite Sequence Motifs: Mutational and Biophysical Determinants of Complex Molecular Recognition

Elia Mascolo, Ivan Erill
DOI: https://doi.org/10.1101/2024.11.11.623117
2024-11-15
Abstract: The recognition of nucleotide sequence patterns is a fundamental biological process that controls the start sites of replication, transcription and translation, as well as transcriptional and translational regulation. Foundational work on the evolution of biological information showed that the amount of information encoded in the target nucleotide sequence patterns, a quantity named Rsequence, evolves by natural selection to match a predictable quantity called Rfrequency. In this work, we propose a generalization of this canonical framework that can describe composite sequence motifs: motifs composed of a series of sequence patterns separated by variable (not necessarily conserved) distances. We find that some information can be encoded through the conservation of the distance between sequence patterns, a quantity we named Rspacer, and that, to be functional, biological systems require the sum of Rsequence and Rspacer to be constant. We empirically validate our mathematical results through evolutionary simulations. We apply this general framework to demonstrate that the pre-recruitment of regulatory complexes to target sites has intrinsic advantages over in situ recruitment in terms of energy dissipation and search efficiency, and that realistic values of protein flexibility co-evolve with the target composite motifs to match their spacer size variability. Lastly, we show that the relative advantage of encoding information in sequence patterns or in spacers depends on the balance between nucleotide substitutions and insertions/deletions, with known estimates for the rates of these mutation types favoring the evolution of composite motifs with highly conserved spacer length.
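To make the two canonical quantities concrete, here is a minimal sketch assuming the classic definitions from the original information-theory framework: Rfrequency = log2(G / γ), where G is the number of possible positions in the genome and γ the number of functional sites, and Rsequence = Σ_l (2 − H_l), where H_l is the Shannon entropy (in bits) of the base distribution at aligned position l. The function names `r_frequency` and `r_sequence` are hypothetical, the small-sample correction used in practice is omitted, and the paper's definition of Rspacer is not reproduced here.

```python
import math
from collections import Counter


def r_frequency(genome_size: int, n_sites: int) -> float:
    """Bits needed to locate n_sites among genome_size positions: log2(G / gamma)."""
    return math.log2(genome_size / n_sites)


def r_sequence(sites: list[str]) -> float:
    """Information in aligned DNA sites: sum over positions of (2 - H_l).

    Assumption: no small-sample correction is applied, so this
    overestimates Rsequence when only a few sites are available.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must be aligned to the same length")
    total = 0.0
    for pos in range(length):
        # Observed base frequencies in this column of the alignment.
        counts = Counter(s[pos] for s in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # 2 bits is the maximum uncertainty for a 4-letter alphabet.
        total += 2.0 - entropy
    return total


# Toy case: 16 sites in a genome of 2**20 positions need 16 bits to be found,
# and 16 identical 8-bp sites carry 8 * 2 = 16 bits, so Rsequence matches Rfrequency.
print(r_frequency(2**20, 16))
print(r_sequence(["ACGTACGT"] * 16))
```

In the toy case both calls print 16.0, illustrating the Rsequence ≈ Rfrequency balance that the composite-motif framework generalizes by adding a spacer term.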
Evolutionary Biology