Large deviation principles and evolutionary multiple structure alignment of non-coding RNA

Brandon Legried
2024-05-23
Abstract: Non-coding RNA are functional molecules that are not translated into proteins. They act as important regulators of biological function. Because they are not translated, they need not be as stable as other types of RNA. The TKF91 Structure Tree from Holmes 2004 is a probability model that effectively describes correlated substitution, insertion, and deletion of base pairs, and has been found to have some worth in understanding dynamic folding patterns. In this paper, we provide a new probabilistic analysis of the TKF91 Structure Tree. Large deviation principles on stem lengths, helix lengths, and tree size are proved. Additionally, we give a new alignment procedure that constructs accurate sequence and structural alignments for sequences with low identity for a dense enough phylogeny.
Other Quantitative Biology
What problem does this paper attempt to address?
This paper studies the evolution and structural alignment of non-coding RNA (ncRNA). Although ncRNAs are not translated into proteins, they play important roles as regulators of biological functions. The paper presents a new probabilistic analysis of the Thorne-Kishino-Felsenstein 1991 (TKF91) Structure Tree model, which describes correlated substitutions, insertions, and deletions of base pairs. It proves large deviation principles for stem length, helix length, and tree size, and proposes a new sequence and structure alignment method that is particularly suited to sequences with low sequence identity.
Within the TKF91 Structure Tree model, the paper considers insertion and deletion mutations and how they affect the structural evolution of RNA. It also addresses sequence alignment and structure prediction when sequences differ in length and have low similarity, and shows that both tasks remain feasible under these conditions.
The main findings include statistical properties of the model, such as estimation of branch patterns, together with large deviation principles for the average loop sequence length and the average stem sequence length. These results provide statistical guarantees for predicting RNA secondary structures and support an algorithm that aligns RNA sequences and predicts their secondary structures from evolutionary signals. The paper also discusses the expected values and variances of the average loop and stem sequence lengths under known branch patterns, as well as a large deviation principle for the unconditional average stem sequence length. In addition, it proposes a prediction procedure that can predict secondary structures with high probability given a certain branch pattern.
In summary, this paper aims to improve alignment and prediction methods for RNA sequences and structures by deepening the understanding of how ncRNA structure evolves, which in turn supports a more accurate understanding of biological function.
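To give a feel for what a large deviation statement about average sequence lengths says, here is a minimal Python sketch. It is not the paper's analysis: it assumes only the classical TKF91 links model, where the stationary sequence length is geometric with parameter γ = λ/μ (insertion rate over deletion rate), and it treats the lengths as independent draws. Under those assumptions, Cramér's theorem gives an explicit rate function, and the Chernoff bound P(average ≥ x) ≤ exp(-k·I(x)) holds for every k when x is above the mean. The function names and the parameter values (γ = 0.5, k = 50, x = 1.5) are illustrative choices, not taken from the paper.

```python
import math
import random


def stationary_length(gamma: float, rng: random.Random) -> int:
    """Draw a length N with P(N = n) = (1 - gamma) * gamma**n, n = 0, 1, 2, ...

    Assumption: gamma = lambda / mu < 1, as in the classical TKF91 links model.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(gamma))  # inverse-transform sampling


def rate_function(x: float, gamma: float) -> float:
    """Cramér rate function I(x) of the geometric law above, for x > 0.

    I(x) is zero at the mean gamma / (1 - gamma) and positive elsewhere.
    """
    return (x * math.log(x / ((1.0 + x) * gamma))
            - math.log(1.0 - gamma) - math.log(1.0 + x))


def empirical_tail(gamma: float, k: int, x: float,
                   trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(average of k i.i.d. lengths >= x)."""
    hits = 0
    for _ in range(trials):
        avg = sum(stationary_length(gamma, rng) for _ in range(k)) / k
        if avg >= x:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    gamma, k, x = 0.5, 50, 1.5  # illustrative values only
    rng = random.Random(0)
    estimate = empirical_tail(gamma, k, x, trials=2000, rng=rng)
    bound = math.exp(-k * rate_function(x, gamma))
    print(f"mean length          : {gamma / (1 - gamma):.3f}")
    print(f"empirical tail prob. : {estimate:.4f}")
    print(f"Chernoff upper bound : {bound:.4f}")
```

Increasing `k` in this sketch makes both the estimated tail probability and the bound shrink exponentially, which is the qualitative behaviour a large deviation principle describes. The stem, helix, and tree-size results summarized above concern the richer Structure Tree model, where the relevant quantities are not simply independent geometric lengths.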