Specific RNA Recognition by Designer Pentatricopeptide Repeat Protein.
Cuicui Shen,Xiang Wang,Yexing Liu,Quanxiu Li,Zhao Yang,Nieng Yan,Tingting Zou,Ping Yin
DOI: https://doi.org/10.1016/j.molp.2015.01.001
IF: 27.5
2015-01-01
Molecular Plant
Abstract: Manipulation of gene expression through targeting specific DNA or RNA sequences is a significant challenge. In the past decade, transcription activator-like (TAL) effectors and zinc fingers (ZFs) have been successfully developed into useful tools for DNA recognition (Bogdanove and Voytas, 2011; Deng et al., 2012a; Deng et al., 2012b). However, little progress has been made in the realm of RNA targeting, owing to limited understanding of the modular RNA recognition mechanism. Pumilio and FBF homology (PUF) proteins and pentatricopeptide repeat (PPR) proteins are two types of sequence-specific single-stranded RNA (ssRNA) binding proteins with the potential to serve as effective RNA targeting tools (Filipovska and Rackham, 2013; Campbell et al., 2014). PPR proteins, generally containing 2–30 tandem repeats, are present in terrestrial plants as a large family (Schmitz-Linneweber and Small, 2008; Barkan and Small, 2014).
PPR proteins function as sequence-specific ssRNA binding proteins mainly in chloroplasts and mitochondria, where they are involved in many aspects of organelle RNA metabolism, including RNA editing, maturation, stability, and translation. Each PPR repeat is typically composed of 35 amino acids organized into a hairpin of α helices. Previous computational and biochemical analyses suggest a model of modular RNA recognition by PPR proteins in which one PPR motif recognizes one RNA base (Barkan et al., 2012; Yagi et al., 2013). The recently reported crystal structure of PPR10 in the RNA-bound state (Protein Data Bank ID 4M59) corroborates this model (Yin et al., 2013). According to this structure, within an intact PPR repeat, amino acid residues at positions 2, 5, and 35 are responsible for sequence-specific recognition of RNA bases (Figure 1A and Supplemental Figure 1A) (Yin et al., 2013).
These three amino acids were also proposed as “code” amino acids for base discrimination (Barkan et al., 2012; Yagi et al., 2013; Barkan and Small, 2014). Two hydrophobic residues at position 2 from two consecutive repeats have been highlighted as putative RNA-interacting residues sandwiching one RNA base, indicating that an amino acid at this position could influence RNA recognition by the preceding PPR motif. The major determinant is the polar amino acid at position 5. Asparagine at this position strongly correlates with a pyrimidine at the corresponding position of the target RNA, while threonine or serine correlates with a purine. Another determinant is located at position 35 and appears to stabilize the conformation of the fifth residue. All amino acids at these two positions involved in nucleotide specification have side chains that are avid hydrogen bond donors or acceptors (Yin et al., 2013).
Here, based on the PPR structure and related bioinformatics analysis, we developed a set of designer proteins whose artificial PPR motifs confer RNA recognition specificity. To detect protein–RNA interactions, we set up an in vitro assay to examine the ligand binding activity and specificity of the designer proteins (experimental details in Supplemental Materials and Methods). We took a conservative approach to constructing the PPR motifs. First, we analyzed repeat sequences of all P-type PPR proteins, all of which contain 35 amino acids per repeat, from Arabidopsis thaliana (Supplemental Figure 1B; Lurin et al., 2004). Next, we selected the most evolutionarily conserved amino acids of P-type PPR motifs as the scaffold of the designer PPR motifs to build the primary structure of the RNA base recognition units. The exceptions are the amino acids at positions 2, 5, and 35, which serve as the RNA selection “codes” (Figure 1A). Our in vitro assay requires soluble proteins and radioactive target RNA probes. To optimize the solubility and behavior of the designer proteins, we fused two capping domains, one N-terminal domain (NTD), comprising amino acids 37–208 of PPR10, and one C-terminal domain (CTD), comprising amino acids 737–786 of PPR10, to the amino and carboxyl termini of multiple designer PPR motifs (Supplemental Figure 2). We synthesized a series of designer protein genes with different PPR repeat motifs and purified the proteins for further biochemical assessment (for experimental details, see Supplemental Materials and Methods).
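The motif layout described above can be illustrated with a minimal Python sketch. It places a three-residue code at positions 2, 5, and 35 of a 35-residue scaffold and adds up the length of a capped construct. The scaffold string is a placeholder (the consensus sequence is not given in this text), the code `VSN` is used only as an example input, the function names are hypothetical, and the length estimate assumes no linkers or purification tags, which the text does not describe.

```python
MOTIF_LENGTH = 35
CODE_POSITIONS = (2, 5, 35)  # 1-indexed positions named in the text


def apply_code(scaffold: str, code: str) -> str:
    """Return the scaffold with the code residues written at positions 2, 5, 35."""
    if len(scaffold) != MOTIF_LENGTH:
        raise ValueError(f"scaffold must have {MOTIF_LENGTH} residues")
    if len(code) != len(CODE_POSITIONS):
        raise ValueError("code must have exactly three residues")
    residues = list(scaffold)
    for position, residue in zip(CODE_POSITIONS, code):
        residues[position - 1] = residue  # convert to 0-indexed
    return "".join(residues)


def construct_length(n_motifs: int, ntd=(37, 208), ctd=(737, 786)) -> int:
    """Length of NTD + n motifs + CTD, assuming no linkers or tags."""
    ntd_length = ntd[1] - ntd[0] + 1  # inclusive residue range of PPR10
    ctd_length = ctd[1] - ctd[0] + 1
    return ntd_length + n_motifs * MOTIF_LENGTH + ctd_length


# Placeholder scaffold: 'x' marks residues whose identity is not given here.
PLACEHOLDER_SCAFFOLD = "x" * MOTIF_LENGTH
example_motif = apply_code(PLACEHOLDER_SCAFFOLD, "VSN")
```

Under these assumptions, a construct with 10 motifs would span 172 + 350 + 50 = 572 residues.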
Recombinant dPPRs (designer PPR proteins) were purified to homogeneity (Supplemental Figure 3). To build motif modules capable of specific RNA base recognition, we constructed designer proteins containing 10 tandem identical PPR repeat motifs with a number of combinations of “code” amino acids to identify the best code for specific RNA recognition (Figure 1B and Supplemental Figure 2). Using the electrophoretic mobility shift assay (EMSA), we tested more than 10 of the most frequent combinations identified by previous bioinformatics analysis (Barkan et al., 2012). Eventually we were able to construct PPR motifs as basic RNA base recognition units that selectively recognize RNA bases A, U, and C, corresponding to amino acid codes VSN, VND, and VNS, respectively (Figure 1A). All designer proteins contain one NTD, 10 tandem artificially designed PPR motifs, and one CTD (Figure 1B). The designer proteins depicted in the schematic diagram in Figure 1B are designated dPPR-A10, dPPR-U10, and dPPR-C10, respectively. All three show high RNA binding specificity according to the EMSA results. As predicted, each dPPR selectively bound its respective target RNA, and no significant non-specific protein–RNA binding was detected. For instance, dPPR-A10 bound only the RNA probe poly A10, displaying no signs of interaction with poly U10, poly C10 (Figure 1C), or poly G10 (Supplemental Figure 4). The apparent dissociation constant of dPPR-A10 for its ligand is approximately 160 nM. dPPR-U10 exhibited behavior similar to that of dPPR-A10, and dPPR-C10 showed slightly higher binding affinity (Figure 1D and 1E; Supplemental Figure 4A).
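The base-to-code assignments reported above (A to VSN, U to VND, C to VNS) lend themselves to a simple lookup. The following Python sketch lists the code for each motif of a designer protein given a target RNA, and counts how many motifs would face their cognate base on a given probe. The function names are hypothetical, and the sketch assumes that the first motif pairs with the first listed base of the target; guanine is rejected because no code for G is reported here.

```python
# Codes for positions 2, 5, 35 as reported in the text; no code for G is given.
CODE_FOR_BASE = {"A": "VSN", "U": "VND", "C": "VNS"}


def design_codes(target_rna: str) -> list[str]:
    """Return one three-residue code per base of the target RNA."""
    codes = []
    for index, base in enumerate(target_rna.upper(), start=1):
        if base not in CODE_FOR_BASE:
            raise ValueError(f"no designer code for base {base!r} at position {index}")
        codes.append(CODE_FOR_BASE[base])
    return codes


def matching_motifs(design_target: str, probe: str) -> int:
    """Count positions where the motif's cognate base equals the probe base."""
    if len(design_target) != len(probe):
        raise ValueError("design target and probe must have the same length")
    return sum(
        1 for wanted, found in zip(design_target.upper(), probe.upper())
        if wanted == found
    )


# A homopolymer-binding design (10 identical motifs) and a mixed example target.
poly_a_codes = design_codes("A" * 10)
mixed_codes = design_codes("UAC")
```

The match count is only a crude bookkeeping device for comparing a design with a probe sequence; it is not a model of binding affinity.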
These results indicate that, from a protein engineering perspective, 10 consecutive designer PPR motifs are sufficient to achieve specific RNA recognition. We also attempted to build specific PPR motifs recognizing RNA base G. However, we were unable to determine an appropriate combination of the three-position code, likely due to the unsuitability of the other 32 non-code amino acids as a motif scaffold or the structural instability of poly-G tracts. We next tested the usability of designer PPR proteins that combine the designer PPR motifs we had found to bind RNA bases A, U, and C specifically. We designed and purified proteins containing binary and ternary blocks of designer PPR motifs A, U, and C in patterns similar to those of dPPR-A/U/C10 and designated them dPPR-UA and dPPR-UAC, respectively. dPPR-UA contains five consecutive sets of designer PPR motifs recognizing UA, while dPPR-UAC comprises three sets of designer PPR motifs recognizing UAC and one designer PPR motif U. dPPR-UA selectively bound RNA probe (UA)5, not (UAC)3U, and vice versa (Figure 1F and 1G). Neither of these two designer proteins was capable of binding N10 RNA probes (N denotes one of the RNA nucleotide types) (Supplemental Figure 5), suggesting that for a 10-nucleotide RNA and a designer protein that contains 10 consecutive PPR repeat motifs, one corresponding motif out of every two or three is insufficient to achieve specific RNA recognition. We also attempted to determine the minimum number of PPR motif repeats required for specific RNA binding using a native gel-shift assay (Supplemental Figure 6). The results indicate that the minimum number of PPR motifs needed to achieve specific binding differs among RNA nucleotides, even though the apparent dissociation constants of the dPPR-N10s for their corresponding RNAs are at similar levels (Supplemental Figure 4A).
For example, a 6-mer of the dPPR-A motif is enough for nucleotide A recognition, whereas 8-mers of the dPPR-U or dPPR-C motifs show sufficient RNA binding activity. In summary, we designed various types of chimeric recombinant proteins containing specific PPR motifs, which recognize RNA bases A, C, and U with a high degree of modular selectivity, and achieved specific RNA recognition by designer pentatricopeptide repeat proteins. Many obstacles still hinder the use of designer proteins to manipulate target ssRNA. Several parameters of the designer proteins and designer motifs remain to be optimized, such as motif number, the amino acid sequence of the PPR motif, and combinations of different motifs. Our future research will also concentrate on designing PPR motifs that specifically bind the RNA base guanine, determining atomic structures of designer protein–target RNA complexes, and quantifying the specificities of these RNA recognition codes (Campbell et al., 2014). Potentially, with more thorough study of designer PPR motifs, engineered RNA editing factors could be applied to modify the amino acid sequences of organelle-encoded proteins, and domains with diversified functions (e.g., RNA-cleaving enzymes or fluorescent proteins) could be targeted to specific organellar RNAs via designer PPR tracts (Figure 1H). The development of analogous applications outside of organelles may eventually be feasible, suggesting that PPR-based designer proteins show promise as a universal RNA targeting/processing tool in the future (Filipovska and Rackham, 2013; Barkan and Small, 2014; Yagi et al., 2014). This work was funded by the National Natural Science Foundation of China (Program No. 31200567), the Fundamental Research Funds for the Central Universities (Program No. 2014JQ001), and the Huazhong Agricultural University Scientific & Technological Self-innovation Foundation (Program No. 2013RC013).