Crystal Structure of an ADP‐ribosylated Protein with a Cytidine Deaminase‐like Fold, but Unknown Function (TM1506), from Thermotoga maritima at 2.70 Å Resolution
Qingping Xu,Piotr Koźbiał,Daniel McMullan,Sanjeev Krishna,Scott M. Brittain,Scott B. Ficarro,Michael DiDonato,Mitchell D. Miller,Polat Abdubek,Herbert L. Axelrod,Hsiu‐Ju Chiu,Thomas Clayton,Lian Duan,Marc‐André Elsliger,Julie Feuerhelm,Slawomir K. Grzechnik,Joanna Hale,Gye Won Han,Lukasz Jaroszewski,Heath E. Klock,Andrew T. Morse,Edward Nigoghossian,Jessica Paulsen,Ron Reyes,Christopher L. Rife,Henry van den Bedem,Aprilfawn White,Keith O. Hodgson,John Wooley,Ashley M. Deacon,Adam Godzik,Scott A. Lesley,Ian A. Wilson
DOI: https://doi.org/10.1002/prot.21992
2008-01-01
Abstract: The TM1506 gene of Thermotoga maritima encodes a protein with a molecular weight of 15.7 kDa (residues 1–139) and a calculated isoelectric point (pI) of 8.9. TM1506 was selected for structure determination to extend the structural coverage of the T. maritima proteome (coverage data: http://ffas.burnham.org/ffas-cgi/cgi/tm_cov.pl), which is currently one of the highest for a single organism.1 TM1506 is a member of an uncharacterized protein family (PF08973, DUF1893) but, in contrast to other members of this family, has an additional Lys/Arg-rich N-terminal α-helix and a basic pI. The function of TM1506 is unknown, although its genomic neighborhood suggests possible functional associations with S12 and other ribosomal proteins. Here, we report the crystal structure of TM1506, which was determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG),2 as part of the National Institute of General Medical Sciences' Protein Structure Initiative (PSI).

The gene encoding TM1506 (TIGR: TM1506, Swiss-Prot: Q9X1J4) was amplified by polymerase chain reaction (PCR) from genomic DNA using PfuTurbo DNA polymerase (Stratagene) and primers corresponding to the predicted 5′ and 3′ ends. The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by DNA sequencing. Protein expression was performed in a selenomethionine-containing medium using the Escherichia coli methionine auxotrophic strain DL41. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 μg/mL, and the cells were harvested. After one freeze/thaw cycle, the cells were sonicated in Lysis Buffer [50 mM Tris pH 7.9, 50 mM NaCl, 10 mM imidazole, and 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the lysate was centrifuged at 3400 × g for 60 min.
The supernatant was applied to nickel-chelating resin (GE Healthcare) pre-equilibrated with Lysis Buffer, the resin was washed with Wash Buffer [50 mM potassium phosphate pH 7.8, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, and 0.25 mM TCEP], and the protein was eluted with Elution Buffer [20 mM Tris pH 7.9, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. The eluate was buffer exchanged with Buffer Q [20 mM Tris pH 7.9, 50 mM NaCl, 5% (v/v) glycerol, 0.25 mM TCEP] and applied to a RESOURCE Q column (GE Healthcare) pre-equilibrated with the same buffer. The flow-through fraction, which contained TM1506, was buffer exchanged with Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, and 0.25 mM TCEP] and concentrated to 15 mg/mL for crystallization assays by centrifugal ultrafiltration (Millipore). TM1506 was crystallized using the nanodroplet vapor diffusion method3 with standard JCSG crystallization protocols.2 The crystallization reagent that produced the crystal used for structure solution contained 40% (v/v) polyethylene glycol 600, 0.1 M imidazole pH 8.0, and 0.2 M Zn(OAc)₂ (final pH 5.8). No additional cryoprotectant was added to the crystal. Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM)4 and an X-ray microsource5 installed in a Stanford Synchrotron Radiation Laboratory beamline (SSRL, Menlo Park, CA). The crystal was indexed in hexagonal space group P6₂22 (Table I).6, 7 The molecular weight and oligomeric state of TM1506 were determined using a 1 cm × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0 and 150 mM NaCl. Mutation of Asp56 to Ala and cloning of the D56A mutant were achieved using the polymerase incomplete primer extension (PIPE) method.8 The intended substitution and the absence of unwanted mutations were confirmed by sequencing the insert in both directions.
Native, mutant (D56A), and selenomethionine-labeled proteins, expressed with an N-terminal tag (MGSDKIHHHHHH), were analyzed by LC-MS by coupling a reversed-phase Protein Captrap column (Michrom Bioresources) to a Q-Tof 2 mass spectrometer (Micromass) operating in V mode. The proteins were desalted on the column with 0.2% (v/v) formic acid and 4% (v/v) acetonitrile in water and then eluted with 0.2% (v/v) formic acid and 65% (v/v) acetonitrile in water. Spectra were deconvoluted with the MaxEnt1 algorithm in MassLynx software (Waters). Native and selenomethionine-labeled proteins were reduced with 10 mM dithiothreitol in 100 mM ammonium bicarbonate for 1 h at room temperature and then thioalkylated by the addition of 20 mM iodoacetamide for 30 min in the dark. The proteins were then digested overnight at 37°C with trypsin at a 1:40 mass ratio (trypsin/protein). Native protein was further digested for 8 h at 37°C with endoproteinase Glu-C at a 1:40 mass ratio (protease/protein). The digested proteins were analyzed by LC-MS/MS by interfacing a reversed-phase C18-packed capillary column with a Q-Tof Ultima mass spectrometer (Micromass) operating in data-dependent MS/MS switching mode. Multi-wavelength anomalous diffraction (MAD) data were collected at the Advanced Light Source (ALS, Berkeley, CA) on beamline 8.2.1 at wavelengths corresponding to the inflection point (λ1), low-energy remote (λ2), and peak (λ3) of a selenium MAD experiment. The data sets were collected at 100 K using an ADSC Q210 CCD detector. The MAD data were integrated and reduced using Mosflm9 and then scaled with the program SCALA from the CCP4 suite.6 Data statistics are summarized in Table I. Phasing was performed with SHELXD10 and autoSHARP,11 and automated model building was performed with ARP/wARP12 and RESOLVE.13 Model completion and refinement were performed with Xfit14 and REFMAC5.15 Refinement statistics are summarized in Table I.
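Selenium MAD wavelengths are chosen around the Se K absorption edge, and beamline settings move between photon energy and wavelength through λ = hc/E. The short sketch below shows only that conversion. The edge energy used (12.6578 keV, the tabulated Se K edge) is an assumed reference value, the 100 eV offset is hypothetical and purely illustrative, and the wavelengths actually used for λ1–λ3 are those reported in Table I.

```python
"""Photon energy <-> wavelength conversion for a Se MAD experiment (sketch)."""

# h*c in keV·Å (12.398 is the commonly quoted rounded value).
HC_KEV_ANGSTROM = 12.398419843

# Tabulated Se K-absorption edge energy in keV (assumed reference value;
# the edge measured on a real crystal can shift by a few eV).
SE_K_EDGE_KEV = 12.6578


def energy_to_wavelength(energy_kev: float) -> float:
    """Return the wavelength in Å for a photon energy given in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


def wavelength_to_energy(wavelength_a: float) -> float:
    """Return the photon energy in keV for a wavelength given in Å."""
    if wavelength_a <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_a


if __name__ == "__main__":
    edge_wavelength = energy_to_wavelength(SE_K_EDGE_KEV)
    print(f"Se K edge {SE_K_EDGE_KEV:.4f} keV -> {edge_wavelength:.4f} Å")
    # Hypothetical remote point 100 eV below the edge: lower energy, longer wavelength.
    remote = energy_to_wavelength(SE_K_EDGE_KEV - 0.1)
    print(f"100 eV below the edge -> {remote:.4f} Å")
```

Because a low-energy remote sits below the edge, it always lies at a slightly longer wavelength than the inflection and peak points.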
The stereochemical quality of the model was analyzed using AutoDepInputTool,16 MolProbity,17 SFcheck 4.0,6 and WHATIF 5.0.18 Protein quaternary structure analysis was performed using the PQS (Protein Quaternary Structure) server,19 the PISA (Protein Interfaces, Surfaces and Assemblies) server,20 and PITA (Protein InTerfaces and Assemblies) software.21 Figure 1(B) was adapted from an analysis using PDBsum.22 Figures 1(A) and 2(B,D) were prepared with PyMOL (DeLano Scientific) and ProtSkin (http://www.mcgnmr.ca/ProtSkin/). Sequence conservation in Figure 2(C) was calculated using rate4site.23 Atomic coordinates and experimental structure factors for TM1506 from T. maritima at 2.70 Å resolution have been deposited in the PDB and are accessible under the code 1vk9.

Figure 1. Crystal structure of TM1506 from Thermotoga maritima. (A) Stereo ribbon diagram of the TM1506 monomer color-coded from N-terminus (blue) to C-terminus (red). Helices H1–H6 and β-strands β1–β5 are indicated. (B) Diagram showing the secondary structural elements of TM1506 superimposed on its primary sequence. The α-helices, β-strands, and β-turns are indicated. The β-sheet is indicated by a red A and the β-hairpin is depicted as a red loop. Dashed lines indicate regions that are not included in the protein structure.

Figure 2. ADP-ribosylation reaction, model of TM1506 with conserved residues and ADP-ribose shown on its surface, and sequence alignment. (A) Proposed reaction from NAD⁺ to ADP-ribosylated Asp56. (B) Residues conserved in TM1506 are shown on its surface, where the intensity of red color indicates the degree of conservation. The ADP-ribose (shown as a stick model colored by atom type: carbon, green; nitrogen, blue; oxygen, red; phosphorus, orange) was manually fitted into the extra electron density in the active site; metals and waters are shown as spheres (zinc, blue; magnesium, green; water, red). (C) Multiple sequence alignment of TM1506 and its homologs.
The intensity of red color indicates the degree of sequence conservation. The conservation scores (0, not conserved; 4, highly conserved) were derived with rate4site from all available homologs of TM1506, not just those shown in the alignment. The positions of the most conserved residues in TM1506 are indicated above the alignment. Homologs of TM1506 used in the alignment include: ZP_01189901, conserved hypothetical protein from Halothermothrix orenii H 168; YP_001244876, hypothetical protein Tpet_1286 from Thermotoga petrophila RKU-1; ZP_01354436, putative TonB-dependent outer membrane receptor from Clostridium phytofermentans ISDg; YP_001319085, domain of unknown function DUF1893 from Alkaliphilus metalliredigens QYMF; YP_101571, putative TonB-dependent outer membrane receptor from Bacteroides fragilis YCH46; NP_809874, hypothetical protein BT0961 from Bacteroides thetaiotaomicron VPI-5482. (D) The site of ADP-ribosylation. A plausible model of ADP-ribose that fits the density well is shown as a stick model (carbon, grey; nitrogen, blue; oxygen, red; phosphorus, orange); metals and waters are shown as spheres (zinc, blue; magnesium, green; water, red). The experimental map is shown after solvent flattening. Side-chain atoms of Asp56 and Cys113 are shown as sticks (carbon, yellow; oxygen, red).

The crystal structure of TM1506 was determined to 2.70 Å resolution using the MAD method (see Fig. 1). Data collection, model, and refinement statistics are summarized in Table I. The final model includes one monomer (residues 1–136 plus the entire N-terminal expression and purification tag except for the initial Met residue, whose absence was confirmed by LC-MS), six zinc ions, an unknown ligand (UNL), and nine water molecules in the asymmetric unit. The Met from the tag is assumed to have been removed by an endogenous methionine aminopeptidase.
No electron density was observed for residues 137–139 or for the side chains of Glu2, Lys3, Arg7, Lys37, Lys107, Lys110, Glu126, and Glu127. The Matthews coefficient (Vm)24 for TM1506 is 4.9 Å³/Da, and the estimated solvent content is 74.9%. The Ramachandran plot produced by MolProbity25 shows that 99.3% of the residues are in favored regions, with no outliers. TM1506 is composed of five β-strands (β1–β5) and six α-helices (Lys3–Lys15, Lys37–Arg45, Lys61–Met70, Lys82–Glu90, Pro114–Leu120, and Pro125–Leu133; Fig. 1). The total β-sheet and α-helical contents are 21.1% and 38.8%, respectively. The TM1506 monomer comprises a single domain whose secondary structural elements are arranged in a three-layer (α/β/α) core with a mixed five-stranded β-sheet (order 21345) in which strand 1 is antiparallel to the rest. According to the SCOP classification scheme,26 TM1506 has a cytidine deaminase-like fold and belongs to the cytidine deaminase-like SCOP superfamily (sunid: 53927). A DALI27 structural similarity search with TM1506 found several hits to proteins with a cytidine deaminase-like fold (PDB codes 1uaq, 1g8m, 2b3j, 2b3z, 2g84, 1teo, and 1tiy) with significant Z-scores (8.5, 8.5, 7.3, 7.3, 6.9, 6.1, and 5.2, respectively), but residues important for zinc binding and catalysis in those enzymes are not conserved in TM1506. In addition, the position of the C-terminal helices (H5 and H6) in TM1506 occludes the typical dimerization interface that, in many cases, is functionally important for proteins with a cytidine deaminase-like fold.28, 29 The structurally similar (DALI Z-score = 5.8) T4 bacteriophage 2′-deoxycytidylate deaminase (PDB code 1vq2) was crystallized with an inhibitor located in a deep groove that is not present in TM1506.
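The solvent content and helical content quoted for TM1506 can be reproduced with simple arithmetic. Matthews' relation estimates the solvent fraction as 1 − 1.23/V_M, where 1.23 reflects an assumed protein partial specific volume of about 0.74 cm³/g, so V_M = 4.9 Å³/Da gives 74.9%. For the helical content, the sketch counts residues in the six helix ranges listed above. The denominator is an assumption: dividing by all 147 modeled residues (residues 1–136 plus the 11 tag residues left after Met removal) reproduces the quoted 38.8%. The β-strand boundaries are not listed in the text, so the 21.1% β content is not recomputed, and `matthews_coefficient` is included only to show the definition, since the cell dimensions are given in Table I rather than here.

```python
"""Solvent content and helix content checks for TM1506 (illustrative sketch)."""


def matthews_coefficient(cell_volume_a3: float, n_asu: int, mass_da: float) -> float:
    """V_M (Å^3/Da): unit-cell volume divided by the protein mass in the cell.

    n_asu is the number of asymmetric units in the cell (12 for P6_222);
    mass_da is the protein mass in one asymmetric unit.
    """
    return cell_volume_a3 / (n_asu * mass_da)


def solvent_fraction(vm: float) -> float:
    """Matthews' estimate 1 - 1.23/V_M (partial specific volume ~0.74 cm^3/g)."""
    return 1.0 - 1.23 / vm


def residues_in_segments(segments) -> int:
    """Count residues in inclusive (start, end) residue-number ranges."""
    return sum(end - start + 1 for start, end in segments)


# Helix boundaries from the text: Lys3-Lys15, Lys37-Arg45, Lys61-Met70,
# Lys82-Glu90, Pro114-Leu120, Pro125-Leu133.
HELICES = [(3, 15), (37, 45), (61, 70), (82, 90), (114, 120), (125, 133)]

# Assumed denominator: all modeled residues, i.e. residues 1-136 plus the
# 11 tag residues remaining after removal of the initiator Met.
MODELED_RESIDUES = 136 + 11

helix_count = residues_in_segments(HELICES)
helix_fraction = helix_count / MODELED_RESIDUES

if __name__ == "__main__":
    print(f"solvent content at V_M = 4.9: {solvent_fraction(4.9):.1%}")  # 74.9%
    print(f"helical residues: {helix_count} ({helix_fraction:.1%})")  # 57 (38.8%)
```

If the helical content were instead computed over the 139 residues of the native sequence alone, the same 57 residues would give about 41%, which is why the modeled-residue denominator looks like the one used.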
Comparison of TM1506 with the tRNA adenosine deaminase TadA (PDB code 2b3j, crystallized with tRNA) suggests that TM1506 may be a tRNA-binding protein, consistent with the finding that several nucleotide-binding proteins are targeted for ADP-ribosylation.30 Analytical size-exclusion chromatography coupled with static light scattering indicates that TM1506 is likely a monomer, whereas crystal packing analysis with the PQS server predicts a homotetramer. This prediction is likely an artifact of the expression/purification tag, which makes up a significant part of the tetramerization interface. Analysis of the TM1506 structure with the tag (and its associated zinc ions) removed, using the PISA server and PITA software, suggests a monomer. The mass of the native protein observed by LC-MS was 17,576.6 Da. Assuming that the N-terminal methionine of the tag was removed by an endogenous methionine aminopeptidase, the unmodified protein has a calculated average mass of 17,035.9 Da, giving a mass shift of +540.7 Da. The selenomethionine-labeled protein had an observed mass of 17,811.6 Da; without the tag methionine and with all five methionine residues replaced by selenomethionine, its theoretical average mass is 17,270.9 Da, again a shift of +540.7 Da. These LC-MS data suggest that TM1506 is covalently modified with ADP-ribose, because a covalently attached ADP-ribosyl group adds 541 Da (the mass of free ADP-ribose, 559 Da, minus one water molecule, 18 Da). The ADP-ribosylated peptides 45-RFDNLEGSLVIDK-57 and 46-FDNLEGSLVIDK-57 were identified by LC-MS/MS analysis of tryptic digests of native and selenomethionine-labeled proteins. Several tandem MS spectra showed intense signals at m/z 136 ([adenine+H]⁺), m/z 250 ([adenosine+H−H₂O]⁺), m/z 348 ([AMP+H]⁺), and m/z 428 ([ADP+H]⁺), indicative of fragmentation of the covalently bound ADP-ribose.
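The mass bookkeeping in this paragraph can be checked with a short calculation. The sketch below uses standard average and monoisotopic atomic masses and standard neutral elemental formulas, not the software used in the original study. It (i) sums assumed average residue masses for the 11 tag residues left after Met removal (GSDKIHHHHHH), whose subtraction from 17,035.9 Da recovers the 15.7 kDa quoted for residues 1–139; (ii) computes the ADP-ribosyl mass shift as ADP-ribose minus water and compares it with the observed shifts; and (iii) computes monoisotopic m/z values for the four diagnostic ions, which round to 136, 250, 348, and 428.

```python
"""Mass checks for ADP-ribosylated TM1506 (illustrative sketch)."""

# Average atomic masses (Da), used for intact-protein average masses.
AVG = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "P": 30.973762}
# Monoisotopic atomic masses (Da), used for fragment-ion m/z values.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "P": 30.97376151}
PROTON = 1.00727646  # proton mass (Da)

# Average residue masses (Da) for the amino acids in the tag (assumed table values).
AVG_RESIDUE = {"G": 57.0519, "S": 87.0782, "D": 115.0886, "K": 128.1741,
               "I": 113.1594, "H": 137.1411}

WATER = {"H": 2, "O": 1}
ADP_RIBOSE = {"C": 15, "H": 23, "N": 5, "O": 14, "P": 2}

# Neutral formulas behind the diagnostic ions; the integer is the number
# of waters lost before protonation.
DIAGNOSTIC = {
    "[adenine+H]+": ({"C": 5, "H": 5, "N": 5}, 0),
    "[adenosine+H-H2O]+": ({"C": 10, "H": 13, "N": 5, "O": 4}, 1),
    "[AMP+H]+": ({"C": 10, "H": 14, "N": 5, "O": 7, "P": 1}, 0),
    "[ADP+H]+": ({"C": 10, "H": 15, "N": 5, "O": 10, "P": 2}, 0),
}


def formula_mass(formula: dict, table: dict) -> float:
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(table[element] * count for element, count in formula.items())


def protonated_mz(formula: dict, waters_lost: int = 0) -> float:
    """Monoisotopic m/z of a singly protonated ion after losing `waters_lost` waters."""
    return formula_mass(formula, MONO) - waters_lost * formula_mass(WATER, MONO) + PROTON


# (i) Tag residues remaining after removal of the initiator Met.
tag_mass = sum(AVG_RESIDUE[aa] for aa in "GSDKIHHHHHH")
core_mass = 17035.9 - tag_mass  # residues 1-139 on their own

# (ii) Covalently attached ADP-ribosyl group = ADP-ribose minus one water.
adpr_shift = formula_mass(ADP_RIBOSE, AVG) - formula_mass(WATER, AVG)
native_delta = 17576.6 - 17035.9  # observed minus calculated, native protein
semet_delta = 17811.6 - 17270.9   # observed minus calculated, SeMet protein

if __name__ == "__main__":
    print(f"tag residues: {tag_mass:.1f} Da; residues 1-139: {core_mass / 1000:.1f} kDa")
    print(f"ADP-ribosyl shift: +{adpr_shift:.1f} Da "
          f"(observed: native +{native_delta:.1f}, SeMet +{semet_delta:.1f})")
    # (iii) Diagnostic ions from fragmentation of the ADP-ribose moiety.
    for label, (formula, waters) in DIAGNOSTIC.items():
        print(f"{label}: m/z {protonated_mz(formula, waters):.2f}")
```

The theoretical ADP-ribosyl shift (about 541.3 Da, average mass) agrees with both observed shifts (+540.7 Da) to within about 0.6 Da, which is the basis for assigning the modification.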
The ADP-ribosylated peptide 51-GSLVIDK-57 was detected by analysis of the tryptic/endoproteinase Glu-C dual digest of native protein. The tandem MS spectrum suggested that Asp56 was modified, because the C-terminal backbone fragments y2–y5, all of which contain Asp56, were observed either ADP-ribosylated or after neutral loss of AMP from the ADP-ribose moiety. Furthermore, LC-MS showed that a D56A mutant of TM1506 lacked the additional 541 Da, supporting our hypothesis that Asp56 is the site of modification. An analysis of the TM1506 structure using the CASTp server31 revealed a deep cavity of 366 Å³ formed by Ser19, Leu20, Asp33, Ser34, Gly35, Leu36, Pro38, Val39, Asp56, Lys57, Met58, Gly60, Ala62, Val104, Cys113, Phe115, and Glu116. Electron density in this cavity indicates a ligand covalently bound to the conserved Asp56 (see Fig. 2). As noted above, mass spectrometry showed that the protein is ADP-ribosylated and suggested that the invariant Asp56 is the site of modification. As expected, the electron density closely approximates an ADP-ribose. However, the adenosine moiety, which is exposed to solvent, is disordered and appears to be partly displaced by a Zn²⁺ ion coordinated to Glu99. The Zn²⁺ ion was confirmed by an anomalous difference Fourier map. Because of the limited resolution, this ambiguity cannot be fully resolved, and the ligand was therefore assigned as a UNL in the coordinates deposited in the PDB. Figure 2(A) illustrates the proposed reaction32 leading from NAD⁺ to ADP-ribosylated Asp56 and a possible conformation of ADP-ribose (manually docked into the extra density). Figure 2(D) illustrates one model in which ADP-ribose fits reasonably well into the density. In addition, two metal ions are likely located in the pocket, where they would stabilize the phosphoryl groups. Based on anomalous difference maps, a heavy metal (possibly Zn²⁺) is likely located near Cys113.
A second, lighter metal, such as Mg²⁺, is probably adjacent to Ser34. The information presented here, in combination with further biochemical and biophysical studies, should (1) yield valuable insights into the functional role of this protein, (2) help to determine whether NAD⁺ or its derivatives can bind in the conserved cleft, (3) reveal the function of ADP-ribosylation (ADP-ribosylation has been proposed as a potential new drug target33), (4) confirm tRNA binding, and (5) determine whether the high proportion of basic residues is important for the function of TM1506. The JCSG has developed The Open Protein Structure Annotation Network (TOPSAN), a wiki-based community project to collect, share, and distribute information about protein structures determined at PSI centers. TOPSAN combines automatically generated annotations with comprehensive, expert-curated annotations provided by JCSG personnel and members of the research community. Additional information about TM1506 is available at https://www.topsan.org/1VK9.

Portions of this research were performed at the Advanced Light Source (ALS) and Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the United States Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences).