Crystal structure of a novel non-Pfam protein AF1514 from Archaeoglobus fulgidus DSM 4304 solved by S-SAD using a Cr X-ray source
Yang Li, Pazilat Bahti, Neil Shaw, Gaojie Song, Shunmei Chen, Xuejun Zhang, Min Zhang, Chongyun Cheng, Jie Yin, Jin-Yi Zhu, Hua Zhang, Dongsheng Che, Hao Xu, Abdulla Abbas, Bi-Cheng Wang, Zhi-Jie Liu
Abstract: Many computational tools have been developed recently to accurately predict the structure of a protein from its amino acid sequence.1-3 In general, if a query protein sequence shares at least 30% sequence identity with a protein sequence whose 3D structure has been determined, the structure of the query sequence can be modeled on the template structure, for example with the MODELLER software.3, 4 Computational software, however, cannot guarantee accurate prediction for new proteins that share low sequence similarity with the structures in the PDB. Therefore, experimental methods such as X-ray crystallography and nuclear magnetic resonance (NMR) are still the main approaches for protein structural studies.5, 6 The current target selection strategy of most structural genomics centers7-9 focuses mainly on representatives of manually curated protein families (Pfam),10-12 that is, the selected protein sequence shares at least one conserved domain with other members of a family. In this way, the solved representative structures can serve as templates for predicting the structures of the remaining protein sequences in the same family using computational tools. It has been shown that this “Pfam” target selection strategy increases not only the number of novel structures, but also the number of new folds.13 However, overemphasizing Pfam and ignoring non-Pfam sequences (i.e., those not sharing any conserved domain in Pfam) in target selection might lead to a biased distribution of Pfam and non-Pfam structures in the PDB, and possibly slow the growth rate of new structures and folds. Our analysis of 150 microbial genomes showed that non-Pfam sequences account for 25–30% of all open reading frames (ORFs) in most genomes, and for as much as 60% in some (unpublished data). This high proportion of non-Pfam sequences among all ORFs is a reminder that they should not be neglected when devising a target selection strategy.
On the other hand, the non-Pfam sequences of each genome are either paralogous non-Pfam (sequences with homologous partners within the same organism), orthologous non-Pfam (sequences with orthologous partners in closely related organisms), or singleton non-Pfam (sequences with only one copy in the organism). These three kinds of non-Pfam sequences are either organism-specific or genus-specific, implying the possible existence of undiscovered unique features of non-Pfams, such as a unique SCOP fold or CATH topology. Many Pfam sequences are of known biological significance, while the functions of most non-Pfam sequences are unknown to date. However, this does not mean that non-Pfam sequences are biologically less meaningful. Some non-Pfam sequences and structures are predicted to play important roles in the uniqueness of these organisms. Therefore, a non-Pfam selection strategy will not only accelerate the expansion of SCOP fold space, but also help biologists understand the functions of these proteins on the basis of their structures. The present work describes the structure of AF1514, a non-Pfam protein of unknown function from Archaeoglobus fulgidus, solved at 1.8 Å resolution using the anomalous signal of sulfur generated by chromium X rays (wavelength = 2.29 Å).

E. coli BL21 was freshly transformed with a plasmid containing the AF1514 gene. Cells were grown at 37°C until the culture density reached an OD600 nm of 0.8. The culture was cooled to 12°C and induced with 0.2 mM IPTG for 40 h. Cells were harvested by centrifugation and lysed by sonication. Cell debris was removed by centrifugation and the clarified supernatant was subjected to Ni-affinity chromatography. The protein was further purified by size exclusion chromatography. The purified AF1514 protein was divided into two equal aliquots. One aliquot was methylated as described previously,14-16 while the other was concentrated directly without any chemical modification.
Both methylated and non-methylated protein samples were concentrated to ∼18 mg mL−1 in 20 mM Tris-HCl, pH 8.0, 200 mM NaCl, and 1 mM DTT before crystallization drops were set up. Crystallization screening was carried out by the hanging-drop vapor-diffusion method using a TTP LabTech mosquito robot. Commercially available sparse matrix screens (Crystal Screen 1 and 2, Index, and PEG/Ion Screen from Hampton Research; Wizard I and II from Emerald BioSystems) were used to screen crystallization space. Crystallization optimization was carried out in 2 μL hanging drops containing 1 μL protein mixed with 1 μL mother liquor. The drops were equilibrated over 300 μL reservoir solution and incubated at 16°C. Tetragonal crystals of both the non-methylated and the methylated protein appeared within 5 days in a crystallization solution containing 0.1 M sodium acetate pH 5.0, 0.1 M sodium chloride, and 10% (w/v) MPD.

Crystals were flash frozen in liquid nitrogen prior to mounting, and data were collected at cryogenic temperature (100 K). Non-methylated protein crystals mounted directly from the mother liquor diffracted poorly, to only 3.5 Å. The resolution improved after the cryoprotectant concentration was optimized, and the best crystal diffracted X rays to 2.4 Å. The sulfur anomalous diffraction data for the non-methylated protein crystal were collected using a chromium rotating anode X-ray source and an R-AXIS IV++ detector (Rigaku) with a crystal-to-detector distance of 102 mm and an exposure time of 240 s per image. To improve the signal-to-noise ratio of the anomalous signal of sulfur, the crystal was scanned through 2 × 360°, and a 2.4 Å resolution dataset for the non-methylated AF1514 crystal was collected. The crystal of methylated AF1514 protein was obtained under the same crystallization condition as the non-methylated protein.
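The relation between wavelength, resolution, and detector geometry for the chromium dataset can be illustrated with Bragg's law, λ = 2d sin θ. The sketch below is only a geometric illustration, assuming a flat detector set perpendicular to the beam with no 2θ offset (an assumption; the detector swing angle is not stated in the text). The function names are hypothetical, and the only inputs taken from the text are the wavelength (2.29 Å), the resolution (2.4 Å), and the crystal-to-detector distance (102 mm).

```python
import math


def scattering_angle_deg(wavelength_A, d_spacing_A):
    """Scattering angle 2-theta (degrees) from Bragg's law: lambda = 2 d sin(theta)."""
    sin_theta = wavelength_A / (2.0 * d_spacing_A)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d-spacing cannot be reached at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


def spot_radius_mm(wavelength_A, d_spacing_A, distance_mm):
    """Distance from the beam centre at which a reflection of the given d-spacing
    lands on a flat detector normal to the beam (assumed geometry)."""
    two_theta = math.radians(scattering_angle_deg(wavelength_A, d_spacing_A))
    if two_theta >= math.pi / 2.0:
        raise ValueError("reflection does not reach a flat detector at 2-theta >= 90 deg")
    return distance_mm * math.tan(two_theta)


# Values quoted in the text for the chromium dataset.
CR_WAVELENGTH_A = 2.29
RESOLUTION_A = 2.4
DISTANCE_MM = 102.0

two_theta = scattering_angle_deg(CR_WAVELENGTH_A, RESOLUTION_A)
radius = spot_radius_mm(CR_WAVELENGTH_A, RESOLUTION_A, DISTANCE_MM)
print(f"2-theta = {two_theta:.1f} deg, spot radius = {radius:.0f} mm")
```

The long chromium wavelength pushes reflections to high scattering angles, which is consistent with the short crystal-to-detector distance used for this dataset.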
The dataset for the methylated protein crystal, consisting of a single-axis ϕ scan of 308 images with 0.5° oscillation each, was recorded on a copper rotating anode X-ray source and an R-AXIS IV++ detector (Rigaku) using a crystal-to-detector distance of 160 mm and an exposure time of 240 s per image. A 1.8 Å resolution dataset for the methylated AF1514 crystal was collected. Both the methylated and non-methylated datasets were indexed, integrated, and scaled using HKL2000.17 The crystals belong to space group P4₁2₁2 (identified during structure determination) with unit-cell parameters a = 49.24 Å, b = 49.24 Å, c = 106.46 Å. The data collection statistics are listed in Table I.

The structure was solved by the sulfur SAD method.18 The sulfur atoms of two cysteine residues and a methionine were located from their anomalous signal by SHELXD19 using a 2.44 Å resolution dataset collected in-house with a chromium X-ray source. Initial phasing was done using the program SHARP.20 Most of the polypeptide backbone could be traced automatically by ARP/wARP.21 Minor revisions to the model were made manually using COOT.22 Refinement was carried out using REFMAC23 against the higher resolution dataset of the methylated protein crystal collected with the copper X-ray source. The refinement converged to give the statistics presented in Table I. The final model was validated using MolProbity24 and PROCHECK25 prior to submission to the Protein Data Bank.26

A WU-BLAST search of the PDB for structural homologues failed to retrieve any structure similar to AF1514 (E value > 0.50). The protein has two methionine residues (Met1 and Met4) and three cysteine residues (Cys53, Cys54, and Cys64) that could be used for sulfur phasing. Met1 and Cys54 were disordered and could not be used for phasing. The anomalous signal of the sulfur atoms of two cysteine residues (Cys53 and Cys64) and a methionine (Met4) was used to determine the phases. A longer wavelength chromium X-ray source was used to collect a 2.44 Å dataset.
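The benefit of the longer chromium wavelength for sulfur phasing can be illustrated with the commonly used estimate of the expected Bijvoet ratio, ⟨|ΔF|⟩/⟨|F|⟩ ≈ √2 · √N_A · f″ / (√N_P · Z_eff). The sketch below is a rough estimate under stated assumptions, not the authors' calculation. The f″ values for sulfur (about 1.14 e at Cr Kα and about 0.56 e at Cu Kα) are approximate tabulated values, Z_eff = 6.7 is the usual effective scattering of an average non-hydrogen protein atom, and the protein atom count (about 660, for the 85 ordered residues) is a guess; none of these numbers appears in the text. Only the count of three ordered sulfur atoms comes from the text.

```python
import math

Z_EFF = 6.7  # assumed effective scattering (electrons) of an average protein atom


def bijvoet_ratio(n_anomalous, f_double_prime, n_protein_atoms, z_eff=Z_EFF):
    """Estimate <|dF|>/<|F|> for n_anomalous scatterers among n_protein_atoms atoms."""
    if n_anomalous <= 0 or n_protein_atoms <= 0:
        raise ValueError("atom counts must be positive")
    return (math.sqrt(2.0) * math.sqrt(n_anomalous) * f_double_prime
            / (math.sqrt(n_protein_atoms) * z_eff))


N_SULFUR = 3             # Cys53, Cys64 and Met4 (from the text)
N_PROTEIN_ATOMS = 660    # assumption: roughly 7.8 non-hydrogen atoms per residue
F2_SULFUR_CR = 1.14      # assumption: approximate f'' of S at Cr K-alpha (electrons)
F2_SULFUR_CU = 0.56      # assumption: approximate f'' of S at Cu K-alpha (electrons)

ratio_cr = bijvoet_ratio(N_SULFUR, F2_SULFUR_CR, N_PROTEIN_ATOMS)
ratio_cu = bijvoet_ratio(N_SULFUR, F2_SULFUR_CU, N_PROTEIN_ATOMS)
print(f"Expected Bijvoet ratio: Cr {100 * ratio_cr:.2f}%, Cu {100 * ratio_cu:.2f}%")
```

Under these assumptions the expected anomalous signal is small in both cases but about twice as large with chromium radiation, which is the usual motivation for collecting highly redundant data at the longer wavelength.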
An initial experimental electron density map calculated at 2.5 Å was of very good quality, and more than 90% of the residues could be fitted automatically using ARP/wARP. The final model, refined to 1.8 Å resolution, had an R value of 20.5% (R free 22.6%). The overall geometry of the model was excellent, with no residues lying in the disallowed region of the Ramachandran plot. The asymmetric unit contains one monomer of the protein, based on the calculated solvent content of 62.0%. However, the size exclusion chromatography profile of AF1514 suggested that the protein could exist as a dimer. Further analysis of the region around the asymmetric unit revealed that two molecules of the protein associate to form a homodimer. Interestingly, the two molecules in the homodimer are related by a crystallographic twofold symmetry axis. There are eight hydrogen bonds between the two beta sheets of the two adjacent molecules, with the secondary structural elements running antiparallel to each other. There are 167 water molecules in the final model.

Electron density for amino acid residues 3–87 was clearly visible. The overall structure consists of 2 α helices, 5 β sheets, and 8 loops [Fig. 1(A)]. Two monomers of AF1514 sit side by side within the dimer, with the secondary structural elements of one monomer running antiparallel to those of the other [Fig. 1(B)]. The surface electrostatic potential map of the protein showed an uneven distribution of charge on the protein surface [Fig. 1(C)]. The CATH server27 classified the structure as mixed alpha beta with a topology similar to thiol ester dehydrase. Superimposition of the structure of AF1514 on a thiol ester hydrolase from Arthrobacter (PDB code 1Q4S) showed that 57 of the 85 main chain carbons overlapped with an RMSD of 2.7 Å.
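The assignment of one monomer per asymmetric unit from the solvent content can be checked with a Matthews coefficient estimate. The sketch below is a minimal illustration, not the authors' calculation. It uses the unit-cell parameters reported above and the eight asymmetric units per cell of the reported tetragonal space group. The molecular weight of about 10 kDa is an assumption made here for illustration (the text does not state it), and the function names are hypothetical.

```python
def tetragonal_cell_volume(a_A, c_A):
    """Unit-cell volume (cubic angstroms) of a tetragonal cell with a = b."""
    return a_A * a_A * c_A


def matthews_coefficient(cell_volume_A3, asu_per_cell, mol_weight_da, mol_per_asu=1):
    """Matthews coefficient Vm in cubic angstroms per dalton."""
    return cell_volume_A3 / (asu_per_cell * mol_per_asu * mol_weight_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm using the standard 1.23 / Vm relation."""
    return 1.0 - 1.23 / vm


A_AXIS, C_AXIS = 49.24, 106.46    # unit-cell parameters from the text (angstroms)
ASU_PER_CELL = 8                  # general positions in the reported space group
MOL_WEIGHT_DA = 10_000.0          # assumption: approximate mass of the construct

volume = tetragonal_cell_volume(A_AXIS, C_AXIS)
for n_mol in (1, 2):
    vm = matthews_coefficient(volume, ASU_PER_CELL, MOL_WEIGHT_DA, n_mol)
    print(f"{n_mol} molecule(s) per ASU: Vm = {vm:.2f} A^3/Da, "
          f"solvent = {100 * solvent_fraction(vm):.0f}%")
```

With the assumed molecular weight, one molecule per asymmetric unit gives a solvent content close to the reported 62.0%, whereas two molecules would imply an unusually tightly packed crystal.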
The secondary structural elements of AF1514 are arranged in a “hot dog” fold similar to that of the thiol ester hydrolase.28 While the Arthrobacter thiol esterase is a tetramer, the AF1514 protein exists as a dimer. In the Arthrobacter enzyme, the substrate sits in a wedge between the two subunits of the dimer pair.28 An equivalent substrate binding site could not be identified in AF1514 by superimposing the structure of the Arthrobacter thiol esterase on AF1514. AF1514 is therefore unlikely to function as a thiol esterase. DALI analysis of the AF1514 structure failed to identify any structural neighbor of known function with significant similarity (Z score > 2.5).

Figure 1. Overall structure of AF1514. A: A cartoon representation of the AF1514 structure. AF1514 is made up of two helices, five beta sheets, and numerous loops. B: A homodimer of AF1514 showing two monomers sitting next to each other with the secondary structural elements running antiparallel. C: A surface electrostatic potential representation of the AF1514 structure. Positive potential is colored blue and negative potential is colored red. D: Primary sequence of AF1514 annotated with secondary structural elements. Residues highlighted in green contributed the anomalous signal of sulfur.

In the future, as more structures of known function are deposited in the PDB, structure-based clues about the function of AF1514 may be obtained. Since no homologues of known function could be identified on the basis of primary sequence or structural similarity, it is quite likely that AF1514 carries out a unique function. Further functional studies are required to determine the exact role of AF1514 in Archaeoglobus fulgidus.

Accession numbers: Atomic coordinates and structure factors for the AF1514 structure have been deposited in the Protein Data Bank (accession code 3C0F).