Unification of the ferritin family of proteins (iron metabolism/heme protein/iron storage/aerobic organisms/AIDS)
M. Grossman, S. Hinton, V. Minak-Bernero, C. Slaughter†‡, E. Stiefel
Abstract: Ferritin is the iron-storage protein of eukaryotic organisms. The nucleotide sequence encoding Azotobacter vinelandii bacterioferritin, a hemoprotein, was determined. The deduced amino acid sequence reveals a high degree of identity with Escherichia coli bacterioferritin and a striking similarity to eukaryotic ferritins. Moreover, derivation of a global alignment shows that virtually all key residues specifying the unique structural motifs of eukaryotic ferritin are conserved or conservatively substituted in the A. vinelandii sequence. The alignment suggests specific methionine residues as heme-binding ligands in bacterioferritins. The overall sequence similarity, with conservation of key structural residues, implies that all ferritins form a unified family of proteins. The results implicate ferritins as proteins potentially common to all aerobic organisms and, as such, useful in taxonomic classification, evolutionary analysis, and environmental monitoring.

Ferritins are proteins involved in the safe storage and timely delivery of iron for biosynthesis in eukaryotic organisms (1-6). Eukaryotic ferritins contain 24 subunits, each of Mr ≈18,000, which define a rhombic dodecahedral protein shell that encloses up to 4000 iron atoms in an oxide/hydroxide/phosphate core. Bacterioferritin (Bfr) (7) was first identified in Azotobacter vinelandii as cytochrome b557.5 (7, 8). It resembles eukaryotic ferritins in molecular weight, subunit weight, amino acid composition, isoelectric point, size, and shape, and it contains a core resembling that in ferritins (7). Similar Bfrs have been isolated from Escherichia coli (9), Pseudomonas aeruginosa (10), and Nitrobacter winogradskyi (11). However, unlike eukaryotic ferritins, all Bfrs, as isolated, contain heme, up to a maximum of one heme per two subunits (8, 12). Recently, the amino acid sequence of E. coli Bfr was used to predict a highly helical secondary structure similar to that of eukaryotic ferritins (13-15).
Moreover, a preliminary crystallographic study (16) and electron microscopic imaging (8) suggest that Bfrs have a ferritin-like structure. However, despite this similarity, the primary structure of the E. coli protein was interpreted as indicating a lack of homology between the eukaryotic and prokaryotic ferritins (13-15), and the structural and functional similarity of the two classes of proteins was attributed to convergent evolution. Here we report the nucleotide sequence¶ encoding A. vinelandii Bfr and its comparison with eukaryotic ferritins and the complete Protein Identification Resource (PIR) data base.

EXPERIMENTAL PROCEDURES

Protein Purification and Sequencing. A. vinelandii Bfr was purified as described by Lough et al. (17), except that prior to crystallization the protein solution was buffer exchanged by passage through Sephadex G-25 (Pharmacia) equilibrated with crystallization buffer and concentrated to 40 mg of protein per ml. Purified protein was subjected to automated N-terminal amino acid sequence analysis (18).

DNA Manipulation. DNA purification, manipulation, cloning, and subcloning were performed by using standard techniques (19). The A. vinelandii OP bfr gene was cloned as a 10-kilobase Sau3A chromosomal DNA fragment into the unique BamHI site of λEMBL3. The clone containing A. vinelandii bfr was identified by plaque hybridization with a 32P end-labeled 42-nucleotide probe, based on the first 14 amino acids of the N-terminal protein sequence and A. vinelandii codon preference. For DNA sequencing (20), the A. vinelandii bfr gene was subcloned as a 2.2-kilobase BamHI restriction fragment into the BamHI site of pBluescript II KS+ (Stratagene) to produce pABF11. The A. vinelandii bfr sequence was identified with a short version of the 42-nucleotide probe. A complete overlapping sequence was determined for both DNA strands by using primers based on the newly sequenced DNA.

RESULTS AND DISCUSSION

Sequence of the Gene for A. vinelandii Bfr. The A.
vinelandii bfr DNA sequence and the deduced protein sequence are shown in Fig. 1. The encoded protein sequence corresponds exactly with the N-terminal amino acid sequence of the protein. The 5' noncoding region and the first five codons of the coding region could form a stem-loop structure with similarities to the highly conserved iron-responsive element involved in the translational regulation of eukaryotic ferritin synthesis (4, 21). The loop of the stem-loop structure contains the putative ribosomal binding site (Shine-Dalgarno sequence) of Bfr mRNA, suggesting that protein binding at this location could interfere with translation. Two sequences in the coding region, nucleotides 505-523 and 562-580 (Fig. 1), have 54% and 59% identity, respectively, to the binding site consensus sequence for the prokaryotic iron-responsive regulatory protein Fur (4, 22).

Sequence Analysis. The deduced protein sequence for A. vinelandii Bfr shows 68% identity with E. coli Bfr (13, 14), clearly indicating homology. Although sequence similarity between Bfr and eukaryotic ferritins was suggested based on amino acid composition (8, 23), subsequent analysis of the E. coli Bfr amino acid sequence was interpreted as indicating lack of homology between the eukaryotic and prokaryotic ferritins (13-15). Comparison of the A. vinelandii Bfr sequence with the entire PIR protein data base (>26,000 sequences; release no. 27) was performed using the FASTDB program provided by IntelliGenetics (24).

Abbreviations: Bfr, bacterioferritin; PIR, Protein Identification Resource; PAM, point acceptable mutations.
†Present address: Howard Hughes Medical Institute, University of Texas, 5323 Harry Hines Boulevard, Dallas, TX 75235-9050.
§To whom reprint requests should be addressed.
¶The sequence reported in this paper has been deposited in the GenBank data base (accession no. M83692).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

CAGCCGCTCCAAACAGTCGCACTAGAGACTTATTCTCTATTAGACTCAATCACTTAGCTT 60
-35 -10 --------------------> RBS <
TGCTATCCAGTCCMMGACCAAACTGCCTCATCGACATCCCTATTCAAATG 120
GACATGAAAGGCGATAAGATAGTCATCCAACACCTCAACAAGATCCTCGGTAACGAGTTG 180
   MetLysGlyAspLysIleValIleGlnHisLeuAsnLysIleLeuGlyAsnGluLeu
ATCGCGATCAACCAGTACTTCCTACATGCACGCATGTATGAAGACTGGGGGCTGGAGAAA 240
IleAlaIleAsnGlnTyrPheLeuHisAlaArgMetTyrGluAspTrpGlyLeuGluLys
CTCGGCAAGCATGAGTATCACGAATCCATCGATGAGATGAAGCATGCCGACAAATTGATC 300
LeuGlyLysHisGluTyrHisGluSerIleAspGluMetLysHisAlaAspLysLeuIle
AAGCGTATTCTGTTTCTCGAGGGCCTGCCCAACCTCCAGGAGCTCGGCAAGCTTCTCATC 360
LysArgIleLeuPheLeuGluGlyLeuProAsnLeuGlnGluLeuGlyLysLeuLeuIle
GGTGAACACACTAAGGAAATGCTCGAGTGTGATCTGAAACTTGAGCAAGCAGGGTTGCCC 420
GlyGluHisThrLysGluMetLeuGluCysAspLeuLysLeuGluGlnAlaGlyLeuPro
GACCTGAAGGCCGCCATTGCCTACTGCGAAAGCGTTGGGGACTATGCCAGCCGCGAATTG 480
AspLeuLysAlaAlaIleAlaTyrCysGluSerValGlyAspTyrAlaSerArgGluLeu
Fur
CTAGAAGACATCCTTGAATCCGAAGAAGACCATCGTGTGGAAACCCAGCTGGAC 540
LeuGluAspIleLeuGluSerGluGluAspHisIleAspTrpLeuGluThrGlnLeuAsp
Fur
TTGATCGATAAGATCGGCCTGGAAAATTATCTGCAATCGCAAATGGATGAGTAAGCGGCA 600
LeuIleAspLysIleGlyLeuGluAsnTyrLeuGlnSerGlnMetAspGluTer
RBS
GGAGCCGCAATGGCACCACCGAGAACGAGCACAAGC
         MetAlaProProArgThrSerThrSer

FIG. 1. Nucleotide and derived amino acid sequence of A. vinelandii Bfr (nucleotides 124-591). A putative Shine-Dalgarno sequence designating a ribosomal binding site (RBS) is underlined, a possible loop structure overlapping the ribosomal binding site and start codon is designated by dashed arrows, possible Fur recognition sites are underlined, and a second undefined open reading frame starts at nucleotide 609.
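The correspondence between the nucleotide and amino acid rows of Fig. 1 can be checked by translating a row with the standard genetic code. The following is a minimal sketch only, not the software used in this work: the names `CODON_TABLE`, `translate`, `ROW_121_180`, `ROW_START`, and `CODING_START` are hypothetical, and the input is the row numbered 121-180 in Fig. 1, with the coding region taken to begin at nucleotide 124 as stated in the legend.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start: int = 0) -> str:
    """Translate dna from 0-based offset start, stopping at the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Row of Fig. 1 covering nucleotides 121-180 (copied from the figure).
ROW_121_180 = "GACATGAAAGGCGATAAGATAGTCATCCAACACCTCAACAAGATCCTCGGTAACGAGTTG"
ROW_START = 121      # position of the first nucleotide in this row
CODING_START = 124   # first nucleotide of the coding region (Fig. 1 legend)

peptide = translate(ROW_121_180, CODING_START - ROW_START)
print(peptide)  # one-letter codes for the residues encoded by this row
```

In one-letter code the result is MKGDKIVIQHLNKILGNEL, which matches the residues printed beneath this row in Fig. 1 (Met-Lys-Gly-Asp-Lys-Ile-Val-Ile-Gln-His-Leu-Asn-Lys-Ile-Leu-Gly-Asn-Glu-Leu).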
In addition to a unitary (identity) matrix, we employed the widely used point acceptable mutations (PAM) matrix developed by Dayhoff and coworkers (25) and the structure-genetic matrix of McLachlan (26). Using a unitary matrix, we found sequence identity between A. vinelandii Bfr and eukaryotic ferritins to range from 29% with rat L to 24% with rat H ferritin. (H and L refer to the two types of ferritin subunits found in eukaryotes.) This alone suggests that these proteins may be related; however, identity levels of 20% or more can be observed between unrelated sequences as a result of fortuitous alignment based strictly on similarity in amino acid composition.

When the A. vinelandii Bfr sequence was compared against the complete PIR data base, regardless of the matrix employed, more than half of the full-length eukaryotic ferritin sequences ranked within the top 100 scores (Table 1). We then performed a parallel set of sequence comparisons against the PIR data base with a randomized copy of A. vinelandii Bfr (Table 1). With all matrices but the PAM 150 and 250 matrices, which score nonidentical matches highly, no ferritin sequences were retrieved in the top 100 scores by using the randomized sequence. The randomization test strongly implies that the high scores obtained for these proteins are not fortuitous but are due to informational content common to all ferritin primary sequences.

Other proteins scored highly when A. vinelandii Bfr was compared against the protein data base. As an example we discuss the myosins and tropomyosins, which frequently ranked within the top scoring proteins, often scoring higher than ferritin sequences. To assess the relatedness of myosins to A. vinelandii Bfr, we redetermined the number of myosin family proteins that ranked within the top 100 scores when using a randomized copy of A. vinelandii Bfr (Table 1).
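The logic of the randomized-copy control can be illustrated with a short sketch. This is not FASTDB and does not reproduce the data base searches reported here; it only shows, under stated assumptions, that a shuffled copy keeps the amino acid composition of the query while scrambling its order, and how identity under a unitary matrix is counted. The function names `percent_identity` and `randomized_copy` and the fixed seed are hypothetical, and the query is the first 19 residues of the deduced sequence in Fig. 1.

```python
import random
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues between two pre-aligned sequences of equal
    length. Columns with a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def randomized_copy(seq: str, seed: int = 0) -> str:
    """Return a shuffled copy of seq: same composition, scrambled order."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)  # seeded for reproducibility
    return "".join(residues)


# First 19 residues of the deduced A. vinelandii Bfr sequence (Fig. 1).
query = "MKGDKIVIQHLNKILGNEL"
control = randomized_copy(query)

# The control has exactly the same amino acid composition as the query,
# so any score it earns reflects composition alone, not residue order.
print(Counter(control) == Counter(query))
print(percent_identity(query, control))
```

A score that survives this shuffling reflects composition rather than sequence order, which is the distinction the comparison of randomized and nonrandomized queries is designed to draw.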
Regardless of the matrix employed, the number of myosin sequences that ranked within the top 100 scores using the randomized sequence was largely equal to, and in some cases actually higher than, the number obtained using the nonrandomized sequence. The results implicate strict compositional relatedness as being responsible for the high rank and point to evolutionary dissimilarity betwe