Recognizing amino acid sidechains in a medium resolution cryo-electron density map
Dibyendu Mondal, Vipul Kumar, Tadej Satler, Rakesh Ramachandran, Daniel Saltzberg, Ilan Chemmama, Kala Bharath Pilla, Ignacia Echeverria, Benjamin M. Webb, Meghna Gupta, Kliment A. Verba, Andrej Sali
DOI: https://doi.org/10.1101/2024.12.10.627859
2024-12-12
Abstract: Building an accurate atomic structure model of a protein into a cryo-electron microscopy (cryo-EM) map at a resolution worse than 3 angstrom is difficult. To facilitate this task, we devised a method, EMSequenceFinder, for assigning the amino acid residue sequence to the backbone fragments traced in an input cryo-EM map. EMSequenceFinder relies on a Bayesian scoring function for ranking the 20 standard amino acid residue types at a given backbone position, based on the fit to the density map, the map resolution, and secondary structure propensity. The fit to the density is quantified by a convolutional neural network that was trained on ~5.56 million amino acid residue densities extracted from cryo-EM maps at 3-10 angstrom resolution and the corresponding atomic structure models deposited in the Electron Microscopy Data Bank (EMDB). We benchmarked EMSequenceFinder by predicting the sequences of 58,044 distinct α-helix and β-strand fragments, given the fragment backbone coordinates fitted in their density maps. EMSequenceFinder identifies the correct sequence as the best-scoring sequence in 77.8% of these cases. We also assessed EMSequenceFinder on separate datasets of cryo-EM maps at resolutions from 4 to 6 angstrom. There, the accuracy of EMSequenceFinder (63.5%) was better than that of the two other state-of-the-art methods tested, findMysequence (45%) and sequence_from_map in Phenix (12.9%). We further illustrate EMSequenceFinder by threading the SARS-CoV-2 NSP2 sequence into eight cryo-EM maps at resolutions from 3.7 to 7.0 angstrom. EMSequenceFinder is implemented in our open-source Integrative Modeling Platform (IMP) program. It is therefore expected to be helpful for integrative structure modeling based on a cryo-EM map and other information, such as models of protein complex components and chemical crosslinks between them.
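The Bayesian ranking and sequence assignment summarized in the abstract can be illustrated with a small sketch. This is not the EMSequenceFinder implementation and does not use the IMP API. It assumes that a density-fit model has already produced, for each backbone position, a probability for each of the 20 residue types (with any dependence on map resolution folded into those values), and that the secondary structure propensity acts as a prior. The function names (`posterior`, `rank_residue_types`, `best_window`, `peaked`) and all numbers are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def posterior(likelihood, prior):
    """Bayes' rule per residue type: posterior is proportional to likelihood * prior."""
    unnormalized = {aa: likelihood[aa] * prior[aa] for aa in AMINO_ACIDS}
    total = sum(unnormalized.values())
    return {aa: value / total for aa, value in unnormalized.items()}


def rank_residue_types(likelihood, prior):
    """Return residue types sorted from most to least probable at one position."""
    post = posterior(likelihood, prior)
    return sorted(post, key=post.get, reverse=True)


def best_window(sequence, fragment_posteriors, floor=1e-12):
    """Slide a traced fragment along the sequence.

    Each window is scored by the sum of log posteriors of its residues;
    the start index and score of the best window are returned.
    """
    n = len(fragment_posteriors)
    best = None
    for start in range(len(sequence) - n + 1):
        score = sum(
            math.log(max(fragment_posteriors[i][sequence[start + i]], floor))
            for i in range(n)
        )
        if best is None or score > best[1]:
            best = (start, score)
    return best


def peaked(residue, p=0.6):
    """Hypothetical toy distribution: probability p on one type, the rest uniform."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return {aa: (p if aa == residue else rest) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Toy three-residue fragment whose densities look like L, K, E.
    fragment = [posterior(peaked(aa), uniform) for aa in "LKE"]
    print(rank_residue_types(peaked("L"), uniform)[:3])
    print(best_window("MAGLKEV", fragment))
```

Summing log posteriors treats the positions in a fragment as independent, which is a simplifying assumption of this sketch rather than a statement about the published scoring function.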
Biology