Enhancing cryo-EM structure prediction with DeepTracer and AlphaFold2 integration

Jason Chen, Ayisha Zia, Albert Luo, Hanze Meng, Fengbin Wang, Jie Hou, Renzhi Cao, Dong Si
DOI: https://doi.org/10.1093/bib/bbae118
IF: 9.5
2024-04-25
Briefings in Bioinformatics
Abstract: Understanding the protein structures is invaluable in various biomedical applications, such as vaccine development. Protein structure model building from experimental electron density maps is a time-consuming and labor-intensive task. To address the challenge, machine learning approaches have been proposed to automate this process. Currently, the majority of the experimental maps in the database lack atomic resolution features, making it challenging for machine learning-based methods to precisely determine protein structures from cryogenic electron microscopy density maps. On the other hand, protein structure prediction methods, such as AlphaFold2, leverage evolutionary information from protein sequences and have recently achieved groundbreaking accuracy. However, these methods often require manual refinement, which is labor intensive and time consuming. In this study, we present DeepTracer-Refine, an automated method that refines AlphaFold-predicted structures by aligning them to DeepTracer's modeled structure. Our method was evaluated on 39 multi-domain proteins and we improved the average residue coverage from 78.2 to 90.0% and average local Distance Difference Test score from 0.67 to 0.71. We also compared DeepTracer-Refine with Phenix's AlphaFold refinement and demonstrated that our method not only performs better when the initial AlphaFold model is less precise but also surpasses Phenix in run-time performance.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
The paper asks how to improve the accuracy of cryo-electron microscopy (cryo-EM) structure modeling by combining two deep learning methods, DeepTracer and AlphaFold2. It proposes a method named DeepTracer-Refine that targets two main challenges:

1. **Limitations of density-map-based modeling (map-to-model)**
   - **Resolution limitations**: Most experimental cryo-EM density maps lack atomic-resolution features, making it difficult for machine learning methods to determine protein structures accurately.
   - **Residue identification and connection errors**: Because the resolution of cryo-EM density maps is limited, machine learning algorithms struggle to classify side chains, so the predicted amino acid types are often inaccurate. In addition, noise and quality differences between experimental maps can cause methods such as DeepTracer to connect residues incorrectly.
2. **Limitations of sequence-based modeling (sequence-to-model)**
   - **Folding accuracy in inter-domain regions**: Although AlphaFold2 has made breakthrough progress in predicting protein structures, it is less accurate in the regions between protein domains. These regions are usually flexible and disordered, so AlphaFold2's predictions there are not precise enough.

To overcome these challenges, DeepTracer-Refine improves AlphaFold2 predictions by combining the strengths of both methods:

- **High sequence coverage from AlphaFold2**: AlphaFold2 provides complete sequence coverage, but the predicted backbone geometry may be inaccurate.
- **DeepTracer's density-map analysis**: DeepTracer identifies residues from high-resolution cryo-EM density maps and generates a preliminary structure prediction.
- **Automatic segmentation and alignment**: DeepTracer-Refine detects low-confidence regions in the AlphaFold2 prediction, segments the protein structure into compact domains, and aligns these domains with DeepTracer's predicted structure to improve the accuracy of the overall model.

On the 39 multi-domain proteins evaluated, the abstract reports that average residue coverage rose from 78.2% to 90.0% and the average local Distance Difference Test (lDDT) score rose from 0.67 to 0.71.
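
The segment-then-align idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that Cα coordinates are available as NumPy arrays, that the predicted and traced models already share a one-to-one residue correspondence (a real pipeline would have to establish this), and that a per-residue confidence score such as pLDDT is used with a cutoff of 70 to split the chain. The function names `split_by_confidence`, `kabsch`, and `refine`, and the threshold value, are hypothetical choices made for illustration. Contiguous high-confidence runs stand in for the paper's "compact domains", and a standard Kabsch rigid-body superposition stands in for its alignment step.

```python
import numpy as np


def split_by_confidence(plddt, threshold=70.0, min_len=3):
    """Return (start, end) ranges of consecutive residues with score >= threshold.

    Assumption: low-confidence residues mark the flexible linkers between
    domains, so each high-confidence run is treated as one rigid segment.
    """
    segments, start = [], None
    for i, score in enumerate(plddt):
        if score >= threshold and start is None:
            start = i  # a confident run begins
        elif score < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run ended at a low-confidence residue
    if start is not None and len(plddt) - start >= min_len:
        segments.append((start, len(plddt)))
    return segments


def kabsch(mobile, target):
    """Rotation R and translation t minimizing RMSD of (mobile @ R.T + t) to target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # Covariance between the centered point sets (3 x 3).
    H = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    return R, t


def refine(pred_ca, plddt, traced_ca, threshold=70.0):
    """Rigidly move each confident segment of the prediction onto the traced model.

    Low-confidence residues are left where the prediction placed them.
    """
    refined = pred_ca.copy()
    for start, end in split_by_confidence(plddt, threshold):
        R, t = kabsch(pred_ca[start:end], traced_ca[start:end])
        refined[start:end] = pred_ca[start:end] @ R.T + t
    return refined
```

Calling `refine(pred_ca, plddt, traced_ca)` with an `(N, 3)` array of predicted Cα positions, a length-`N` array of confidence scores, and an `(N, 3)` array of traced positions returns a new `(N, 3)` array. Treating each segment as a rigid body preserves the predicted local geometry inside a domain while correcting the relative placement of domains, which is the kind of error the summary attributes to inter-domain regions.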