A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes

Kui Xu, Zhe Wang, Jianping Shi, Hongsheng Li, Qiangfeng Cliff Zhang
DOI: https://doi.org/10.48550/arXiv.1901.00785
2019-02-12
Abstract: Constructing molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies. Methods have evolved from manual construction by structural biologists to 6D translation-rotation searching, which is extremely compute-intensive. In this paper, we propose a learning-based method and formulate this problem as a vision-inspired 3D detection and pose estimation task. We develop a deep learning framework for amino acid determination in a 3D Cryo-EM density volume. We also design a sequence-guided Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form the molecular structure. This framework achieves 91% coverage on our newly proposed dataset and takes only a few minutes for a typical structure with a thousand amino acids. Without any human intervention, our method is hundreds of times faster and several times more accurate than existing automated solutions.
Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses the problem of constructing molecular structure models from cryo-electron microscopy (Cryo-EM) density volumes. Specifically, the authors propose a deep-learning-based method that reformulates the problem as a vision-inspired 3D detection and pose-estimation task. They develop a deep-learning framework for identifying amino acids in 3D Cryo-EM density volumes and design a sequence-guided Monte Carlo Tree Search (MCTS) algorithm that connects candidate amino acids into molecular structures. The method aims to increase automation, reduce human intervention, and improve both processing speed and accuracy.

### Problems Solved by the Paper

1. **Automated Molecular Structure Modeling**: Traditional molecular structure modeling depends on manual work by structural biologists, which is both time-consuming and error-prone. The proposed method aims to fully automate this process without human intervention.
2. **Improving Processing Speed and Accuracy**: Existing automated solutions reduce manual work, but often require hundreds of hours of computing time and have limited accuracy and coverage. The proposed method is hundreds of times faster than existing methods and more accurate.
3. **3D Detection and Pose Estimation**: The paper decomposes molecular structure determination into three sub-problems:
   - Amino Acid Detection: Detect the positions of amino acids in the density volume.
   - Atomic Coordinate Assignment: Determine the atomic coordinates of each amino acid.
   - Backbone Threading: Resolve the order of the amino acids that form each protein chain.

### Method Overview

1. **3D Detection Network**: A 3D Convolutional Neural Network (CNN) detects amino acids. It introduces the Aspect-Ratio Preserved RoI (APRoI) layer, which preserves the aspect ratios of different amino acids and thereby improves detection performance.
2. **Pose-Estimation Network**: A 3D Stacked Hourglass Network regresses the 3D coordinates of the atoms in each amino acid.
3. **Sequence-guided MCTS Algorithm**: The MCTS algorithm searches among candidate amino acids and connects them into a complete molecular structure. The Peptide Bond Recognition Network (PBNet) prunes the search tree to improve search efficiency.

### Dataset

The authors construct a large-scale A2 dataset containing 250,000 amino acid objects, distributed across 1,713 protein chains from 218 structures. This dataset is the first large-scale, richly annotated dataset for automatic molecular structure determination.

### Experimental Results

1. **Detection Performance**: On the A2 dataset, the proposed A2-Net achieves a mean Average Precision (mAP) of 0.891, significantly outperforming other 3D object detection methods.
2. **Backbone Threading Performance**: Compared with the traditional Depth-First Search (DFS) method, the MCTS + PBNet method performs better in both coverage and root-mean-square deviation (RMSD), reaching 0.91 and 2.0 respectively.
3. **Comparison with Rosetta-denovo**: In actual protein structure modeling tasks, A2-Net significantly outperforms Rosetta-denovo in both coverage and processing time.

### Summary

This paper proposes a fully automatic, deep-learning-based method for molecular structure modeling. By combining 3D detection, pose estimation, and sequence-guided MCTS, it significantly improves processing speed and accuracy and reduces the dependence on human intervention.
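
The two backbone-threading metrics reported above, coverage and RMSD, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names `rmsd` and `coverage`, the distance-threshold definition of coverage, and the default `threshold=3.0` are assumptions made only for illustration, since the summary does not state how the paper defines coverage.

```python
import math

def rmsd(pred, ref):
    """Root-mean-square deviation between matched 3D points.

    pred[i] is assumed to be the prediction for ref[i]; the result is in
    the same length unit as the input coordinates.
    """
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and equally long")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(ref))

def coverage(pred, ref, threshold=3.0):
    """Fraction of reference points with a predicted point within `threshold`.

    ASSUMPTION: this threshold-based definition and the default value are
    hypothetical; the paper may define coverage differently.
    """
    if not ref:
        raise ValueError("ref must be non-empty")
    hits = sum(
        1 for r in ref
        if any(math.dist(p, r) <= threshold for p in pred)
    )
    return hits / len(ref)

# Toy example: three reference positions spaced 3.8 units apart.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (20.0, 0.0, 0.0)]

print(coverage(predicted, reference))      # two of three references are covered
print(rmsd(predicted[:2], reference[:2]))  # deviation over the two matched pairs
```

In this toy case the third prediction lies far from every reference position, so it lowers coverage. The RMSD here is computed only over pairs already matched by index; evaluating a real model would first require a residue-to-residue assignment, which this sketch does not attempt.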