Abstract: Construction of molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies. Methods have evolved from manual construction by structural biologists to 6D translation-rotation searching, which is extremely compute-intensive. In this paper, we propose a learning-based method and formulate this problem as a vision-inspired 3D detection and pose estimation task. We develop a deep learning framework for amino acid determination in a 3D Cryo-EM density volume. We also design a sequence-guided Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form the molecular structure. This framework achieves 91% coverage on our newly proposed dataset and takes only a few minutes for a typical structure with a thousand amino acids. Our method is hundreds of times faster and several times more accurate than existing automated solutions, and it requires no human intervention.
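
The sequence-guided threading step can be illustrated with a small Monte Carlo Tree Search (UCT) sketch. This is not the authors' implementation, and it makes several assumptions:

- Each detected candidate is reduced to a Cα coordinate and a predicted residue type.
- PBNet is replaced by a hypothetical geometric rule. `can_link` accepts a link when the Cα-Cα distance is within `BOND_TOL` of the typical 3.8 Å spacing between consecutive residues.
- Threading starts at the first sequence position.
- A thread is scored by the fraction of sequence positions whose predicted type matches.

All names (`can_link`, `mcts_thread`, `BOND_TOL`, `score`) are hypothetical.

```python
import math
import random

BOND_LENGTH = 3.8  # typical C-alpha to C-alpha distance (angstroms) between consecutive residues
BOND_TOL = 0.5     # hypothetical tolerance; a geometric stand-in for PBNet's learned decision


def can_link(a, b):
    """Hypothetical pruning rule: allow a peptide link if the C-alpha distance is near 3.8 A."""
    return abs(math.dist(a, b) - BOND_LENGTH) <= BOND_TOL


class Node:
    def __init__(self, path, parent=None):
        self.path = path      # candidate indices threaded so far, in sequence order
        self.parent = parent
        self.children = {}    # candidate index -> child Node
        self.visits = 0
        self.value = 0.0      # sum of rollout rewards seen through this node


def legal_moves(path, coords, seq_len):
    """Unused candidates that can extend the thread; none once the sequence is fully threaded."""
    if len(path) == seq_len:
        return []
    if not path:
        return list(range(len(coords)))
    last = coords[path[-1]]
    return [i for i in range(len(coords)) if i not in path and can_link(last, coords[i])]


def score(path, types, sequence):
    """Sequence guidance: fraction of positions whose predicted type matches the sequence."""
    return sum(types[i] == sequence[k] for k, i in enumerate(path)) / len(sequence)


def rollout(path, coords, types, sequence, rng):
    """Extend the thread with random legal moves until stuck, then score it."""
    path = list(path)
    while True:
        moves = legal_moves(path, coords, len(sequence))
        if not moves:
            return score(path, types, sequence)
        path.append(rng.choice(moves))


def uct_select(node, c=1.4):
    """Pick the child maximizing mean reward plus the UCB1 exploration bonus."""
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def mcts_thread(coords, types, sequence, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node([])
    for _ in range(iterations):
        node = root
        # Selection: descend while every legal move of the node has been expanded.
        while node.children and len(node.children) == len(
                legal_moves(node.path, coords, len(sequence))):
            node = uct_select(node)
        # Expansion: add one untried move, if any.
        untried = [m for m in legal_moves(node.path, coords, len(sequence))
                   if m not in node.children]
        if untried:
            move = rng.choice(untried)
            node.children[move] = Node(node.path + [move], node)
            node = node.children[move]
        # Simulation and backpropagation.
        reward = rollout(node.path, coords, types, sequence, rng)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the thread by following the most-visited children.
    node, best = root, []
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        best = node.path
    return best


if __name__ == "__main__":
    coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (0, 3.8, 0)]
    types = ["A", "C", "D", "E", "W"]  # the last candidate is a decoy
    print(mcts_thread(coords, types, "ACDE"))
```

In the toy example, four candidates are spaced 3.8 Å apart along a line and one decoy sits next to the first. The search should prefer the order whose predicted types agree with the sequence. The distance rule keeps the branching factor small, which is the role the summary attributes to PBNet's tree pruning.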

What problem does this paper attempt to address?

This paper addresses the problem of constructing molecular structure models from cryo-electron microscopy (Cryo-EM) density volumes. The authors propose a deep-learning-based method that recasts the problem as a vision-inspired 3D detection and pose-estimation task. They develop a deep-learning framework that identifies amino acids in 3D Cryo-EM density volumes. They also design a sequence-guided Monte Carlo Tree Search (MCTS) algorithm that connects the candidate amino acids into molecular structures. The goal is to automate the process fully, removing human intervention while improving both processing speed and accuracy.

### Problems Solved by the Paper

1. **Automated Molecular Structure Modeling**: Traditional molecular structure modeling depends on manual work by structural biologists, which is both time-consuming and error-prone. The proposed method aims to automate this process fully, without human intervention.
2. **Improving Processing Speed and Accuracy**: Existing automated solutions reduce manual work, but they often require hundreds of hours of compute time and offer limited accuracy and coverage. The proposed method is hundreds of times faster than existing methods and more accurate.
3. **3D Detection and Pose Estimation**: The task of determining a molecular structure is decomposed into three sub-problems:
   - Amino Acid Detection: detect the positions of amino acids in the density volume.
   - Atomic Coordinate Assignment: determine the atomic coordinates of each amino acid.
   - Backbone Threading: resolve the order of the amino acids that form each protein chain.

### Method Overview

1. **3D Detection Network**: A 3D Convolutional Neural Network (CNN) detects amino acids. An Aspect-Ratio Preserved RoI (APRoI) layer preserves the aspect ratios of the different amino acids, which improves detection performance.
2. **Pose-Estimation Network**: A 3D Stacked Hourglass Network regresses the 3D coordinates of the atoms in each amino acid.
3. **Sequence-guided MCTS Algorithm**: MCTS searches among the candidate amino acids and connects them into a complete molecular structure. A Peptide Bond Recognition Network (PBNet) prunes the search tree to improve search efficiency.

### Dataset

The authors construct the large-scale A2 dataset, which contains 250,000 amino acid objects distributed across 1,713 protein chains from 218 structures. It is the first large-scale, richly annotated dataset for automatic molecular structure determination.

### Experimental Results

1. **Detection Performance**: On the A2 dataset, the proposed A2-Net achieves a mAP (mean Average Precision) of 0.891, significantly outperforming other 3D object detection methods.
2. **Backbone Threading Performance**: MCTS + PBNet outperforms the traditional Depth-First Search (DFS) method in both coverage and root-mean-square deviation (RMSD), reaching 0.91 and 2.0, respectively.
3. **Comparison with Rosetta-denovo**: On real protein structure modeling tasks, A2-Net significantly outperforms Rosetta-denovo in both coverage and processing time.

### Summary

This paper proposes a fully automatic, deep-learning-based molecular structure modeling method. By combining 3D detection, pose estimation, and a sequence-guided MCTS algorithm, it substantially improves processing speed and accuracy and reduces the dependence on human intervention.
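
The coverage and RMSD figures reported above depend on how predicted residues are matched to the deposited model, and this summary does not specify that protocol. The sketch below assumes one plausible choice:

- Ground-truth Cα positions are greedily matched one-to-one to predicted ones within a hypothetical 3 Å cutoff.
- Coverage is the fraction of ground-truth residues that find a match.
- RMSD is computed over the matched pairs without superposition, since prediction and deposited model share the map's coordinate frame.

The function names and the cutoff are assumptions, not the paper's evaluation code.

```python
import math


def match_residues(pred, truth, threshold=3.0):
    """Greedy one-to-one matching of predicted to ground-truth C-alpha positions.

    threshold (angstroms) is a hypothetical cutoff; each predicted residue is used at most once.
    Returns (pred_index, truth_index, distance) triples.
    """
    used = set()
    pairs = []
    for t_idx, t in enumerate(truth):
        best, best_d = None, threshold
        for p_idx, p in enumerate(pred):
            if p_idx in used:
                continue
            d = math.dist(p, t)
            if d <= best_d:
                best, best_d = p_idx, d
        if best is not None:
            used.add(best)
            pairs.append((best, t_idx, best_d))
    return pairs


def coverage_and_rmsd(pred, truth, threshold=3.0):
    """Coverage = matched fraction of ground truth; RMSD over matched pairs only."""
    pairs = match_residues(pred, truth, threshold)
    coverage = len(pairs) / len(truth)
    if not pairs:
        return coverage, float("nan")
    rmsd = math.sqrt(sum(d * d for _, _, d in pairs) / len(pairs))
    return coverage, rmsd


if __name__ == "__main__":
    truth = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
    pred = [(0, 0, 1), (3.8, 0, -1), (7.6, 0, 0)]
    print(coverage_and_rmsd(pred, truth))
```

Greedy matching is simple but depends on the order in which ground-truth residues are visited. An optimal assignment, such as the Hungarian algorithm, avoids that at a higher computational cost.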

A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes

Protein complex structure modeling by cross-modal alignment between cryo-EM maps and protein sequences

CR-I-TASSER: assemble protein structures from cryo-EM density maps using deep convolutional neural networks

De Novo Atomic Protein Structure Modeling for Cryo-EM Density Maps Using 3D Transformer and Hidden Markov Model

Recognizing amino acid sidechains in a medium resolution cryo-electron density map

A new method on reconstructing protein structure from NOESY distances

A Graph Neural Network Approach to Automated Model Building in Cryo-EM Maps

Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures

Building molecular model series from heterogeneous CryoEM structures using Gaussian mixture models and deep neural networks

Accurate Model Annotation of a Near-Atomic Resolution Cryo-EM Map

SEGEM: a Fast and Accurate Automated Protein Backbone Structure Modeling Method for Cryo-EM

A New Protocol for Atomic-Level Protein Structure Modeling and Refinement Using Low-to-Medium Resolution Cryo-EM Density Maps

Solving the α-helix correspondence problem at medium-resolution Cryo-EM maps through modeling and 3D matching

CryoChains: Heterogeneous Reconstruction of Molecular Assembly of Semi-flexible Chains from Cryo-EM Images

Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps

Fast and Automated Protein-DNA/RNA Macromolecular Complex Modeling from Cryo-EM Maps

Accurate flexible refinement for atomic-level protein structure using cryo-EM density maps and deep learning

Automated model building and protein identification in cryo-EM maps

Automated atomic modeling in cryo-EM maps

De Novo Protein Structure Determination from Near-Atomic-Resolution Cryo-EM Maps